Howdy 👋

I’m Simon, an SRE by day, and quite often by night too. I’m fascinated by technology, cars and photography (although not necessarily in that order).

Using Pinniped for authentication against Talos-based Kubernetes clusters

May 3, 2025 · 10 min · Simon Weald

Postmortem: Management Cluster nodes flapping

A postmortem on sporadically flapping nodes in my ClusterAPI management cluster. ...

April 22, 2025 · 2 min · Simon Weald

Postmortem: Proxmox host drops offline twice

A short postmortem into why a node in my Proxmox cluster went offline twice within a few days. ...

April 4, 2025 · 2 min · Simon Weald

Replacing the drive in a single drive Proxmox node

Recently, I started getting the dreaded emails from smartd on one of my Proxmox nodes. The drive was (slowly) failing and needed to be replaced. Unfortunately, because my nodes are micro form-factor (MFF), they only have a single drive. This means that I would also need to migrate the VM disks to the new drive. As this is a homelab setup, I don’t have a ton of capacity or any fancy storage solutions, so I would need to get a little creative and replace the drive in situ. ...

August 17, 2024 · 7 min · Simon Weald

Update: Using BGP to integrate Cilium with OPNsense

A little while back, I wrote a short piece on integrating Cilium with OPNsense using BGP. With more recent releases of Cilium, the team have introduced the Cilium BGP Control Plane (currently as a beta feature). This reworking of the BGP integration replaces the old MetalLB-based control plane, so the older feature must first be disabled. To enable the new feature, you can either pass an argument to Cilium (--enable-bgp-control-plane=true) or, if you use Helm to install Cilium, set the following values: ...

January 14, 2024 · 3 min · Simon Weald

Troubleshooting Network Traffic with CRI-O and Kubernetes

Running immutable infra is the holy grail for many people; however, there are times when you’ll need to get down in the weeds in order to troubleshoot issues. Let’s imagine a scenario: you need to verify that a pod is receiving traffic, but the image is built FROM scratch. As scratch containers are as minimal as possible, there’s no shell in the image, so there’s no way you can exec into it and hope to do anything remotely useful. ...

December 18, 2021 · 3 min · Simon Weald

Allowing DNS lookups with Hashicorp Consul + ACLs enabled

I’ve recently been experimenting with Hashicorp’s Consul in my home infrastructure because I want to use it to provide service discovery and automatic DNS provisioning when I create Proxmox instances with Terraform. Consul is a bit of a hefty beast to get to grips with and getting DNS lookups working when you have ACLs enabled can be a little tricky - it’s taken me a day or two of going round in circles to figure this one out. ...

September 9, 2021 · 2 min · Simon Weald

Using BGP to integrate Cilium with OPNsense

Update: Cilium now has a new BGP integration so things have changed a little; see this post for more details. If (like me) you happen to follow the development of the Cilium CNI plugin for Kubernetes then you’ll have seen the recent 1.10 release which included many shiny features. One exciting addition is the ability to announce Service IPs via BGP. Running Kubernetes in a homelab environment quickly highlights that there are some aspects which are a little lacking when compared to the integration you get from the cloud provider offerings. One of the biggest limitations is the inability to create LoadBalancer Services to expose ingress controllers and the like. MetalLB has been around for some years now and its aim is to improve this situation by using either ARP or BGP to announce routes to Service IPs inside your cluster(s). This means that you can create LoadBalancer Services inside your on-prem (or homelab) network, and it goes a long way towards reducing the friction of running Kubernetes outside of the cloud providers. ...

May 31, 2021 · 5 min · Simon Weald

Securing SSH with the Vault SSH backend and GitHub authentication

This blog is going to be about using Hashicorp’s Vault to issue short-lived certificates to use with SSH. Most guides have you using a username & password to authenticate with Vault, but I’ve chosen to delegate that to GitHub instead. I’m assuming you already have a Vault server running - I won’t be covering that in the course of this blog. You’ll also need a sufficiently-privileged Vault token, and jq installed on the machine you wish to SSH from. ...

May 30, 2020 · 7 min · Simon Weald

Thanos and Prometheus without Kubernetes

If you’ve been around the cloud-native world for a while, you’ll no doubt be familiar with (and quite likely already be using) Prometheus. You may however not have heard of Thanos. Put simply, Thanos takes Prometheus and makes it even more awesome. In their own words, the high-level description of Thanos is the following: “Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments.” ...

March 11, 2019 · 6 min · Simon Weald