Homelab Webhooks

In this blog, we'll explore how I handle webhooks in my home lab.

My home lab is not accessible from the internet, so I need a way to receive webhooks from external services.

This will be a multi-part blog where we'll explore the following topics:

  • use case, requirements, and design (this blog)
  • code implementation in Go (this blog)
  • server deployment using Docker Compose, AWS, and Hashicorp tools (future blog)
  • client deployment using Kubernetes (future blog)
  • monitoring and tracing using Prometheus, Grafana, and OpenTelemetry (future blog)

Let's get started!

(Title image generated by ChatGPT)

Use Case

I want to receive webhooks from external services in my home lab for practice and practicality.

For example, I want to build and test some hobby projects in my home lab with Jenkins or Tekton. I am a big fan of Jenkins (I used to work on it at CloudBees) and its creator, Kohsuke Kawaguchi.

He once wrote a blog post I keep referring to: Polling must die. Besides the evils of polling itself, I want to ensure I build every commit separately so it is easier to compare results and find the root cause of issues. So yeah, webhooks are the way to go.

I grew up with the internet of the '90s. You could create your own website, message board, and so on.

And what eventually happened was that everything that wasn't locked down was abused. If you put an outdated machine on the internet, it wouldn't be long before it was compromised. So, I am a bit paranoid about exposing my home lab to the internet.

In conclusion, I want to receive webhooks from external services without exposing my home lab to the internet.

Requirements

I have worked with many companies as a consultant in the past, assisting them in setting up their CI/CD pipelines. Time and time again, the issue of webhooks comes up. Often, the environment is locked down, and we need to jump through hoops to get webhooks to the right place.

Some other things that came up were security concerns about who could send a webhook, how to confirm it is from a trusted source, and how to ensure the webhook is not tampered with.

Related to that, companies want a simple way of collecting and forwarding webhooks, but they want to filter the ones that get processed. I agree with most of these concerns and found none of the free Webhook Relay services fit the requirements.

And, to be fair, I have always wanted to build my own webhook relay service. So, I probably didn't look too hard for a solution that would fit the requirements.

So, the requirements are:

  • receive webhooks from external services, like GitHub (1, 2, 3, many... so GitHub first)
  • able to collect webhooks and sync them to a local service via a trusted intermediary
  • have a way to confirm the webhook is from a trusted source
  • have a way to ensure the webhook is not tampered with
  • have a way to filter webhooks that get processed

Design

So, how do we design a system that can handle these requirements? I took a look at some of the existing Webhook Relay solutions and came up with the following design:

  • Server that can receive webhooks from GitHub
  • Client (or Relay) that can pull the webhooks from the webserver
  • Relay then relays the webhooks to a local service, like Jenkins

So below is a diagram with an overview of the design.

graph LR
 GitHub[GitHub] -->|Webhook Push| Webserver[Webserver]
 Webserver -->|Webhook Pull| Relay[Relay]
 Relay -->|Webhook Push| Jenkins[Jenkins]

 subgraph Homelab Environment
 Relay
 Jenkins
 end

Another design decision is that I want the Relays to work for a single Repository. While this is cumbersome for those with many repositories, it keeps the client simpler and bulkheads possible failures.

This also means the Relay application needs to be compact and resource-efficient.

Build the Relay

Next, we'll implement the Webhook Server and Relay code.

Before diving into the Server and Relay implementation, we'll take a high-level overview first:

  • learning goals
  • supported functionality
  • application design

Learning Goals

This is a hobby project for learning and fun, which I give priority over the best solution. There are some technologies I want to spend more time with:

  • Go
  • GRPC
  • OpenTelemetry, specifically tracing across multiple applications
  • implement mTLS rather than leave it to a ServiceMesh
  • for anything that gets hosted in a public space, I want to use AWS with Terraform

These are the most important things on my list of things to learn or spend some more time with.

Supported Functionality

Aside from learning new things and having fun, I want several capabilities from the solution to justify building something new.

As mentioned in the beginning, I want more control over the webhooks. In addition, I have some criteria related to how the application is deployed.

  • have the option to store data in a datastore of some kind, but not required
  • support the HMAC token from GitHub and validate it
  • support authentication between Relay and Server
  • support TLS (with custom CA) and non-TLS to aid testing and debugging
  • can run the application locally, in a container, and in Kubernetes
  • in Kubernetes, we should support Helm and Carvel packages as options
  • the Relay does health checks on the Relay Target, i.e., the endpoint we relay the webhook to
  • for the Relay Target, we should support TLS with custom CA and non-TLS

Application Design

As we're using Go, many required capabilities can be implemented via existing libraries and frameworks.

These libraries, in turn, impact the structure of the application.

While doing some experiments with the Server, I realized it can accept webhooks and relay them again, creating a Relay chain, if you will.

I hear you, "of course it can". I did not want to make it too recursive, so I implemented most of the Relay functionality as a separate binary. With Go, this is straightforward; you create a package structure with two main packages, providing two entry points to the same base application.

This might not be the best solution, but it works well enough for me. The application (folder) structure looks like this:

.
├── api
├── cmd
│   ├── client
│   └── server
├── docs
├── go.mod
├── go.sum
└── internal
  • api: the folder where we put the Proto files for GRPC
  • cmd/client: the Relay entry point
  • cmd/server: the Server entry point
  • docs: you always need some documentation, and as I expect to drop in and out of this project, I need to store what I learn as I will forget it
  • internal: the packages that contain all the non-reusable logic

Webhook Server Highlights

The eh, not-so-pretty code is available on GitHub. I won't go into the details here, but I will highlight some more interesting parts.

The critical parts of the Server are:

  • Webhook Handling
  • GRPC Server
  • GRPC Health Check
  • GRPC Authentication
  • GRPC TLS
  • GRPC Stream Processing
  • OpenTelemetry Tracing, which we will cover in another blog

Webhook Handling

Many Git servers can send webhooks. This is my hobby project, and I use GitHub for all my hobby projects. So, I limit the implementation to GitHub.

GitHub sends a webhook with a JSON payload and metadata in the headers. So, we need an HTTP server that can receive a POST request, process the headers, and parse the JSON payload. Then, filter the Webhook and validate the HMAC token if one is configured.

We then store the Webhook in a data store. The data store is abstracted, with the default being an in-memory store. Alternatively, we can use Redis, but it increases the complexity of the setup, so it's optional.

I use Echo for the HTTP server, and the built-in flag package for the startup flags. Using the flags to configure the Server, we can set the port, the data store (Redis), and so on; a sketch follows below.
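
Here's a minimal sketch of what that flag setup could look like (the flag names here are illustrative, not necessarily the Server's actual flags):

package main

import (
  "flag"
  "fmt"
)

func main() {
  // illustrative flags: HTTP port, GRPC port, and the optional Redis data store
  port := flag.String("port", "8080", "HTTP port for receiving webhooks")
  grpcPort := flag.String("grpcPort", "50051", "GRPC port for Relay clients")
  useRedis := flag.Bool("redis", false, "use Redis instead of the in-memory store")
  flag.Parse()

  fmt.Printf("starting on :%s (grpc :%s, redis=%v)\n", *port, *grpcPort, *useRedis)
}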

To handle the GitHub Webhook, we initialize the Echo server and add a POST route for the Webhook: /v1/github.

We process three headers from the HTTP request:

  • X-Github-Hook-Installation-Target-Id: contains the repository ID where the Webhook is coming from, which we use to filter for only those we are interested in
  • X-Github-Hook-Installation-Target-Type: contains the Type of resource the event originates from; we only care about repositories
  • X-Hub-Signature-256: the HMAC token; if one is set in the Server, we validate it

If the Webhook passes the filters, we store it in the data store, where a Relay client can retrieve it.
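
To make this concrete, here's a condensed sketch of such a handler, assuming Echo and the standard library; the secret literal, port, and repository filter are stand-ins, and the data store is omitted:

package main

import (
  "crypto/hmac"
  "crypto/sha256"
  "encoding/hex"
  "io"
  "net/http"
  "strings"

  "github.com/labstack/echo/v4"
)

// verifySignature checks the X-Hub-Signature-256 HMAC against our shared secret.
func verifySignature(secret, signature string, body []byte) bool {
  mac := hmac.New(sha256.New, []byte(secret))
  mac.Write(body)
  expected := "sha256=" + hex.EncodeToString(mac.Sum(nil))
  return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
  e := echo.New()
  e.POST("/v1/github", func(c echo.Context) error {
    // filter: we only care about repository events...
    targetType := c.Request().Header.Get("X-Github-Hook-Installation-Target-Type")
    if !strings.EqualFold(targetType, "repository") {
      return c.NoContent(http.StatusOK)
    }
    // ...and only for the repositories we are interested in (filter omitted here)
    repoID := c.Request().Header.Get("X-Github-Hook-Installation-Target-Id")

    body, err := io.ReadAll(c.Request().Body)
    if err != nil {
      return c.NoContent(http.StatusBadRequest)
    }

    // validate the HMAC token, if one is configured ("my-secret" stands in for it)
    if !verifySignature("my-secret", c.Request().Header.Get("X-Hub-Signature-256"), body) {
      return c.NoContent(http.StatusUnauthorized)
    }

    // store the event in the data store so a Relay can retrieve it (omitted)
    _ = repoID
    return c.NoContent(http.StatusOK)
  })
  e.Logger.Fatal(e.Start(":8080"))
}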

Why Webserver and GRPC Server

GitHub can only send Webhooks via HTTP.

So even though we use GRPC for the Relay, we need an HTTP server to receive the Webhooks. I want to learn how to use GRPC with Go, so I decided to use GRPC for the Relay to Server communication.

This makes it a bit awkward because now the Server is accessible on two ports. But it is a hobby project, so I can live with that.

GRPC Server

The first thing we need to do is define the GRPC service in a Proto file. This is the API definition, and we generate the Go code from it.

We have two main methods: one to fetch Webhook Events and one to push a Webhook Event. The primary push path is the HTTP server; for chaining Servers, we also have a GRPC push method.

service Gitstafette {
  rpc FetchWebhookEvents (WebhookEventsRequest) returns (stream WebhookEventsResponse){}
  rpc WebhookEventPush (WebhookEventPushRequest) returns (WebhookEventPushResponse) {}
}

Then we can generate the Go code using the protoc command.

.PHONY: compile
compile:
 protoc api/v1/*.proto \
 --go_out=. \
 --go-grpc_out=. \
 --go_opt=paths=source_relative \
 --go-grpc_opt=paths=source_relative \
 --proto_path=.

Implementing the GRPC server using the de facto standard GRPC Go library is straightforward. So, let's look at the more unique parts.

GRPC Health Check

HTTP is a relatively simple protocol to work with; it is straightforward to check if a server is up and running. The responses are also in plaintext, so you can test them with a plethora of standard tools, such as curl, httpie, or Postman, or even a browser.

GRPC is more complex, as it is a binary protocol. One way to simplify understanding the health of a GRPC server is to use a health check service.

We implement this as a separate GRPC service, which is a typical pattern. We can then either host it on the same GRPC server or a separate one—more on this in the inset.

There is a recommended implementation, which I've used in the Server. No need to reinvent the wheel.

Health Check Service As Separate Server

So why would we run it as a separate GRPC Server?

One reason is that when we run the application in Kubernetes or other container orchestrators, we can use it as one of the health probes.

We want to expose the main service to the outside world with TLS, likely with a self-signed certificate. Usually, health check mechanisms do not support self-signed certificates. So, running the Health Check on a separate Server allows us to enable TLS on the main Server and leave it off on the Health Check Server.
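
To illustrate the idea, here's a minimal sketch of the standard health service (the google.golang.org/grpc/health package) running on its own plain-text listener; the port is an assumption:

package main

import (
  "net"

  "google.golang.org/grpc"
  "google.golang.org/grpc/health"
  healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
  // a separate, non-TLS listener just for health probes (port is illustrative)
  lis, err := net.Listen("tcp", ":50052")
  if err != nil {
    panic(err)
  }

  s := grpc.NewServer()
  h := health.NewServer()
  h.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
  healthpb.RegisterHealthServer(s, h)

  if err := s.Serve(lis); err != nil {
    panic(err)
  }
}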

GRPC Authentication

Aside from using certificates with TLS (described next), we can also use authentication.

We can use GRPC's built-in support for OAuth Token-based authentication.

To handle the OAuth Token-based authentication, you process the GRPC metadata. I used GRPC Interceptors to authenticate every request.

Thinking about how I want to run the Server, the OAuth token is not set by flag, but by environment variable. This way, in Kubernetes, we can inject an environment variable from a Secret. It may not be as safe as using an injected file, though it is good enough for my hobby project.

Here's an abbreviated example of the GRPC Interceptor for Token Validation:

GRPC Interceptor for Token Validation
grpc_interceptor.go
func ValidateToken(srv interface{}, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
  oauthToken, oauthOk := os.LookupEnv(envOauthToken)
  if oauthOk {
    md, ok := metadata.FromIncomingContext(ss.Context())
    if !ok {
      return status.Errorf(codes.Unauthenticated, "missing metadata")
    }
    if !valid(md["authorization"], oauthToken) {
      return status.Errorf(codes.Unauthenticated, "invalid token")
    }
  }
  // token is valid (or none is configured), continue with the actual handler
  return handler(srv, ss)
}

func valid(authorization []string, expectedToken string) bool {
  if len(authorization) < 1 {
    return false
  }
  receivedToken := strings.TrimPrefix(authorization[0], "Bearer ")
  return receivedToken == expectedToken
}

And this is how you add the Interceptor to the GRPC Server:

Add Interceptor to GRPC Server
cmd/server/main.go
s := grpc.NewServer(
  grpc.StreamInterceptor(grpc_middleware.ChainStreamServer(
    ValidateToken,
  )),
)
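
On the Relay side, the matching token can be attached as GRPC metadata on the outgoing context. A short sketch, where the environment variable name is illustrative:

// attach the bearer token as GRPC metadata on the outgoing context
token, _ := os.LookupEnv("OAUTH_TOKEN") // env var name is illustrative
ctx := metadata.AppendToOutgoingContext(context.Background(),
  "authorization", "Bearer "+token)

// the stream now carries the token; the Server's interceptor validates it
stream, err := client.FetchWebhookEvents(ctx, request)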

GRPC TLS

There are different scenarios to consider when supporting TLS with the GRPC Server.

The Server might respond to the outside world directly, in which case it needs to support TLS itself. Alternatively, the Server can sit behind a Load Balancer, Reverse Proxy, or an API Gateway, which handles the TLS for us.

Next, we must consider the Relay client, which connects to the Server.

So we get the following options:

  1. Server with TLS, Relay with TLS
  2. Server with TLS, Relay without TLS
  3. Server without TLS, Relay with TLS
  4. Server without TLS, Relay without TLS

The easiest way is to support TLS and non-TLS for both the Server and Relay connections.

In addition, we also have to consider the following:

  • Self-signed certificates: if the Client uses a self-signed certificate, the Server has to trust it.
  • Public CA certificates: if the Client uses a certificate from a public CA, the Server has to trust it.

For verifying these client certificates, Go doesn't give us a pre-populated CA pool. So, we have to load the CA certificates from the system or provide them in the code.

// we configure the TLS Config
tlsConfig := &tls.Config{}

// we load the system CA certificates
ca, err := x509.SystemCertPool()
if err != nil {} // handle error

// if we use a self-signed certificate, we append its PEM bytes (b, read from a file)
ok := ca.AppendCertsFromPEM([]byte(b))
if !ok {} // handle error

// we set the Client CAs the Server uses to verify client certificates
tlsConfig.ClientCAs = ca

// we set the ClientAuth, if we want to require and verify the client certificate
tlsConfig.ClientAuth = tls.RequireAndVerifyClientCert
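
To actually use this config, it has to be wrapped in transport credentials when creating the GRPC server. A short sketch; the certificate file names are illustrative:

// load the Server's own certificate and key (file names are illustrative)
cert, err := tls.LoadX509KeyPair("server.crt", "server.key")
if err != nil {} // handle error
tlsConfig.Certificates = []tls.Certificate{cert}

// wrap the TLS config in GRPC transport credentials
s := grpc.NewServer(grpc.Creds(credentials.NewTLS(tlsConfig)))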

GRPC Stream Processing

Just a refresher, the Protobuf RPC function for retrieving Webhook Events is a stream.

service Gitstafette {
  rpc FetchWebhookEvents (WebhookEventsRequest) returns (stream WebhookEventsResponse){}
}

The consequence of this is that we keep a connection open between the client and the Server.

Timeout Issues on Cloud Run

Initially I deployed the Server on GCP Cloud Run.

I will dive into the details when we get to the Server deployment blog, but I ran into timeout issues.

These issues were related to using Knative, which in turn uses Envoy. Envoy has some defaults, including a timeout of about 5 minutes.

The connection is closed if the Server does not respond within that time. This is why the Server I built has a configurable Streaming Time.

The code below is abbreviated and shows the main loop of the FetchWebhookEvents method.

There are better ways to handle this, but this is what I came up with and it works well enough for me. As described, we stream for a particular duration and check if the context is done.

To pace the client, we have a ResponseInterval. When a request comes in, and we have events to send, we send them and then wait for the ResponseInterval before sending the next batch.

For each event, i.e., the webhook, we keep track of whether or not it has been sent to a Relay (client). We update the status once we confirm the stream did not have an error.

This prevents us from sending the same event multiple times. In another process, we regularly check for any old statuses that have been sent so we can remove them. This way, we avoid sending outdated webhook events and reduce the memory footprint of the Server.

Server Fetch Webhook Events
internal/server/gsf_server.go
func (s GitstafetteServer) FetchWebhookEvents(request *api.WebhookEventsRequest, srv api.Gitstafette_FetchWebhookEventsServer) error {
  durationSeconds := request.GetDurationSecs()
  finish := time.Now().Add(time.Second * time.Duration(durationSeconds))
  ctx, stop := signal.NotifyContext(srv.Context(), os.Interrupt, syscall.SIGTERM)
  defer stop()

  for time.Now().Before(finish) {
    closed := false
    select {
    case <-time.After(s.ResponseInterval):
      events, err := retrieveCachedEventsForRepository(request.RepositoryId)
      if err != nil {} // handle error
      response := &api.WebhookEventsResponse{
        WebhookEvents: events,
      }

      if err := srv.Send(response); err != nil {} // handle error
      updateRelayStatus(events, request.RepositoryId)
    case <-srv.Context().Done():
      closed = true
      break
    case <-ctx.Done(): 
      closed = true
      break
    }
    if closed {
      break
    }
  }
  return nil
}
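
The cleanup process mentioned above could look something like this. A rough sketch, where the cache accessors (Events, Remove) and the event fields (Relayed, ReceivedAt) are hypothetical:

func cleanupCache(ctx context.Context, maxAge time.Duration) {
  ticker := time.NewTicker(time.Minute)
  defer ticker.Stop()
  for {
    select {
    case <-ctx.Done():
      return
    case <-ticker.C:
      // evict events that were relayed, or that are too old to be useful
      for _, event := range cache.Events() {
        if event.Relayed || time.Since(event.ReceivedAt) > maxAge {
          cache.Remove(event)
        }
      }
    }
  }
}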

Not a reliable system

While this setup works well for me, it is not a reliable system.

The Server doesn't keep track of which client received the events. So if a client disconnects or crashes, it cannot re-receive the events.

There are still plenty of improvements to make to ensure the system is reliable. It is not a production-ready system, but it is good enough for my homelab and has been a great learning experience.

Server Diagram

To recap, the diagram below shows the main flows of the Server.

flowchart TB

 A[GitHub] --post webhook--> HTTPServer
 subgraph server [Server]
 HTTPServer --store-->SC[(Webhook Cache)]
 FetchServer -. retrieve .-> SC
 CC{{CacheCleanup}}-- remove stale -->SC
 end

 subgraph relay [Relay]
 FetchRelay --fetch webhooks--> FetchServer
 end

Relay Highlights

The Relay lives in the same codebase as the Server, with its main function in a different package.

Initially, I named this the client, but I realized that it is more of a Relay as it connects to both an upstream Server and a downstream Server (Relay Target).

The Relay has the following capabilities:

  • GRPC Client, connecting to the Server to fetch Webhook Events
  • HTTP Server for debug inspection and health checks (especially useful in Kubernetes)
  • HTTP Client, connecting to the Relay Target to push Webhook Events (incl. heartbeat checks)
  • Cache to store Webhook Events in memory until they are successfully pushed to the Relay Target (which can be offline or otherwise unavailable)

The only interesting thing to highlight is the GRPC Client and the difference in TLS configuration between the Server and the Relay. The rest of the code is pretty standard, and I will not go into the details here.

GRPC Client

The connection to the Server is handled in two steps.

First, we have a loop that keeps reconnecting to the Server unless we encounter an unrecoverable error.

Remember the timeout issues I had on GCP Cloud Run? Well, that means we cannot be certain the connection will stay open for as long as we expect. So we have to handle cases where the connection is closed, but there isn't any inherent problem.

GRPC Client Loop

This code is abbreviated, but it shows the main loop of the Relay.

cmd/client/main.go
for {
  err := handleWebhookEventStream(grpcServerConfig, grpcClientConfig, ctx)
  if err != nil {} // handle error
  if ctx.Err() != nil { // if error, break out of the loop and handle closing the client
    break
  }
  if ctx.Done() != nil && ctx.Err() == nil { // if no error, but Context is Done, refresh context
    ctx, stop = signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
  }
  sleepTime := time.Second * 3
  time.Sleep(sleepTime)
}

Then, we have the actual method that handles the GRPC connection. This is where we connect to the Server, and then we loop over the stream of Webhook Events.

We validate each Webhook Event and then store it in the cache. A separate process checks the cache and pushes the Webhook Events to the Relay Target (sketched after the fetch loop below).

Client Fetch Webhook Events

Some things have been omitted, but this is the Fetch Webhook loop of the Relay.

cmd/client/main.go
func handleWebhookEventStream(serverConfig *api.GRPCServerConfig, clientConfig *api.GRPCClientConfig, mainCtx context.Context) error {
  grpcOpts := createGrpcOptions(serverConfig)
  address := serverConfig.Host + ":" + serverConfig.Port
  conn, err := grpc.Dial(address, grpcOpts...)
  if err != nil {} // handle error
  defer conn.Close()

  connectionCtx := mainCtx
  client := api.NewGitstafetteClient(conn)
  request := &api.WebhookEventsRequest{
    ClientId:            clientConfig.ClientID,
    RepositoryId:        clientConfig.RepositoryId,
    LastReceivedEventId: 0,
    DurationSecs:        uint32(serverConfig.StreamWindow),
  }

  stream, err := client.FetchWebhookEvents(connectionCtx, request)
  if err != nil {} // handle error

  finish := time.Now().Add(time.Second * time.Duration(clientConfig.StreamWindow))
  contextClosed := false

  for time.Now().Before(finish) {
    select {
    case <-time.After(requestInterval):
      if contextClosed {
        break
      }

      response, err := stream.Recv()
      if err == io.EOF {
        contextClosed = true
        break
      }
      if err != nil {
        contextClosed = true
        return err
      }
      if len(response.WebhookEvents) > 0 {
        for _, event := range response.WebhookEvents {

          eventIsValid := v1.ValidateEvent(clientConfig.WebhookHMAC, event)
          if !eventIsValid {
            continue
          }
          cache.Event(clientConfig.RepositoryId, event)
        }

      }
    case <-stream.Context().Done():
      contextClosed = true
      break
    case <-mainCtx.Done(): 
      contextClosed = true
      break
    }
    if contextClosed {
      break
    }
  }

  if stream.Context().Err() != nil {} // handle error
  return nil
}
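
And the separate process that pushes cached events to the Relay Target is roughly shaped like this. A sketch, with hypothetical cache accessors (PendingEvents, MarkRelayed) and event field (Body):

func relayCachedEvents(ctx context.Context, targetURL string) {
  ticker := time.NewTicker(5 * time.Second)
  defer ticker.Stop()
  for {
    select {
    case <-ctx.Done():
      return
    case <-ticker.C:
      for _, event := range cache.PendingEvents() { // hypothetical accessor
        resp, err := http.Post(targetURL, "application/json", bytes.NewReader(event.Body))
        if err != nil {
          continue // Relay Target unavailable: keep the event and retry later
        }
        resp.Body.Close()
        if resp.StatusCode < 300 {
          cache.MarkRelayed(event) // hypothetical: don't send this event again
        }
      }
    }
  }
}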

GRPC TLS

The Relay has to support TLS and non-TLS connections to the Server and the Relay Target. All of these options are configurable via the Flags.

The Relay uses the same function for setting up the GRPC TLS Config as the Server. The main difference is that the Relay doesn't set Client Auth, and it sets the Root CAs (used to verify the Server's certificate) rather than the Client CAs.

tlsConfig.RootCAs = ca
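
For illustration, the createGrpcOptions function from the client loop earlier could be shaped like this. A sketch under assumptions: the UseTLS field and the pre-built ca pool are hypothetical:

func createGrpcOptions(serverConfig *api.GRPCServerConfig) []grpc.DialOption {
  if !serverConfig.UseTLS { // hypothetical field
    // plain-text connection, to aid testing and debugging
    return []grpc.DialOption{grpc.WithTransportCredentials(insecure.NewCredentials())}
  }
  // ca is the CA pool, built the same way as on the Server
  tlsConfig := &tls.Config{RootCAs: ca}
  return []grpc.DialOption{grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig))}
}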

Relay Diagram

To recap the Relay, the main flows of the Relay are below.

flowchart TB
 subgraph server [Server]
 FetchServer
 end

 subgraph relay [Relay]
 FetchRelay --fetch webhooks--> FetchServer
 FetchRelay --store-->SC[(Webhook Cache)]
 RS{{RelaySend}} -. retrieve .-> SC
 end

 subgraph relaytarget [Relay Target]
 RS --post webhook--> W(Webhook Listener)
 end