# Proxy Dockerhub access with K3S
## The Problem
When you run a Kubernetes cluster, you will often deploy images from Docker Hub. Pulling them can be slow, as each image has to be downloaded from the internet. This is a problem if you have a slow internet connection or have to deploy many images at once, and it wastes bandwidth, as the same images are downloaded multiple times.

In addition, you might want to scan the images for security vulnerabilities before deploying them. Or you may need to authenticate to download the images, as anonymous pulls quickly run into rate limits.
## The Solution
One way to solve this problem is to use a proxy that caches images, so they do not have to be downloaded from Docker Hub every time. This is also known as a Pull Through Cache[1][2], a common solution to this problem.

This speeds up the deployment process, as the images are already available on the local network, and prevents rate limiting, as each image is downloaded from Docker Hub only once. An additional benefit is that the hosts neither need direct access to Docker Hub nor need to hold the credentials, limiting the credentials' exposure.
## The Implementation Plan
So, how do you set this up?

We start by setting up the proxy server. If you are in a public cloud environment, it is best to use the cloud provider's registry service, as it is already optimized for this use case. In AWS, you can use ECR as a pull through cache; AWS has documented how to set this up[2].

In a private cloud environment, you have to run a container image registry yourself. There are various options, such as Docker Registry 2[3], Sonatype Nexus[4], JFrog Artifactory[5], or Harbor[6].

In this example, we use Harbor. Harbor is a free and open source container image registry, is well maintained, and has a lot of features. It is also the de facto standard registry for VMware Tanzu[7] (by Broadcom), VMware's Kubernetes distribution.
> **Warning**: Full disclaimer, I used to work for VMware and assisted various companies with setting up Harbor.
Once you have a registry set up, you can configure it to act as a pull through cache: you create a proxy project with authenticated access to Docker Hub, and the registry caches the images it pulls through that project.

Assuming you run the registry yourself, you need to supply it with a certificate so it can be trusted. The Kubernetes nodes need to trust this certificate as well, so that they can pull images from the registry.

In my homelab, I'm using K3S[8] as my Kubernetes distribution, so I will show you how to configure K3S to use the registry as a pull through cache.
## Setup a Proxy Server - Harbor
There are many ways to set up a registry; as mentioned before, we use Harbor in this example. We can install Harbor in various ways, such as using Helm, Docker Compose, or the installer script.

In my homelab, I run two Kubernetes clusters, each consisting of an Asus MiniPC running the Control Plane and several Raspberry Pi 4s running the Worker Nodes. The MiniPCs use Docker Compose to run several services outside of the clusters, such as Keycloak for authentication. This way, these services are available even before the clusters are up and running, or in case of a cluster failure. I use the same approach for Harbor, running it on one of the MiniPCs with Docker Compose.
### Install Harbor Via Docker Compose
Harbor has default support for Docker Compose, with an installer[9] that generates the necessary files. The installation process, described in the docs[10], is quite straightforward:

- Download the latest release
- Configure SSL
- Configure Harbor
- Run the Prepare Script
- Run Docker Compose
#### Download and unpack the installer
You can download the online installer from the GitHub releases page[11]. In my case, the latest release is `v2.11.1`.
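We download the installer and unpack it. A minimal sketch, assuming the `v2.11.1` online-installer asset name; check the releases page[11] for the exact file name:

```shell
# Download the Harbor online installer and unpack it
curl -LO https://github.com/goharbor/harbor/releases/download/v2.11.1/harbor-online-installer-v2.11.1.tgz
tar -xzvf harbor-online-installer-v2.11.1.tgz
cd harbor
```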
#### Configure SSL
In my case, I run more than one service on the MiniPC, so I use Envoy as a reverse proxy: Envoy handles the SSL termination, and the Harbor service runs on plain HTTP. If you want to run Harbor on HTTPS instead, you can configure the `harbor.yml` file accordingly.
#### Configure Harbor
The default settings are included, so we only need to change the hostname and the admin password. As we run Harbor on HTTP, we also comment out the HTTPS section.

We copy the `harbor.yml.tmpl` to `harbor.yml` and edit the file. We change the following settings:

```yaml
hostname: my.harbor.com
harbor_admin_password: admin

# comment out the https section
# https:
#   port: 443
#   certificate: /your/certificate/path/fullchain.pem
#   private_key: /your/private/key/path/privkey.pem
```
#### Run the Prepare Script
The `prepare` script runs a container that generates the necessary configuration files for Docker Compose.
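Running it is a single command; a sketch, assuming you run it from the unpacked `harbor` directory (the script drives the `goharbor/prepare` container, so Docker must be running):

```shell
# Run from the unpacked harbor/ directory
sudo ./prepare
```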
The script validates the configuration and generates the necessary files. The output should look like this:
```text
prepare base dir is set to /home/joostvdg/projects/test/harbor-install/harbor
Unable to find image 'goharbor/prepare:v2.11.1' locally
v2.11.1: Pulling from goharbor/prepare
Digest: sha256:35dbf7b4293e901e359dbf065ed91d9e4a0de371898da91a3b92c3594030a88c
Status: Downloaded newer image for goharbor/prepare:v2.11.1
prepare base dir is set to /home/joostvdg/projects/test/harbor-install/harbor
WARNING:root:WARNING: HTTP protocol is insecure. Harbor will deprecate http protocol in the future. Please make sure to upgrade to https
Generated configuration file: /config/portal/nginx.conf
Generated configuration file: /config/log/logrotate.conf
Generated configuration file: /config/log/rsyslog_docker.conf
Generated configuration file: /config/nginx/nginx.conf
Generated configuration file: /config/core/env
Generated configuration file: /config/core/app.conf
Generated configuration file: /config/registry/config.yml
Generated configuration file: /config/registryctl/env
Generated configuration file: /config/registryctl/config.yml
Generated configuration file: /config/db/env
Generated configuration file: /config/jobservice/env
Generated configuration file: /config/jobservice/config.yml
Generated and saved secret to file: /data/secret/keys/secretkey
Successfully called func: create_root_cert
Generated configuration file: /compose_location/docker-compose.yml
Clean up the input dir
```
In my case, I ran into some issues with permissions on the log files, so I removed all the log configuration (not recommended) and changed the permissions of the `common` directory.
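For reference, the permission change amounted to handing the generated files to the UID the Harbor containers run as. A sketch, assuming the default in-container Harbor user (UID/GID 10000); verify against your generated `docker-compose.yml`:

```shell
# Give the Harbor container user ownership of the generated config
sudo chown -R 10000:10000 ./common
```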
By default, the proxy cache is disabled. We need to enable it by setting the `PERMITTED_REGISTRY_TYPES_FOR_PROXY_CACHE` environment variable. This is done for the `core` service in the `docker-compose.yml` file. The variable takes a comma-separated list of registry types; to be flexible, we add all the supported types.
```yaml
services:
  core:
    environment:
      - PERMITTED_REGISTRY_TYPES_FOR_PROXY_CACHE=docker-hub,harbor,azure-acr,aws-ecr,google-gcr,quay,docker-registry,github-ghcr,jfrog-artifactory
```
Once we have made the changes, we can run Docker Compose.
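A minimal sketch of bringing the stack up, assuming the generated `docker-compose.yml` is in the current directory:

```shell
docker compose up -d   # start all Harbor services in the background
docker compose ps      # verify all containers are up and healthy
```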
### Envoy Reverse Proxy
As described, I use Envoy as a reverse proxy fronting the services on the MiniPC. You can use many other reverse proxies, such as Nginx, Traefik, or HAProxy. In my case, I was also working with Knative Serving (which uses Envoy), so I wanted to get more experience with Envoy.
For Envoy to do its job, we need to configure the following:
- Docker Compose Service entry
- Envoy Configuration
- Certificates
#### Docker Compose Service entry
To run Envoy, we need to add a service entry to the `docker-compose.yml` file. In it, we configure the image, the restart policy, the configuration file, the command, the ports, the resources, and the secrets. The secrets are the certificate files Envoy needs to terminate TLS for the Harbor hostname. As my MiniPC has limited resources, I limit the CPU and memory usage of the Envoy service.
Docker Compose Service entry

```yaml
services:
  envoy:
    image: envoyproxy/envoy:v1.27.0
    restart: unless-stopped
    configs:
      - source: envoy_proxy
        target: /etc/envoy/envoy-proxy.yaml
        uid: "103"
        gid: "103"
        mode: 0440
    command: /usr/local/bin/envoy -c /etc/envoy/envoy-proxy.yaml -l debug
    ports:
      - 443:443
      - 80:80
      - 8082:8082
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 128M
        reservations:
          cpus: '0.25'
          memory: 64M
    secrets:
      - harbor-cert
      - harbor-cert-key
      - ca

configs:
  envoy_proxy:
    file: ./envoy/envoy.yaml

secrets:
  harbor-cert:
    file: ./certs/harbor.pem
  harbor-cert-key:
    file: ./certs/harbor-key.pem
  ca:
    file: ./certs/ca.pem
```
#### Envoy Configuration
The config file mentioned in the Docker Compose service entry is the Envoy configuration file. In it, we configure the listener, the filters, the clusters, and the certificates. The certificates are the secrets that we added to the Docker Compose service entry, which by default are mounted in the `/run/secrets/` directory.
##### Envoy Listener
Here we configure the default listener on port 443. The TLS inspector listener filter lets Envoy read the requested hostname (SNI), which the filter chain below matches on.
```yaml
static_resources:
  listeners:
    - address:
        socket_address:
          address: 0.0.0.0
          port_value: 443
      listener_filters:
        - name: "envoy.filters.listener.tls_inspector"
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
```
##### Envoy Filter
Here we configure the filter chain for the Harbor service. This tells the listener to route all traffic to the Harbor cluster when the requested hostname is `harbor.home.lab`. At the same time, we configure the Transport Socket to use the certificates that we mounted as secrets.
```yaml
filter_chains:
  - filter_chain_match:
      server_names:
        - harbor.home.lab
    filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: AUTO
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
              - name: app
                domains:
                  - "*"
                routes:
                  - match:
                      prefix: "/"
                    route:
                      cluster: harbor
          http_filters:
            - name: envoy.filters.http.router
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
        common_tls_context:
          tls_certificates:
            - certificate_chain:
                filename: /run/secrets/harbor-cert
              private_key:
                filename: /run/secrets/harbor-cert-key
```
##### Envoy Cluster
Here we configure the cluster for the Harbor service. The cluster is the backend service that the listener routes the traffic to. By default, Envoy assumes a backend service has more than one running instance, so we configure the Load Balancing Policy. We only have one instance here, but Harbor can run in active/active mode if you want to scale it out; in that case, Envoy will load balance the traffic between the instances.
```yaml
clusters:
  - name: harbor
    connect_timeout: 60s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: harbor
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: harbor-nginx
                    port_value: 8080
```
##### Full Example
Here's the full example of the Envoy configuration file.
Full Envoy Example

```yaml
static_resources:
  listeners:
    - address:
        socket_address:
          address: 0.0.0.0
          port_value: 443
      listener_filters:
        - name: "envoy.filters.listener.tls_inspector"
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
      filter_chains:
        - filter_chain_match:
            server_names:
              - harbor.home.lab
          filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                codec_type: AUTO
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: app
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: harbor
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              common_tls_context:
                tls_certificates:
                  - certificate_chain:
                      filename: /run/secrets/harbor-cert
                    private_key:
                      filename: /run/secrets/harbor-cert-key
  clusters:
    - name: harbor
      connect_timeout: 60s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: harbor
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: harbor-nginx
                      port_value: 8080
```
#### Envoy Certificates
For creating the Certificate Authority (CA) and the certificates for the Harbor service, we use the `cfssl` tool. I have written a guide on how to use it[12], so we will not go into detail here.
## Configure Proxy Cache in Harbor
Now that we have Harbor running and accessible via the Envoy proxy, we can configure the proxy cache[15]. For this to work, we need to:

- Register a new Registry Endpoint
- Create a new Project as a Proxy Cache

One of the key aspects of the proxy cache is that it needs to be able to authenticate to Docker Hub[13]. So make sure you have an account at Docker Hub, and generate an access token[14].
In Harbor, log in as the admin user and go to the `Administration` section. There, open the `Registries` menu and select `New Endpoint`. In the `New Endpoint` form, select `Docker Hub` as the Provider, and fill in the Name and Description. Fill in the Access ID (your Docker Hub username) and the Access Secret (the access token you generated). Hit `Test Connection` to verify the connection, and then `Save`.

Now, go to the `Projects` section and select `New Project`. Fill in the Name and Description, and select the Public checkbox. Then flip the Proxy Cache switch to `On` and select the Docker Hub endpoint you just created. Hit `OK`, and the project is created. This project will now act as a proxy cache for Docker Hub.
> **Warning**: The default namespace for Docker Hub is `library`, so you need to use this namespace when pulling images from Docker Hub that have no prefix, e.g. `nginx`. If your Harbor is accessible via `harbor.home.lab`, you can pull such images via `harbor.home.lab/library/nginx`.

Don't worry, we can configure K3S in a way that we don't need to handle this manually in Kubernetes.
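To sanity-check the cache from a machine that trusts the CA, you can pull through it manually. A sketch, assuming the `harbor.home.lab/library/...` path from the warning above:

```shell
docker pull harbor.home.lab/library/nginx
```

The first pull fetches the image from Docker Hub; repeat pulls are served from Harbor's cache.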
## Configure K3S
K3S is a lightweight Kubernetes distribution from SUSE (formerly Rancher). It is optimized for edge and IoT use cases, but is also great for homelabs. Because it is designed for edge use cases, it is very flexible and easy to configure, especially for environments where it cannot reach the internet, or where you want to use a private registry[16]. We can use this capability to configure the Harbor registry as a pull through cache.
### K3S Registries Configuration
As per the docs[16]:

> Containerd can be configured to connect to private registries and use them to pull images as needed by the kubelet. Upon startup, K3s will check to see if `/etc/rancher/k3s/registries.yaml` exists. If so, the registry configuration contained in this file is used when generating the containerd configuration.
We can create a `registries.yaml` file and place it in the `/etc/rancher/k3s/` directory.

We first define the `mirrors` section, which is a map of registry names to their endpoints, and optionally a rewrite rule. We need a rewrite rule here, as the images in the Harbor proxy cache are prefixed with the (Harbor) project name. Then we define the `configs` section, which is a map of registry names to their configuration. In this case, we need to define the CA certificate that the nodes (i.e., containerd) need to trust.
Registries.yaml
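Below is a sketch of what this looks like for our setup. Assumptions: the Harbor hostname is `harbor.home.lab`, the proxy-cache project is the one behind the `harbor.home.lab/library/...` path used earlier (substitute your own project name in the rewrite), and the CA sits where the Ansible playbook below places it:

```yaml
mirrors:
  docker.io:
    endpoint:
      - "https://harbor.home.lab"
    # Prefix every Docker Hub image path with the Harbor project name
    rewrite:
      "^(.*)$": "library/$1"

configs:
  "harbor.home.lab":
    tls:
      # CA that signed the Harbor/Envoy certificate
      ca_file: /usr/local/share/ca-certificates/kearos/ca.crt
```

With this in place, pods can keep referencing images as plain `nginx` or `bitnami/nginx`; containerd transparently redirects the pull to Harbor.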
### Configure K3S with Ansible
In my homelab, I use Ansible to configure the MiniPCs (control plane) and Raspberry Pis that run the K3S worker nodes. Below is an example of how to configure the K3S Control Plane nodes to use the Harbor registry as a pull through cache.
Ansible Playbook

```yaml
---
- hosts: "controlplane"
  become: true
  tasks:
    - name: Create directory k3s in /etc/rancher/
      file:
        path: /etc/rancher/k3s
        state: directory
    - name: Copy registries.yaml to /etc/rancher/k3s/
      ansible.builtin.copy:
        src: registries.yaml
        dest: /etc/rancher/k3s/registries.yaml
        owner: root
        group: root
        mode: "0644"
    - name: Create directory kearos in /usr/local/share/ca-certificates/
      file:
        path: /usr/local/share/ca-certificates/kearos
        state: directory
    - name: Copy certificate to /usr/local/share/ca-certificates/kearos
      copy:
        src: ca.crt
        dest: /usr/local/share/ca-certificates/kearos/ca.crt
        owner: root
        group: root
        mode: "0644"
    - name: Update CA certificate trust
      shell: /bin/bash -c "update-ca-certificates"
    # Restart last, so K3S picks up both registries.yaml and the trusted CA
    - name: Restart K3S Server
      shell: /bin/bash -c "systemctl restart k3s"
```
For the worker nodes, you can use a similar playbook, but you need to change the restart command to restart the K3S agent service, as shown below.
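A sketch of the one task that differs, assuming the default K3S agent service name `k3s-agent`:

```yaml
- name: Restart K3S Agent
  shell: /bin/bash -c "systemctl restart k3s-agent"
```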
## Conclusion
In this post, we have shown how to set up a proxy cache for Docker Hub using Harbor and K3S. This speeds up the deployment process, as the images are already available on the local network. It also prevents rate limiting, as the images are downloaded from Docker Hub only once, via authenticated pulls. And it is more secure, as the hosts neither need direct access to Docker Hub nor need to hold the credentials.
## References
- [2] AWS - Sync an upstream registry with an Amazon ECR private registry
- [7] VMware Tanzu - VMware's (by Broadcom) Cloud Native label, including a Kubernetes distribution
- [8] K3S - Lightweight Kubernetes distribution from SUSE (formerly Rancher)
- [12] Joost van der Griendt - Setup a Certificate Authority using CloudFlare's CFSSL