Tuesday, 16 July 2019

Setting up MAAS and Kubernetes in a virtualised environment

Setting up MAAS with Kubernetes and persistent storage in a Hyper-V environment, with an existing network and DHCP server. Unfortunately there is limited documentation on running MAAS in an existing network with a DHCP server. Furthermore, there's little to no mention of Hyper-V support.

While it's not the recommended environment, I recently decided to try spinning up a Kubernetes cluster in an existing network, and MAAS now seems to be the recommended way to deploy it. Since the existing network was running a Hyper-V cluster, I decided to see how hard it would be to spin up MAAS on top of Hyper-V machines. After experimenting, and several full wipes and clean starts, I ended up with a redundant Kubernetes cluster and distributed storage nodes using Ceph. Below I outline the process to install and configure it, as well as some things I learned along the way.

Installing MAAS

  • It's a rather straightforward installation - 4 vCPUs and 4 GB RAM should be more than enough for a controller managing a simple network (https://docs.maas.io/2.6/en/intro-requirements).
  • It will need storage for the virtual images and logs. In my case I went for a 250 GB expanding disk, but the default 40 GB disk would have been more than enough too.
  • You'll need to configure secure boot to use the "Microsoft UEFI Certificate Authority" to allow booting a Linux image.

This will be the only VM we install manually.

I used the Ubuntu Server 18.04 image. You can either do a clean Ubuntu install and then configure it as a MAAS server, or have the installer configure the MAAS region controller for you. I went for the latter option.
You NEED to set a static IP for this server, to avoid issues going forward.

You will also need an SSH key for authentication. Go ahead and set it up on your Launchpad or GitHub account, to allow the installer to download it.
This key will be used to log in to the MAAS server over SSH
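
If you don't already have a key pair, you can generate one and add the public half to your profile. A minimal sketch (the key type and default path are just what I'd use; adjust to taste):

> ssh-keygen -t rsa -b 4096
> cat ~/.ssh/id_rsa.pub

Paste the printed public key into the SSH keys section of your Launchpad or GitHub profile, and the installer can then import it by username.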

The last step in the installer lets you install additional snaps. You can have it install juju here, but it's just as easy to install later.

Once installed and restarted, you can SSH into the server using your password or key, and install the latest updates to the system:

sudo apt update
sudo apt upgrade

Now you can log in to the web interface by navigating to http://10.0.0.100:5240/MAAS, substituting your allocated IP address. Log in, finish the basic configuration, and import your SSH keys again (these will be used to allow you to log in to machines deployed by MAAS).

Finally you'll see the MAAS interface with two warnings: one telling you that there are no boot images, and another telling you that the DHCP server is not enabled.

  • The first one will disappear shortly after the boot images are downloaded. If it doesn't, you can navigate to the images page in the top navigation menu and click on "Update selection". This will trigger an image download if one hasn't started already.
  • The second warning appears because our network already has a DHCP server, so MAAS disabled its built-in one. This is the expected behaviour since we want to continue using our own DHCP server, so you can dismiss it.

Generally the documentation is rather lacking when it comes to running MAAS alongside your own DHCP server, but there are a few relevant notes scattered through it. The two most important ones, in my opinion, are:

External DHCP and a reserved IP range

If an external DHCP server will be used to deploy machines then a reserved IP range should be created to prevent the address namespace from being corrupted. For instance, address conflicts may occur if a node's IP assignment mode is set to 'Auto assign' in the context of an external DHCP server. See IP ranges to create such a range. It should correspond to the lease range of the external server.

https://docs.maas.io/2.6/en/installconfig-network-dhcp#external-dhcp-and-a-reserved-ip-range

For this, go to the subnets menu in the top navigation and click on the subnet that you want IPs to be allocated in. Scroll down to the "Reserved" section and add a new range using the "Reserve dynamic range" button. This will be the range that MAAS uses to assign IP addresses to the machines it deploys.
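
If you prefer the CLI, the same range can be created there. A rough sketch, assuming you've logged in to the MAAS CLI with a profile named admin (the API key comes from the user settings page) and that 10.0.0.200-10.0.0.250 is the range you want MAAS to use:

> maas login admin http://10.0.0.100:5240/MAAS $API_KEY
> maas admin ipranges create type=dynamic start_ip=10.0.0.200 end_ip=10.0.0.250 comment="MAAS dynamic range"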

Note: This would have been great if it worked, but MAAS kept assigning IP addresses outside the dynamic range that I selected. It would also ignore the IP assigned to that server by the DHCP server, which caused the servers it created to repeatedly conflict with the IPs of my existing servers. To counter that, once a machine was commissioned (read below), I would go to the machine's interfaces section and assign a static IP to it. This way, when it was provisioned, it would get the IP that I allocated, instead of a (semi)random one.

The other thing to figure out was how to configure your DHCP server for PXE booting the servers. There are several ways to do this. Ideally I would have wanted MAAS to run a proxyDHCP server, which would piggyback onto the existing DHCP server and announce its boot capabilities, but lacking that, we'll configure our own DHCP server to send options 66 and 67:

  • Option 66 returns the TFTP boot server - we can put the MAAS server's IP address here
  • Option 67 returns the boot file name. There are several options here. Initially I started with pxelinux.0, which I found in the old docs, but I'd highly recommend using the newer EFI-compatible bootx64.efi file instead. This will let us use secure boot on our servers.
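
How you set these options depends on your DHCP server. As a rough sketch, if your existing server happens to be ISC dhcpd, the scope would look something like the following (the subnet, lease range and MAAS IP are examples; on Windows Server DHCP the same settings are scope options 066 and 067):

subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.150 10.0.0.200;
  option routers 10.0.0.1;
  next-server 10.0.0.100;    # option 66 - the TFTP boot server (our MAAS server)
  filename "bootx64.efi";    # option 67 - the boot file (pxelinux.0 for BIOS/Gen1 VMs)
}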

Adding VMs to MAAS

Once our MAAS and DHCP servers are configured, we're ready to start setting up the other VMs. This should be pretty straightforward: we'll just boot each VM with network boot enabled. In the case of Hyper-V Generation 2 VMs, this should work out of the box - if the VM doesn't boot from the local disk or CD, it will attempt to boot from the network.

You NEED to configure secure boot to use the "Microsoft UEFI Certificate Authority" to allow booting a Linux image. It is also important to allocate it a STATIC MAC address, since MAAS identifies machines by their MAC.

During the first boot, MAAS will provide a small image that lets the machine identify itself to the MAAS server. MAAS will add this VM to the Machines list, identified by its MAC address, and allocate it a unique name. Once done, the VM will immediately shut down, so don't worry about that part.

Once the new VM is in the Machines list, you will have to configure it. Go to the machine's configuration tab and set its Power Configuration. Since this is a Hyper-V VM, MAAS can't remotely control its power, so we'll have to set that to Manual. I would also set the machine's name to match the VM name in Hyper-V, so that you know which machine runs on which VM.

Once done, you can use the "Take action" menu on the machine's page and "Commission" this machine. This boots another small image that runs some hardware tests on the machine. The default test is smartctl-validate, which runs basic hardware tests and identifies the machine's hardware specs and available storage. I sometimes also ran the internet-connectivity test to ensure the child machines have a valid network configuration and access to the internet.
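
If you prefer the CLI over the web UI, commissioning can be triggered there too. A sketch, assuming the admin CLI profile from earlier and using $SYSTEM_ID as a placeholder for the machine's system ID (visible in the machine page's URL):

> maas admin machines read | grep -E '"hostname"|"system_id"'
> maas admin machine commission $SYSTEM_ID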

Note: Since MAAS can't start the Hyper-V VMs, whenever we run any action on our machines (like Commission, Deploy, etc) we'll have to manually start the VMs, so go ahead and do that now.

Note: In some cases, MAAS failed to run the basic tests on my Generation 2 Hyper-V VMs due to a script error. This meant that they failed the commissioning tests and weren't available for deployment. After a clean MAAS install, I managed to get them working, but in case it keeps failing you might have to switch to using Generation 1 VMs. If that is your case, there are a couple of things you will need:

  • To boot a Gen1 VM from network, you will have to use a "Legacy network adapter". Gen 1 VMs will not even attempt to boot from new network adapters.
  • Since Gen1 VMs don't support UEFI, you will have to use the pxelinux.0 boot file in option 67 on your DHCP server.

If the commissioning succeeds, you should be able to see the hardware specs of your VM and the available storage. The machine's status in the machine list should be Ready, which means this machine is ready for deployment.

Juju Controller

  • The Juju docs say that the controller requires at least 3.5 GB of memory and 1 vCPU. In my case I decided to allocate 4 GB and 2 vCPUs to it.
  • It won't need much storage, so I left it at the default 40 GB drive.
  • Configure secure boot to use the "Microsoft UEFI Certificate Authority" to allow booting a Linux image.

After commissioning the machine, don't forget to set a static IP address on its network interface in MAAS.

I'm restricting juju to only use machines with the juju tag, which makes it easier to force juju to pick specific machines for specific services. So before bootstrapping the controller (below), go to the machine we dedicated to the juju controller and add the juju tag to it.
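
Tags can be added from the machine's page in the web UI, or via the MAAS CLI. A quick sketch, again assuming an admin CLI profile and using $SYSTEM_ID as a placeholder for the controller machine's system ID:

> maas admin tags create name=juju comment="machines for the juju controller"
> maas admin tag update-nodes juju add=$SYSTEM_ID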

To install and configure Juju, we'll need to install the Juju client on a Linux machine, configure MAAS as a cloud available to juju, and then deploy a juju controller on that cloud. This can be a separate machine, but in my case I ran it on the MAAS server directly. We'll need the MAAS API key, which we can get from the user settings page in the MAAS web interface.

> sudo snap install juju --classic
juju 2.6.5 from Canonical✓ installed

> juju add-cloud
Since Juju 2 is being run for the first time, downloading latest cloud information.
Fetching latest public cloud list...
This client's list of public clouds is up to date, see `juju clouds --local`.
Cloud Types
  lxd
  maas
  manual
  openstack
  vsphere

Select cloud type: maas

Enter a name for your maas cloud: maas-cloud

Enter the API endpoint url: http://10.0.0.100:5240/MAAS

Cloud "maas-cloud" successfully added

You will need to add credentials for this cloud (`juju add-credential maas-cloud`)
before creating a controller (`juju bootstrap maas-cloud`).

> juju add-credential maas-cloud
Enter credential name: maas-cloud-creds

Using auth-type "oauth1".

Enter maas-oauth: 

Credential "maas-cloud-creds" added locally for cloud "maas-cloud".

Finally, now that juju is linked to our MAAS cloud, we can deploy the controller:

> juju bootstrap --constraints tags=juju maas-cloud juju-controller
Creating Juju controller "juju-controller" on maas-cloud
Looking for packaged Juju agent version 2.6.5 for amd64
Launching controller instance(s) on maas-cloud...
 - nexakn (arch=amd64 mem=4G cores=4)  
Installing Juju agent on bootstrap instance
Fetching Juju GUI 2.14.0
Waiting for address
Attempting to connect to 10.0.0.105:22
Connected to 10.0.0.105
Running machine configuration script...
Bootstrap agent now started
Contacting Juju controller at 10.0.0.105 to verify accessibility...

Bootstrap complete, controller "juju-controller" now is available
Controller machines are in the "controller" model
Initial model "default" added

Kubernetes VMs

For the Kubernetes requirements I downloaded the bundle.yaml file for the Charmed Kubernetes juju charm from here: https://jaas.ai/charmed-kubernetes
Each service has a list of constraints set up. The workers require 4 cores and 4 GB RAM, and the master nodes require 2 cores and 4 GB RAM.
The load balancer, etcd and EasyRSA nodes don't specify CPU and RAM requirements, so I'll give them 2 vCPUs and 2 GB RAM.

In total it will set up 10 machines: 1 EasyRSA, 1 load balancer, 3 etcd, 2 master and 3 worker nodes.

As such I'll use 2 classes of VMs - high- and low-spec nodes. For simplicity of management, I'm adding 5 high-spec nodes for the worker and master nodes, and 5 low-spec nodes for the EasyRSA, etcd and load balancer nodes.

Once I set up the 10 VMs, I started them all so they would register with MAAS. Once registered, I went through each machine, cross-referenced its MAC address to populate its machine name with the VM name, and updated its power config.

Once finished, I ran commissioning on all of them, then went back to each machine and gave it a static IP address.

Generally we'd run the following to deploy Kubernetes:

> juju deploy charmed-kubernetes

In my case, I wanted to control which nodes went to which machines. Since Kubernetes is deployed as a bundle, I can customise the bundle.yaml downloaded earlier, which gives me an incredible amount of control over what gets deployed where. The only thing I did in my case was to add extra tags to each service's constraints, and then assign these tags to my machines in MAAS:

description: A highly-available, production-grade Kubernetes cluster.
series: bionic
services:
  containerd:
    charm: cs:~containers/containerd-2
    resources: {}
  easyrsa:
    annotations:
      gui-x: '450'
      gui-y: '550'
    charm: cs:~containers/easyrsa-254
    constraints: root-disk=8G tags=k8s_easyrsa
    num_units: 1
    resources:
      easyrsa: 5
  etcd:
    annotations:
      gui-x: '800'
      gui-y: '550'
    charm: cs:~containers/etcd-434
    constraints: root-disk=8G tags=k8s_etcd
    num_units: 3
    options:
      channel: 3.2/stable
    resources:
      core: 0
      etcd: 3
      snapshot: 0
  flannel:
    annotations:
      gui-x: '450'
      gui-y: '750'
    charm: cs:~containers/flannel-425
    resources:
      flannel-amd64: 323
      flannel-arm64: 319
      flannel-s390x: 306
  kubeapi-load-balancer:
    annotations:
      gui-x: '450'
      gui-y: '250'
    charm: cs:~containers/kubeapi-load-balancer-649
    constraints: root-disk=8G tags=k8s_lb
    expose: true
    num_units: 1
    resources: {}
  kubernetes-master:
    annotations:
      gui-x: '800'
      gui-y: '850'
    charm: cs:~containers/kubernetes-master-700
    constraints: cores=2 mem=4G root-disk=16G tags=k8s_master
    num_units: 2
    options:
      channel: 1.15/stable
    resources:
      cdk-addons: 0
      core: 0
      kube-apiserver: 0
      kube-controller-manager: 0
      kube-proxy: 0
      kube-scheduler: 0
      kubectl: 0
  kubernetes-worker:
    annotations:
      gui-x: '100'
      gui-y: '850'
    charm: cs:~containers/kubernetes-worker-552
    constraints: cores=4 mem=4G root-disk=16G tags=k8s_worker
    expose: true
    num_units: 3
    options:
      channel: 1.15/stable
    resources:
      cni-amd64: 322
      cni-arm64: 313
      cni-s390x: 325
      core: 0
      kube-proxy: 0
      kubectl: 0
      kubelet: 0
relations:
- - kubernetes-master:kube-api-endpoint
  - kubeapi-load-balancer:apiserver
- - kubernetes-master:loadbalancer
  - kubeapi-load-balancer:loadbalancer
- - kubernetes-master:kube-control
  - kubernetes-worker:kube-control
- - kubernetes-master:certificates
  - easyrsa:client
- - etcd:certificates
  - easyrsa:client
- - kubernetes-master:etcd
  - etcd:db
- - kubernetes-worker:certificates
  - easyrsa:client
- - kubernetes-worker:kube-api-endpoint
  - kubeapi-load-balancer:website
- - kubeapi-load-balancer:certificates
  - easyrsa:client
- - flannel:etcd
  - etcd:db
- - flannel:cni
  - kubernetes-master:cni
- - flannel:cni
  - kubernetes-worker:cni
- - containerd:containerd
  - kubernetes-worker:container-runtime
- - containerd:containerd
  - kubernetes-master:container-runtime

We deploy our bundle with the following command:

> juju deploy ./juju-bundles/kubernetes.yaml
Resolving charm: cs:~containers/containerd-2
Resolving charm: cs:~containers/easyrsa-254
Resolving charm: cs:~containers/etcd-434
Resolving charm: cs:~containers/flannel-425
Resolving charm: cs:~containers/kubeapi-load-balancer-649
Resolving charm: cs:~containers/kubernetes-master-700
Resolving charm: cs:~containers/kubernetes-worker-552
Executing changes:
- upload charm cs:~containers/containerd-2 for series bionic
- deploy application containerd on bionic using cs:~containers/containerd-2
- upload charm cs:~containers/easyrsa-254 for series bionic
- deploy application easyrsa on bionic using cs:~containers/easyrsa-254
  added resource easyrsa
- set annotations for easyrsa
- upload charm cs:~containers/etcd-434 for series bionic
- deploy application etcd on bionic using cs:~containers/etcd-434
  added resource core
  added resource etcd
  added resource snapshot
- set annotations for etcd
- upload charm cs:~containers/flannel-425 for series bionic
- deploy application flannel on bionic using cs:~containers/flannel-425
  added resource flannel-amd64
  added resource flannel-arm64
  added resource flannel-s390x
- set annotations for flannel
- upload charm cs:~containers/kubeapi-load-balancer-649 for series bionic
- deploy application kubeapi-load-balancer on bionic using cs:~containers/kubeapi-load-balancer-649
- expose kubeapi-load-balancer
- set annotations for kubeapi-load-balancer
- upload charm cs:~containers/kubernetes-master-700 for series bionic
- deploy application kubernetes-master on bionic using cs:~containers/kubernetes-master-700
  added resource cdk-addons
  added resource core
  added resource kube-apiserver
  added resource kube-controller-manager
  added resource kube-proxy
  added resource kube-scheduler
  added resource kubectl
- set annotations for kubernetes-master
- upload charm cs:~containers/kubernetes-worker-552 for series bionic
- deploy application kubernetes-worker on bionic using cs:~containers/kubernetes-worker-552
  added resource cni-amd64
  added resource cni-arm64
  added resource cni-s390x
  added resource core
  added resource kube-proxy
  added resource kubectl
  added resource kubelet
- expose kubernetes-worker
- set annotations for kubernetes-worker
- add relation kubernetes-master:kube-api-endpoint - kubeapi-load-balancer:apiserver
- add relation kubernetes-master:loadbalancer - kubeapi-load-balancer:loadbalancer
- add relation kubernetes-master:kube-control - kubernetes-worker:kube-control
- add relation kubernetes-master:certificates - easyrsa:client
- add relation etcd:certificates - easyrsa:client
- add relation kubernetes-master:etcd - etcd:db
- add relation kubernetes-worker:certificates - easyrsa:client
- add relation kubernetes-worker:kube-api-endpoint - kubeapi-load-balancer:website
- add relation kubeapi-load-balancer:certificates - easyrsa:client
- add relation flannel:etcd - etcd:db
- add relation flannel:cni - kubernetes-master:cni
- add relation flannel:cni - kubernetes-worker:cni
- add relation containerd:containerd - kubernetes-worker:container-runtime
- add relation containerd:containerd - kubernetes-master:container-runtime
- add unit easyrsa/0 to new machine 0
- add unit etcd/0 to new machine 1
- add unit etcd/1 to new machine 2
- add unit etcd/2 to new machine 3
- add unit kubeapi-load-balancer/0 to new machine 4
- add unit kubernetes-master/0 to new machine 5
- add unit kubernetes-master/1 to new machine 6
- add unit kubernetes-worker/0 to new machine 7
- add unit kubernetes-worker/1 to new machine 8
- add unit kubernetes-worker/2 to new machine 9
Deploy of bundle completed.

This will take a while, and you can see the progress by calling

> juju status

You can also monitor the progress by using the watch command, which will refresh the status every 2 seconds:

> watch -c juju status --color

Once the deployment is finished, we can install kubectl so that we can control our kubernetes cluster. To do this we'll install the kubectl snap, and copy the kubernetes config file to our machine using juju.

> sudo snap install kubectl --classic
[sudo] password for artiom:
kubectl 1.15.0 from Canonical✓ installed

> mkdir ~/.kube
> juju scp kubernetes-master/0:config ~/.kube/config

We can confirm that we now have control of kubernetes by getting the cluster info:

> kubectl cluster-info
Kubernetes master is running at https://10.0.0.119:443
Heapster is running at https://10.0.0.119:443/api/v1/namespaces/kube-system/services/heapster/proxy
CoreDNS is running at https://10.0.0.119:443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://10.0.0.119:443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
Grafana is running at https://10.0.0.119:443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
InfluxDB is running at https://10.0.0.119:443/api/v1/namespaces/kube-system/services/monitoring-influxdb:http/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Ceph servers

To provide persistent and redundant storage to our cluster, I followed the Kubernetes docs (https://ubuntu.com/kubernetes/docs/storage) and installed Ceph.

According to the Ceph documentation (http://docs.ceph.com/docs/jewel/start/hardware-recommendations/), I should easily be able to get away with using low-spec machines for both the monitor and the OSD Ceph instances. If needed, I can always add more CPU cores to them later.

Ceph will require at least 3 monitor and 3 OSD instances, so I'll have to create 6 VMs for it. Follow the same process as above - create VMs, start them up to register them in MAAS, configure the power config, commission and assign static IPs to make them ready for deployment.

You can see the details of the ceph-mon and ceph-osd charms on jaas.ai, the same place we got the Kubernetes bundle from.

As with Kubernetes, I want to pick the machines that will be used to deploy these services, so I tagged them appropriately in MAAS. In this case, I'll use the following commands:

> juju deploy -n 3 --constraints tags=ceph-mon ceph-mon
Located charm "cs:ceph-mon-38".
Deploying charm "cs:ceph-mon-38".
> juju deploy -n 3 --constraints tags=ceph-osd ceph-osd --storage osd-devices=250G --storage osd-journals=30G
Located charm "cs:ceph-osd-285".
Deploying charm "cs:ceph-osd-285".

We then link the monitor and OSD deployments together:

juju add-relation ceph-osd ceph-mon

To let Kubernetes be aware of Ceph, we'll have to link them too:

juju add-relation ceph-mon:admin kubernetes-master
juju add-relation ceph-mon:client kubernetes-master

Kubernetes will create the required storage pools automatically, so unless we want to customize them, we can leave everything as it is. We'll just monitor the status in juju until Kubernetes has finished configuring the storage.

You can confirm that the storage is configured and available by querying both juju and kubectl:

> juju storage
Unit        Storage id      Type   Pool  Size    Status    Message
ceph-osd/0  osd-devices/0   block  maas  250GiB  attached  
ceph-osd/0  osd-journals/1  block  maas  40GiB   attached  
ceph-osd/1  osd-devices/2   block  maas  250GiB  attached  
ceph-osd/1  osd-journals/3  block  maas  40GiB   attached  
ceph-osd/2  osd-devices/4   block  maas  250GiB  attached  
ceph-osd/2  osd-journals/5  block  maas  40GiB   attached  

> kubectl get sc,po
NAME                                             PROVISIONER        AGE
storageclass.storage.k8s.io/ceph-ext4            rbd.csi.ceph.com   4m29s
storageclass.storage.k8s.io/ceph-xfs (default)   rbd.csi.ceph.com   4m29s

NAME                              READY   STATUS    RESTARTS   AGE
pod/csi-rbdplugin-attacher-0      1/1     Running   1          4m29s
pod/csi-rbdplugin-b2sp5           2/2     Running   1          4m29s
pod/csi-rbdplugin-lksvs           2/2     Running   1          4m29s
pod/csi-rbdplugin-provisioner-0   3/3     Running   1          4m29s
pod/csi-rbdplugin-q5dx9           2/2     Running   1          4m29s
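
If you want an end-to-end check that Ceph-backed volumes actually provision, you can create a small PersistentVolumeClaim against the default ceph-xfs storage class. A minimal sketch (the claim name and size are arbitrary):

> cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-xfs
EOF
> kubectl get pvc ceph-test-pvc

The claim should move to Bound within a few seconds; once you've confirmed that, clean up with kubectl delete pvc ceph-test-pvc.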

Testing

To make sure the kubernetes environment is running, I ran the microbot example on the cluster. You can spin up 3 replicas of the microbots with the following command:

> juju run-action kubernetes-worker/0 microbot replicas=3 --wait
unit-kubernetes-worker-0:
  id: 609c4717-4840-4bac-8c2e-ae8bea1e5223
  results:
    address: microbot.10.0.0.111.xip.io
  status: completed
  timing:
    completed: 2019-07-16 12:03:39 +0000 UTC
    enqueued: 2019-07-16 12:03:34 +0000 UTC
    started: 2019-07-16 12:03:35 +0000 UTC
  unit: kubernetes-worker/0

You can confirm that it has spun up its pods, services and endpoints (I'm only listing the microbot ones below):

> kubectl get pods,services,endpoints
NAME                          READY   STATUS    RESTARTS   AGE
microbot-847567478c-9542c     1/1     Running   0          68s
microbot-847567478c-kxjdp     1/1     Running   0          68s
microbot-847567478c-zll8l     1/1     Running   0          68s

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/microbot                    ClusterIP   10.152.183.175   <none>        80/TCP      73s

NAME                                  ENDPOINTS                                AGE
endpoints/microbot                    10.1.16.4:80,10.1.38.7:80,10.1.76.8:80   73s

The microbot deployment also creates an ingress resource pointing to a xip.io domain, which will let you access the microbot service and confirm it's running:

> kubectl get ingress
NAME               HOSTS                          ADDRESS   PORTS   AGE
microbot-ingress   microbot.10.0.0.111.xip.io             80      90s
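
As a quick check, assuming your machine can resolve xip.io names, you can hit the ingress directly and confirm the microbots respond:

> curl -s http://microbot.10.0.0.111.xip.io/ | head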

Finally we can delete the microbot deployment with the following:

> juju run-action kubernetes-worker/0 microbot delete=true

You can also access the Kubernetes dashboard at this point. On a machine running kubectl with a valid config file, run:

kubectl proxy

While this command is running, you will be able to access the dashboard at the following url: http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

As the documentation states:

You will need to log in to the Dashboard with a valid user. The easiest thing to do is to select your kubeconfig file, but for future administration, you should set up role based access control.
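
One common way to do the latter is to create a dedicated service account, bind it to a role, and log in to the dashboard with its token. A rough sketch (the account name is arbitrary, and cluster-admin is far more than you'd grant in a real multi-user setup):

> kubectl -n kube-system create serviceaccount dashboard-admin
> kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
> kubectl -n kube-system get secret $(kubectl -n kube-system get sa dashboard-admin -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 --decode

The last command prints a bearer token you can paste into the dashboard's "Token" login option.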


Resources