Working towards the solution, we had a couple of constraints. We had a monthly quota of CI/CD minutes on shared runners that we wanted to use as much as possible. You don’t like throwing away money, do you? We were also getting close to maxing out this quota, at which point we would have needed to buy additional shared runner minutes. Having read the section above, you might understand why we were reluctant to do that. Any new solution had to be cost-efficient, so setting up an old-fashioned build server on AWS that idles 80% of the time was out of the question for us.
The GitLab docs describe several options for setting up runners, which led us to the following decisions:
- Kubernetes vs plain EC2 instance: At that time, we were about to take our first steps towards K8S, but were not running any workloads on it yet. If you are more familiar with it than we were at the time, choosing it might save you a fair bit of infrastructure work. We weren’t, so we went for EC2.
- Autoscaling: there are three possible setups here: EC2 autoscaling using Docker Machine, Fargate autoscaling using GitLab’s custom executor, or no scaling at all. EC2 autoscaling uses Docker Machine to spin up new instances dynamically, while Fargate autoscaling relies on, well, AWS Fargate tasks. However, the Fargate setup has a major limitation: when setting up a Fargate runner, you need to specify the Docker image to be used. In other words, each runner only works with one specific image. Each time you want to update an image or use a new image for a job, you would normally just update the versioned pipeline config, right? With Fargate, you would have to connect to the EC2 instance running the Fargate executor and change the image there! This may be fine if you only have a small, fixed number of images that you update twice a year, but for larger organisations it will be hard to maintain, especially considering access management for deployed instances. Still, as we didn’t want to keep a build-server-sized instance running all the time, we needed autoscaling to spin instances up dynamically, so we went with EC2 autoscaling.
Through experimentation, we found that the sweet spot for building our complex applications was the c5.xlarge instance type with 4 vCPUs and 8 GiB of memory. Most other jobs, such as deployment jobs using Terraform or Ansible, or even smaller TypeScript builds, did not benefit significantly from anything above t3a.small instances. For the instance running the Docker Machine executor, which we will simply refer to as the Runner Manager in the following, we used tiny t2.micro instances.
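To make the instance choices above a bit more concrete, here is a minimal sketch of how the Runner Manager's `config.toml` could declare one autoscaling runner per instance type. It is not our actual configuration: the runner names, the region, and the `IdleCount`, `IdleTime` and `limit` values are assumptions chosen for illustration, and the fragment leaves out the `url`, `token` and `[runners.docker]` settings a real runner needs. The Python helper only renders the TOML text so that the two profiles can share one template.

```python
from dataclasses import dataclass


@dataclass
class RunnerProfile:
    """One [[runners]] entry. All default values are illustrative assumptions."""
    name: str              # hypothetical runner name
    instance_type: str     # EC2 type of the build machines Docker Machine creates
    idle_count: int = 0    # 0 = keep no idle machines, i.e. scale up from zero
    idle_time: int = 600   # seconds an idle machine is kept before removal
    limit: int = 4         # upper bound on machines this runner may create


def render_runner(profile: RunnerProfile, region: str = "eu-central-1") -> str:
    """Render a partial config.toml fragment for a docker+machine runner."""
    options = [
        f"amazonec2-instance-type={profile.instance_type}",
        f"amazonec2-region={region}",  # assumed region, adjust to your setup
    ]
    option_lines = "\n".join(f'      "{opt}",' for opt in options)
    return (
        "[[runners]]\n"
        f'  name = "{profile.name}"\n'
        '  executor = "docker+machine"\n'
        f"  limit = {profile.limit}\n"
        "  [runners.machine]\n"
        f"    IdleCount = {profile.idle_count}\n"
        f"    IdleTime = {profile.idle_time}\n"
        '    MachineDriver = "amazonec2"\n'
        # MachineName must contain %s, which is replaced by a unique machine id
        f'    MachineName = "{profile.name}-%s"\n'
        "    MachineOptions = [\n"
        f"{option_lines}\n"
        "    ]\n"
    )


# Two hypothetical profiles matching the instance types mentioned above.
profiles = [
    RunnerProfile(name="heavy-builds", instance_type="c5.xlarge"),
    RunnerProfile(name="small-jobs", instance_type="t3a.small"),
]
config = "\n".join(render_runner(p) for p in profiles)
print(config)
```

With `IdleCount = 0`, no build machine runs while the pipeline queue is empty, which is the property we were after. Routing jobs to the right profile (for example via runner tags) is a separate step and not shown here.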
Have you had similar experiences with GitLab Runners? Any questions? Connect with me on LinkedIn and send me a message!