12.04.2021

Our approach to Gitlab runners

Until recently our engineering team was exclusively using Gitlab shared runners for our CI/CD pipelines. While they did (and partly still do) get the job done, recent outages have made us aware of how critical functioning CI/CD is to us in our day-to-day work.

Part 1 of a series of articles

Don’t get me wrong, we love Gitlab, and for the most part, we are completely happy with it. We value their commitment to transparency, especially when it comes to production incidents.

What we wanted to improve

Reliability: In the first quarter of 2021, Gitlab suffered multiple incidents resulting in degraded service levels for shared runners [see: 1, 2]. This not only brought our development process to a halt but also endangered our production deployments.

Availability: Even when no active incident was reported by Gitlab, we observed on multiple occasions that jobs were stuck in a pending state (i.e. waiting for a runner) for up to 10 minutes. Pipeline speed is an important CI/CD KPI for us and we try to optimise for it, e.g. through parallel job execution. Fast pipelines are not only important for developers to get quick feedback on their changes; there is also a business impact to consider, as some of our legacy applications cannot be deployed without downtime.

Performance: Two of our applications stand out from the other services in terms of complexity and build time: a Java back-end and an Angular front-end. While these builds were never particularly fast, we noticed that they ran considerably faster on our local development machines than in the pipeline.

The solution

Stack

Working towards the solution, we had a couple of constraints. Our Gitlab plan includes a certain amount of CI/CD minutes on shared runners that we wanted to use as much as possible. You don’t like throwing away money, do you? Also, we were getting close to maxing out this monthly quota, in which case we would have needed to buy additional shared runner minutes. Having read the section above, you might understand why we were reluctant to do that. Any new solution had to be cost-efficient, so setting up an old-fashioned build server on AWS that idles 80% of the time was out of the question for us.

Looking at the Gitlab docs for runners, we found a couple of options for setting up our own, which led us to the following decisions:

  • Kubernetes vs plain EC2 instances: At that time, we were about to take our first steps towards K8S but were not yet running any workloads on it. If you are more familiar with Kubernetes than we were back then, choosing it might save you a fair bit of infrastructure work. We weren’t, so we went for EC2.
  • Autoscaling: there are three possible setups here: EC2 autoscaling using Docker Machine, Fargate autoscaling using Gitlab’s custom executor, or no scaling at all. EC2 autoscaling uses Docker Machine to spin up new instances dynamically, while Fargate autoscaling relies on, well, AWS Fargate tasks. The Fargate approach has a major limitation, though: when setting up a Fargate runner, you need to specify a Docker image to be used, or, in other words, each runner only works with a specific image. Each time you want to update an image or use a new image for a job, you would normally simply update the versioned pipeline config, right? With Fargate, you would have to connect to the EC2 instance running the Fargate executor and change the image there! This may be fine if you only have a small, fixed number of images that you update twice a year, but for larger organisations it becomes hard to maintain, especially considering access management for deployed instances. Still, as we don’t want to keep a build-server-sized instance running all the time, we needed autoscaling to spin instances up dynamically, so we went with EC2 autoscaling (a minimal config sketch follows right after this list).
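
To make the EC2 autoscaling setup more concrete, here is a minimal sketch of what a runner configuration with the Docker Machine executor can look like. This is not our exact configuration: the region, VPC/subnet placeholders, idle settings, and the token value are illustrative and need to be adapted to your own Gitlab and AWS environment.

    # /etc/gitlab-runner/config.toml on the Runner Manager (illustrative values)
    concurrent = 4

    [[runners]]
      name = "ec2-autoscale-runner"
      url = "https://gitlab.com/"
      token = "RUNNER_TOKEN"        # token obtained when registering the runner
      executor = "docker+machine"
      [runners.docker]
        image = "alpine:latest"     # default image, jobs can override it
      [runners.machine]
        IdleCount = 0               # no idle machines: instances exist only while needed
        IdleTime = 600              # keep a machine for 10 minutes after its last job
        MachineDriver = "amazonec2"
        MachineName = "gitlab-runner-%s"
        MachineOptions = [
          "amazonec2-region=eu-central-1",
          "amazonec2-instance-type=c5.xlarge",
          "amazonec2-vpc-id=vpc-xxxxxxxx",
          "amazonec2-subnet-id=subnet-xxxxxxxx",
        ]

With IdleCount set to 0, EC2 instances are only created when jobs are waiting and are terminated again once the idle time expires, which matches the cost constraint described above.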

Instance Sizing

Through experimentation, we found that the sweet spot for building our complex applications was a c5.xlarge instance with 4 cores and 8 GB of memory. Most jobs, such as deployment jobs using Terraform or Ansible, or even smaller TypeScript builds, did not benefit significantly from anything above a t3a.small instance. For the instance running the Docker Machine executor, which we will simply refer to as the Runner Manager from here on, a tiny t2.micro instance was enough.
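
To illustrate how this sizing can be mapped onto runners, the excerpt below sketches two runner entries that differ only in instance type. This is an assumption about how such a split can look, not our exact setup; the tags (e.g. "xlarge" and "small") would be assigned when registering each runner, and jobs then pick the right size via the tags keyword in .gitlab-ci.yml.

    # config.toml excerpt (illustrative): one [[runners]] entry per instance size
    [[runners]]
      name = "xlarge-builds"        # registered with e.g. --tag-list "xlarge"
      url = "https://gitlab.com/"
      token = "TOKEN_XLARGE"
      executor = "docker+machine"
      [runners.docker]
        image = "alpine:latest"
      [runners.machine]
        MachineDriver = "amazonec2"
        MachineName = "runner-xlarge-%s"
        MachineOptions = ["amazonec2-region=eu-central-1", "amazonec2-instance-type=c5.xlarge"]

    [[runners]]
      name = "small-jobs"           # registered with e.g. --tag-list "small"
      url = "https://gitlab.com/"
      token = "TOKEN_SMALL"
      executor = "docker+machine"
      [runners.docker]
        image = "alpine:latest"
      [runners.machine]
        MachineDriver = "amazonec2"
        MachineName = "runner-small-%s"
        MachineOptions = ["amazonec2-region=eu-central-1", "amazonec2-instance-type=t3a.small"]

The configuration file itself lives on the Runner Manager, which only orchestrates the build machines and therefore gets by with a t2.micro.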

That’s it for the first part of this series. You now know about our experience with shared runners and the reasons why we don’t want to depend on them. Next time, we’ll have a close look at how to install Gitlab runner on EC2, configure it to autoscale, and use runners in a pipeline. I’ll share a quick cost review so that you can match your options to a budget.

Have you had similar experiences with Gitlab Runners? Any questions? Connect and write to me on LinkedIn!

Next up:

  • Manually installing and configuring Gitlab runners
  • Monitoring Gitlab runners in Elastic
  • Automating deployment of Gitlab runners
Article by Pascal Luckhaus

Pascal is a DevOps engineer at AURENA Tech. He accompanies our services through all lifecycle stages: he implements CI/CD processes, automates infrastructure, and ensures reliability and observability during operation.

