Fixing GitLab Runner Issues on Kubernetes

Recently, I encountered and resolved several issues that were causing job failures and instability in an AWS-based GitLab Runner setup on Kubernetes. In this post, I’ll walk through the errors, their causes, and the solutions that worked for me.

Job Timeout While Waiting for Pod to Start

ERROR: Job failed (system failure): prepare environment: waiting for pod running: timed out waiting for pod to start

Adjusting poll_interval and poll_timeout helped resolve the issue:

  • poll_timeout (default: 180s) – The maximum time the runner waits before timing out while connecting to a newly created pod.
  • poll_interval (default: 3s) – How frequently the runner checks the pod’s status.

Increasing poll_timeout from 180s to 360s gave pods more time to start and eliminated these failures.
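These settings live under the [runners.kubernetes] section of the runner’s config.toml. A minimal sketch of the relevant fragment (I only raised poll_timeout; poll_interval is shown at its default):

```toml
[[runners]]
  [runners.kubernetes]
    # How long to wait for the pod to start before failing the job
    poll_timeout  = 360  # seconds; default 180
    # How often to check the pod's status while waiting
    poll_interval = 3    # seconds; default 3
```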

ErrImagePull: pull QPS exceeded

When the GitLab Runner starts multiple jobs that pull the same images (e.g., for service and build containers), it can exceed the kubelet’s default pull rate limits:

  • registryPullQPS (default: 5) – Limits the number of image pulls per second.
  • registryBurst (default: 10) – Allows temporary bursts above registryPullQPS.

Instead of modifying kubelet parameters, I resolved this issue by changing the runner’s image pull policy from always to if-not-present to prevent unnecessary pulls:

pull_policy = ["if-not-present"] # default policy for jobs
allowed_pull_policies = ["always", "if-not-present"] # lets a job request "always" from the pipeline if necessary
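With allowed_pull_policies in place, an individual job can still opt back into always from the pipeline definition via the image keyword in .gitlab-ci.yml (job and image names here are illustrative):

```yaml
build:
  image:
    name: alpine:3.21
    pull_policy: always   # overrides the runner default of if-not-present
  script:
    - echo "building with a freshly pulled image"
```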

TLS Error When Preparing Environment

ERROR: Job failed (system failure): prepare environment: setting up trapping scripts on emptyDir: error dialing backend: remote error: tls: internal error

GitLab Runner communicates with the Kubernetes API to create and attach to executor pods. The “error dialing backend” part means the API server could not establish a TLS connection to the kubelet on the target node; this can happen due to API slowness, network issues, or misconfigured certificates.

Setting the feature flag FF_WAIT_FOR_POD_TO_BE_REACHABLE to true helped resolve the issue by ensuring that the runner waits until the pod is fully reachable before proceeding. This can be set in the GitLab Runner configuration:

[runners.feature_flags]
  FF_WAIT_FOR_POD_TO_BE_REACHABLE = true

DNS Timeouts

dial tcp: lookup on <coredns ip>:53: read udp i/o timeout

While CoreDNS logs and network communication appeared normal, there was an unexpected spike in DNS load after launching more GitLab jobs than usual.

Scaling the CoreDNS deployment resolved the issue. Ideally, enable automatic DNS horizontal autoscaling to handle load variations (check the Kubernetes docs or your cloud provider’s specific solution; they all share the same approach – add more replicas when load increases).

kubectl scale deployments -n kube-system coredns --replicas=4

If you’ve encountered other GitLab Runner issues, share them in the comments 🙂

Cheers!

Solving Docker Hub rate limits in Kubernetes with containerd registry mirrors

When running Kubernetes workloads in AWS EKS (or any other environment), you may encounter the Docker Hub rate limit error:

429 Too Many Requests – Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading

Why are Docker Hub rate limits a problem? Docker Hub imposes strict pull rate limits:

  • Anonymous users – up to 10 pulls per hour
  • Authenticated users – up to 40 pulls per hour
  • Paid authenticated users – no pull rate limits

To check your current limit state, you first need to obtain a token:

For anonymous pulls:

TOKEN=$(curl "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)

For authenticated pulls:

TOKEN=$(curl --user 'username:password' "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)

Then make a HEAD request and inspect the response headers:

curl --head -H "Authorization: Bearer $TOKEN" https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest

You should see the following headers (excerpt):


date: Sun, 26 Jan 2025 08:17:36 GMT
strict-transport-security: max-age=31536000
ratelimit-limit: 100;w=21600
ratelimit-remaining: 100;w=21600
docker-ratelimit-source: <hidden>
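The ratelimit-limit and ratelimit-remaining values use the format <count>;w=<window-seconds>. A small POSIX shell sketch for splitting them apart (the sample header below is illustrative, not a live response):

```shell
# Sample header line as returned by the registry (values are illustrative)
header="ratelimit-remaining: 100;w=21600"

# Strip the header name, then split the count and the window
value=${header#*: }     # "100;w=21600"
count=${value%%;*}      # "100" pulls remaining
window=${value##*w=}    # "21600"-second window (6 hours)

echo "remaining pulls: $count per $((window / 3600))h window"
```

For the sample above this prints remaining pulls: 100 per 6h window.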

Possible solutions

  • A DaemonSet that applies a predefined configuration on each of your k8s nodes
  • For AWS-based clusters, an EC2 Launch Template and its user data
  • For AWS-based clusters, AWS Systems Manager and the aws:runShellScript action
  • Updating the config manually – though in most cases cluster nodes are short-lived due to the autoscaler (you can reuse the shell script from the DaemonSet below; a containerd service restart is not required)

In this guide, we’ll use the DaemonSet approach on AWS EKS with containerd (containerd-1.7.11-1.amzn2.0.1.x86_64) and Kubernetes 1.30.

  1. Check your containerd config at /etc/containerd/config.toml and make sure it contains config_path = "/etc/containerd/certs.d"
  2. Per-registry host configuration is stored at /etc/containerd/certs.d/<registry host>/hosts.toml (e.g. /etc/containerd/certs.d/docker.io/hosts.toml for Docker Hub)
  3. The following manifest adds the required files and folders, so existing and future nodes are automatically configured with the mirror by the DaemonSet. An initContainer updates the node’s configuration, and a wait container keeps the DaemonSet pod running on the node. Adjust taints and tolerations, the priority class, or other fields to fit your requirements.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    name: containerd-registry-mirror
    cluster: clustername
    otherlabel: labelvalue
  name: containerd-registry-mirror
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: containerd-registry-mirror
  template:
    metadata:
      labels:
        name: containerd-registry-mirror
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: poolname
      priorityClassName: system-node-critical
      initContainers:
      - image: alpine:3.21
        imagePullPolicy: IfNotPresent
        name: change-hosts-file-init
        command:
          - /bin/sh
          - -c
          - |
            #!/bin/sh
            set -euo pipefail
            TARGET="/etc/containerd/certs.d/docker.io"
            cat << EOF > "$TARGET/hosts.toml"
            server = "https://registry-1.docker.io"

            [host."https://<your private registry>"]
              capabilities = ["pull", "resolve"]
            EOF
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 50m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/containerd/certs.d/docker.io/
          name: docker-mirror
      containers:
      - name: wait
        image: registry.k8s.io/pause:3.9
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 50m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 20Mi
      volumes:
      - name: docker-mirror
        hostPath:
          path: /etc/containerd/certs.d/docker.io/

4. Apply the manifest to your cluster: kubectl apply -f containerd-registry-mirror.yaml, then monitor the DaemonSet status: kubectl get daemonset containerd-registry-mirror -n kube-system

5. To double-check, SSH to a node and inspect the content of /etc/containerd/certs.d/docker.io/hosts.toml

6. If you need to set up a default mirror for ALL registries, use the path /etc/containerd/certs.d/_default/hosts.toml
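For example, a _default/hosts.toml that routes every registry through one mirror might look like this (the mirror URL is a placeholder you would replace; with no server line, containerd falls back to the original registry when no matching host directory exists):

```toml
# /etc/containerd/certs.d/_default/hosts.toml
# Fallback host configuration for any registry without its own directory
[host."https://<your mirror host>"]
  capabilities = ["pull", "resolve"]
```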

Hope this helps anyone facing the same issue.

Got questions or need help? Drop a comment or share your experience

Have a smooth containerization!