Context

For context, we had several tightly coupled microservices, to the point where deploying a new version of one microservice required patching others as well. This kind of coupling is sometimes described as a “distributed monolith”. In this scenario we wanted seamless, zero-downtime deployments for an application hosted in a Kubernetes cluster.

Challenge

Traditionally, blue/green strategies deploy two versions of the same service side by side, such as live and preview, as seen in Argo Rollouts. This did not fit our case: we needed to deploy an entire stack, in which some services moved to newer versions while others stayed at their current version, according to the release plan.

Naturally, Kubernetes namespaces offered a way to build two distinct, isolated environments. The challenge could be divided into two parts: first, creating a pipeline capable of coordinating the deployment of the entire application stack; second, having this pipeline alternate which namespace served the preview and live versions of the application.

Solution

The repository was structured as follows:

├── apps/          # Application stack definitions (Helm umbrella charts)
├── argo/          # Argo CD App of Apps manifests
├── charts/        # Individual Helm charts for each component
└── .gitlab-ci.yml # Deployment pipeline

Part 1

To tackle the first part of the challenge, we introduced Argo CD and used the App of Apps pattern.

We created the following Argo CD application:

# argo/app-of-apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-blue
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: my-namespace-blue
    server: https://kubernetes.default.svc
  source:
    path: apps/my-app
    repoURL: ${MY_REPO}
    targetRevision: main
    helm:
      valueFiles:
        - values-blue.yaml
  project: default
  syncPolicy: {}

A near-identical Argo application is deployed for the green version, replacing “blue” with “green” wherever applicable.
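
Since the two manifests differ only in the color, the green one could be generated from the blue one rather than maintained by hand. The following is a minimal sketch, assuming a plain text substitution is enough and that no unrelated string in the manifest (for example the repository URL) contains "blue". The helper name to_color is hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: rewrite every occurrence of "blue" on stdin to the
# color given as the first argument. Only safe if "blue" appears nowhere
# else in the manifest.
to_color() {
  sed "s/blue/$1/g"
}

# Example: derive the green variant from a fragment of the blue Application.
to_color green <<'EOF'
metadata:
  name: my-app-blue
spec:
  destination:
    namespace: my-namespace-blue
  source:
    helm:
      valueFiles:
        - values-blue.yaml
EOF
```

Applied to the full manifest, this could look like `to_color green < argo/app-of-apps.yaml`, with the output written to a file name of your choosing.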

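This application tracks the stack’s chart in Git, which lets a single sync coordinate the entire stack. Because syncPolicy is left empty, Argo CD does not sync automatically; the pipeline shown later triggers the sync. The application components are defined as chart dependencies: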

# apps/my-app/Chart.yaml (Helm requires the chart definition file to be named Chart.yaml)
apiVersion: v2
name: my-app-stack
description: my-app application stack chart
type: application
version: 0.1.0
dependencies:
  - name: my-frontend
    version: 0.1.0
    repository: "file://../../charts/my-app/my-frontend"
  - name: my-backend
    version: 0.1.0
    repository: "file://../../charts/my-app/my-backend"
  #...

# apps/my-app/values-blue.yaml
global:
  live: blue
  replicas: 1

With this setup, releasing a component meant editing the values file of its chart, such as the one in charts/my-app/my-frontend, to point at the desired release version. For example:

# charts/my-app/my-frontend/values.yaml
image:
  name: ${MY_IMAGE_NAME}
  tag: 2.21.1 # new version

Finally, a pipeline handles the coordinated deployment of the stack:

# .gitlab-ci.yml 
deploy:
  stage: deploy
  when: manual
  image: alpine:latest
  variables:
    GIT_STRATEGY: clone
    GIT_DEPTH: 1
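  script:
    # Assumes git, yq and the argocd CLI are available in the job image, and that
    # a git identity, push credentials and an Argo CD login are configured (not shown).
    - echo "Live color is ${LIVE_COLOR}, preview color is ${PREVIEW_COLOR}"
    # Read the replica count currently committed for the preview color.
    - CURRENT_REPLICAS=$(yq '.global.replicas' "apps/${APP_NAME}/values-${PREVIEW_COLOR}.yaml")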
    - |
      if [ "$CURRENT_REPLICAS" != "1" ]; then
        yq -i '.global.replicas = 1' "apps/${APP_NAME}/values-${PREVIEW_COLOR}.yaml"
        echo "Updated values-${PREVIEW_COLOR}.yaml replicas to 1"
        git add "apps/${APP_NAME}/values-${PREVIEW_COLOR}.yaml"
        git commit -m "Scale up ${PREVIEW_COLOR} ${APP_NAME} deployment [skip ci]"
        git push origin HEAD:main
      else
        echo "No changes needed, skipping commit"
      fi
    - argocd app sync "${APP_NAME}-${PREVIEW_COLOR}" --grpc-web
    - argocd app wait "${APP_NAME}-${PREVIEW_COLOR}" --sync --health --timeout 600 --grpc-web
    - echo "Deploy stage complete. Preview (${PREVIEW_COLOR}) is now scaled up and synced."

Part 2

By also having Argo CD manage the Istio objects, we could control which namespace served live traffic and which served preview traffic:

# charts/ingress/my-app-ingress.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-live-vsvc
  namespace: default
spec:
  hosts:
    - ${MY_ADDRESS}
  gateways:
    - external-ingress-gateway
  http:
    - route:
        - destination:
            host: my-frontend-service.my-namespace-blue.svc.cluster.local
            port:
              number: 80
          {{ if eq .Values.global.live "blue" }}
          weight: 100
          {{ else if eq .Values.global.live "green" }}
          weight: 0
          {{ end }}
        - destination:
            host: my-frontend-service.my-namespace-green.svc.cluster.local
            port:
              number: 80
          {{ if eq .Values.global.live "green" }}
          weight: 100
          {{ else if eq .Values.global.live "blue" }}
          weight: 0
          {{ end }}
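
One consequence of the conditionals above is that a destination only receives a weight when global.live is exactly "blue" or "green". The sketch below mirrors that logic in shell so the resulting weights are easy to see for each value. The function name route_weight is hypothetical and only serves as an illustration; it is not part of the chart.

```shell
#!/bin/sh
# Mirrors the if / else-if in the template: prints the weight that the
# destination for color $1 receives when global.live is $2.
# Prints nothing and fails when global.live is neither blue nor green,
# because the template then omits the weight field entirely.
route_weight() {
  destination_color=$1
  live_color=$2
  case "$live_color" in
    blue|green) ;;
    *) return 1 ;;
  esac
  if [ "$destination_color" = "$live_color" ]; then
    echo 100
  else
    echo 0
  fi
}

for live in blue green; do
  echo "live=$live blue=$(route_weight blue "$live") green=$(route_weight green "$live")"
done
# prints:
#   live=blue blue=100 green=0
#   live=green blue=0 green=100
```

A check like this could run before the swap commit, so that a mistyped color never reaches the VirtualService.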

To swap the live and preview roles between the blue and green versions, which run simultaneously, a pipeline job in the Argo project repository updated the live color and coordinated when to sync with the latest state:

# .gitlab-ci.yml 
swap:
  stage: swap
  when: manual
  image: alpine:latest
  script:
    - echo "Swapping live from ${LIVE_COLOR} to ${PREVIEW_COLOR}"
    - CURRENT_COLOR=$(yq ".global.${APP_LIVE_COLOR_VAR_NAME}" apps/ingress/values.yaml)
    - |
      if [ "$CURRENT_COLOR" = "$PREVIEW_COLOR" ]; then
        echo "${APP_LIVE_COLOR_VAR_NAME} already set to ${PREVIEW_COLOR}, skipping commit"
      else
        yq -i ".global.${APP_LIVE_COLOR_VAR_NAME} = \"${PREVIEW_COLOR}\"" apps/ingress/values.yaml
        echo "Updated ${APP_LIVE_COLOR_VAR_NAME} to ${PREVIEW_COLOR}"
        git add apps/ingress/values.yaml
        git commit -m "Swap live traffic to ${PREVIEW_COLOR} [skip ci]"
        git push origin HEAD:main
      fi
    - argocd app sync ingress --grpc-web
    - argocd app wait ingress --sync --health --timeout 120 --grpc-web
    - echo "Swap complete. Live is now ${PREVIEW_COLOR}."

The pipeline stages were as follows:

# .gitlab-ci.yml 
stages:
  - init # determine which color (blue or green) currently serves live traffic and which should be launched as preview
  - deploy
  - swap
# ... future stages: revert and scale down
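
The init stage is not shown here. As a rough sketch of what it could do, the script below derives the preview color from the live color and prints both as KEY=value lines, which is the format GitLab accepts for a dotenv report artifact. The function name preview_color_for is hypothetical, and passing the values to later jobs through a dotenv report is an assumption rather than the setup actually used.

```shell
#!/bin/sh
# Hypothetical helper: given the color currently serving live traffic,
# print the other color, which becomes the preview target.
preview_color_for() {
  case "$1" in
    blue)  echo "green" ;;
    green) echo "blue" ;;
    *)     echo "unknown live color: $1" >&2; return 1 ;;
  esac
}

# In the real job the live color would be read from Git, for example:
#   LIVE_COLOR=$(yq ".global.${APP_LIVE_COLOR_VAR_NAME}" apps/ingress/values.yaml)
# A fixed value is used here so the sketch runs on its own.
LIVE_COLOR=blue
PREVIEW_COLOR=$(preview_color_for "$LIVE_COLOR")

# KEY=value lines, one per variable.
printf 'LIVE_COLOR=%s\nPREVIEW_COLOR=%s\n' "$LIVE_COLOR" "$PREVIEW_COLOR"
```

Failing on an unknown color keeps the later deploy and swap jobs from running against a namespace that does not exist.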

Notes

While this solution worked, there were simpler alternatives. Instead of introducing Istio, we could have relied on native Kubernetes Services: an ExternalName Service acts as a proxy by aliasing a DNS name, which is enough for cross-namespace communication. This avoids the complexity of a service mesh, which was hard to justify given that we were not taking advantage of its features, such as gradual weighted traffic shifting and canary capabilities.

apiVersion: v1
kind: Service
metadata:
  name: my-service-proxy
  namespace: my-namespace-blue
spec:
  type: ExternalName
  externalName: my-service.my-namespace-{{ .Values.global.liveNamespace }}.svc.cluster.local
  ports:
  - port: {{ .Values.global.appPort }}
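
To make the effect of a switch concrete, the hostname rendered by the externalName template above can be reproduced in shell. This is only an illustrative sketch: the function name live_service_host is hypothetical, and the namespace prefix my-namespace- is taken from the examples in this post.

```shell
#!/bin/sh
# Hypothetical helper mirroring the externalName template: builds the
# cluster-internal DNS name of a service in the namespace for a given color.
live_service_host() {
  service=$1
  color=$2
  echo "${service}.my-namespace-${color}.svc.cluster.local"
}

live_service_host my-service green
# prints: my-service.my-namespace-green.svc.cluster.local
```

Changing the color value therefore only changes the DNS name the proxy Service resolves to; callers keep using my-service-proxy.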