Context
To give some brief context, we had several tightly coupled microservices, to the point where deploying a new version of one microservice required others to be patched as well. This kind of tight coupling is sometimes described as a "distributed monolith". Regardless, we wanted to achieve seamless zero-downtime deployments of this application, which we hosted in a Kubernetes cluster.
Challenge
Traditionally, blue/green strategies rely on deploying two versions of the same service, such as live and preview — as seen in Argo Rollouts. However, this wasn’t compatible with our case, as we needed to deploy an entire stack, which could include some services at newer versions while others remained at their current version, according to the release plan.
Naturally, Kubernetes namespaces offered a way to build two distinct isolated environments. The challenge could be divided into two parts. First, creating a pipeline capable of coordinating the deployment of the entire application stack. Second, having this pipeline alternate which namespace served the preview and live versions of the application.
Solution
The repository was structured as follows:
├── apps/ # Application stack definitions (Helm umbrella charts)
├── argo/ # Argo CD App of Apps manifests
├── charts/ # Individual Helm charts for each component
└── .gitlab-ci.yml # Deployment pipeline
Part 1
To tackle the first part of the challenge, we opted to introduce Argo CD and make use of the App of Apps pattern.
By creating the following Argo application:
# argo/app-of-apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-blue
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: my-namespace-blue
    server: https://kubernetes.default.svc
  source:
    path: apps/my-app
    repoURL: ${MY_REPO}
    targetRevision: main
    helm:
      valueFiles:
        - values-blue.yaml
  project: default
  syncPolicy: {}
A near-identical Argo application is deployed for the green version, replacing “blue” with “green” wherever applicable.
This application watches for changes in the stack’s manifest, allowing the coordination of an entire stack. The respective application components are defined as chart dependencies:
# apps/my-app/Chart.yaml
apiVersion: v2
name: my-app-stack
description: my-app application stack chart
type: application
version: 0.1.0
dependencies:
  - name: my-frontend
    version: 0.1.0
    repository: "file://../../charts/my-app/my-frontend"
  - name: my-backend
    version: 0.1.0
    repository: "file://../../charts/my-app/my-backend"
  #...
# apps/my-app/values-blue.yaml
global:
  live: blue
  replicas: 1
This setup enabled us to go into charts/my-app/my-frontend, or any other component, and change its values file to use the desired release version. For example:
# charts/my-app/my-frontend/values.yaml
image:
  name: ${MY_IMAGE_NAME}
  tag: 2.21.1 # new version
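The component charts themselves are not shown above, so here is a minimal sketch of how a dependency might consume the umbrella chart's global values. The template path, labels, and field names are illustrative assumptions; the point is that Helm propagates `global` values to every subchart, so editing one values-&lt;color&gt;.yaml scales every component in that namespace at once:

```yaml
# Hypothetical charts/my-app/my-frontend/templates/deployment.yaml
# global.replicas comes from the umbrella chart's values-<color>.yaml,
# while image.* comes from this component's own values.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-frontend
spec:
  replicas: {{ .Values.global.replicas }}
  selector:
    matchLabels:
      app: my-frontend
  template:
    metadata:
      labels:
        app: my-frontend
    spec:
      containers:
        - name: my-frontend
          image: "{{ .Values.image.name }}:{{ .Values.image.tag }}"
```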
Finally, a pipeline handles the coordinated deployment of the stack:
# .gitlab-ci.yml
deploy:
  stage: deploy
  when: manual
  image: alpine:latest # assumes git, yq, and the argocd CLI are available (installed in a before_script, omitted here)
  variables:
    GIT_STRATEGY: clone
    GIT_DEPTH: 1
  script:
    - echo "Live color is ${LIVE_COLOR}, preview color is ${PREVIEW_COLOR}"
    - CURRENT_REPLICAS=$(yq '.global.replicas' "apps/${APP_NAME}/values-${PREVIEW_COLOR}.yaml")
    - |
      if [ "$CURRENT_REPLICAS" != "1" ]; then
        yq -i '.global.replicas = 1' "apps/${APP_NAME}/values-${PREVIEW_COLOR}.yaml"
        echo "Updated values-${PREVIEW_COLOR}.yaml replicas to 1"
        git add "apps/${APP_NAME}/values-${PREVIEW_COLOR}.yaml"
        git commit -m "Scale up ${PREVIEW_COLOR} ${APP_NAME} deployment [skip ci]"
        git push origin HEAD:main
      else
        echo "No changes needed, skipping commit"
      fi
    - argocd app sync "${APP_NAME}-${PREVIEW_COLOR}" --grpc-web
    - argocd app wait "${APP_NAME}-${PREVIEW_COLOR}" --sync --health --timeout 600 --grpc-web
    - echo "Deploy stage complete. Preview (${PREVIEW_COLOR}) is now scaled up and synced."
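The LIVE_COLOR and PREVIEW_COLOR variables above are produced by the init stage. A minimal sketch of that derivation, assuming the ingress values file is the source of truth and its key is global.live (as the VirtualService template later suggests). The real pipeline used yq; sed stands in here so the snippet is self-contained, and the stub file path is made up for illustration:

```shell
# Simplified stand-in for apps/ingress/values.yaml:
cat > /tmp/values.yaml <<'EOF'
global:
  live: blue
EOF

# Read the current live color (the real pipeline read this with yq).
LIVE_COLOR=$(sed -n 's/^ *live: *//p' /tmp/values.yaml)

# Whichever color is not live serves preview.
if [ "$LIVE_COLOR" = "blue" ]; then
  PREVIEW_COLOR=green
else
  PREVIEW_COLOR=blue
fi

echo "live=${LIVE_COLOR} preview=${PREVIEW_COLOR}"
```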
Part 2
By also having Argo manage Istio objects, we could control which namespace served live and/or preview traffic:
# charts/ingress/templates/my-app-ingress.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app-live-vsvc
  namespace: default
spec:
  hosts:
    - ${MY_ADDRESS}
  gateways:
    - external-ingress-gateway
  http:
    - route:
        - destination:
            host: my-frontend-service.my-namespace-blue.svc.cluster.local
            port:
              number: 80
          weight: {{ if eq .Values.global.live "blue" }}100{{ else }}0{{ end }}
        - destination:
            host: my-frontend-service.my-namespace-green.svc.cluster.local
            port:
              number: 80
          weight: {{ if eq .Values.global.live "green" }}100{{ else }}0{{ end }}
To alternate which of the two simultaneously running colors served live and which served preview, a pipeline in the Argo project repository coordinated when to sync the latest state:
# .gitlab-ci.yml
swap:
  stage: swap
  when: manual
  image: alpine:latest
  script:
    - echo "Swapping live from ${LIVE_COLOR} to ${PREVIEW_COLOR}"
    - CURRENT_COLOR=$(yq ".global.${APP_LIVE_COLOR_VAR_NAME}" apps/ingress/values.yaml)
    - |
      if [ "$CURRENT_COLOR" = "$PREVIEW_COLOR" ]; then
        echo "${APP_LIVE_COLOR_VAR_NAME} already set to ${PREVIEW_COLOR}, skipping commit"
      else
        yq -i ".global.${APP_LIVE_COLOR_VAR_NAME} = \"${PREVIEW_COLOR}\"" apps/ingress/values.yaml
        echo "Updated ${APP_LIVE_COLOR_VAR_NAME} to ${PREVIEW_COLOR}"
        git add apps/ingress/values.yaml
        git commit -m "Swap live traffic to ${PREVIEW_COLOR} [skip ci]"
        git push origin HEAD:main
      fi
    - argocd app sync ingress --grpc-web
    - argocd app wait ingress --sync --health --timeout 120 --grpc-web
    - echo "Swap complete. Live is now ${PREVIEW_COLOR}."
The pipeline steps were as follows:
# .gitlab-ci.yml
stages:
  - init # determine which of blue/green currently serves live traffic and which will serve preview
  - deploy
  - swap
  # ... revert and scale down future stages
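The scale-down stage hinted at above can be sketched as follows: after a successful swap, the old live color's replicas go back to 0 so only one namespace consumes resources between releases. This is a hypothetical illustration, not the exact job; the real pipeline would edit the file with yq -i and commit it, while sed keeps this snippet self-contained:

```shell
# Simplified stand-in for apps/my-app/values-blue.yaml (the old live color):
cat > /tmp/values-blue.yaml <<'EOF'
global:
  live: blue
  replicas: 1
EOF

# Scale the now-idle color down to zero replicas (portable sed, no -i).
sed 's/^\( *replicas:\).*/\1 0/' /tmp/values-blue.yaml > /tmp/values-blue.yaml.new
mv /tmp/values-blue.yaml.new /tmp/values-blue.yaml

cat /tmp/values-blue.yaml
```

Argo CD would then sync the blue application, and the blue namespace sits empty until the next release targets it as preview.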
Notes
While this solution worked, simpler alternatives existed. Instead of introducing Istio, we could have achieved cross-namespace routing with native Kubernetes ExternalName services acting as proxies. That avoids the complexity of a service mesh, whose distinguishing features (such as weighted traffic shifting and canary capabilities) we were not using anyway.
apiVersion: v1
kind: Service
metadata:
  name: my-service-proxy
  namespace: my-namespace-blue
spec:
  type: ExternalName
  externalName: my-service.my-namespace-{{ .Values.global.liveNamespace }}.svc.cluster.local
  ports:
    - port: {{ .Values.global.appPort }}
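Consumers would then address the proxy by its stable, namespace-local name, while the ExternalName CNAME decides which namespace actually answers. For example, a hypothetical container environment fragment (MY_SERVICE_URL is made up for illustration):

```yaml
# Fragment of a consumer's container spec: the URL never changes across swaps,
# only the DNS target behind my-service-proxy does.
env:
  - name: MY_SERVICE_URL
    value: "http://my-service-proxy:{{ .Values.global.appPort }}"
```

Note that ExternalName works at the DNS level only, so the target service must be reachable on the same port the client uses.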