Multi-repo GKE Config Sync with multi-cluster Anthos Service Mesh

Google Cloud has a great product called Anthos. Its best feature is the ability to span Kubernetes Engine clusters to other cloud providers or to VMware vSphere in an on-premises data center. The licensing fee costs a few thousand dollars per month per project. If you don’t need multi-cloud or don’t have VMware vSphere on premises, this article shows you how to configure a GKE cluster with the Istio service mesh (comparable to Anthos Service Mesh) and GKE Config Sync (comparable to Anthos Config Management).

Configuration: cluster creation

The steps are derived from Syncing from multiple repositories, using GKE Workload Identity as the authentication method to the Git repository during the Config Sync operator deployment. First, create a GKE cluster with the Istio feature enabled. Fill in the environment variables in the following script and execute it as a project editor.
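
A minimal sketch of such a script, assuming hypothetical values for PROJECT_ID, GKE_NAME, and ZONE (the Istio add-on flags were beta at the time; adjust the machine type and node count to your needs):

```bash
# Assumptions: the project ID, cluster name, and zone below are placeholders.
export PROJECT_ID=my-project
export GKE_NAME=gke-config-demo-1
export ZONE=us-central1-a

# Create a GKE cluster with the (beta) Istio add-on and Workload Identity enabled.
gcloud beta container clusters create "$GKE_NAME" \
  --project "$PROJECT_ID" \
  --zone "$ZONE" \
  --machine-type e2-standard-4 \
  --num-nodes 3 \
  --addons Istio \
  --istio-config auth=MTLS_STRICT \
  --workload-pool "$PROJECT_ID.svc.id.goog" \
  --release-channel regular
```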

Upon successful cluster creation, follow the two documented steps to deploy the Config Sync operator and verify that the command-line output reports success.
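
The two steps are roughly the following, assuming the public release bucket path is still current (check the Config Sync install docs for the latest object name):

```bash
# Step 1: download the Config Sync operator manifest from the release bucket.
gsutil cp gs://config-management-release/released/latest/config-sync-operator.yaml config-sync-operator.yaml

# Step 2: apply the operator manifest to the cluster.
kubectl apply -f config-sync-operator.yaml
```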

Configuration: ConfigManagement

Follow the steps in Configuring syncing from the root repository and execute kubectl apply -f config-management-multi-repo.yaml on the following content, with the correct cluster name filled in.
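
A sketch of config-management-multi-repo.yaml; the clusterName value here is an assumption and must be replaced with your cluster's name:

```yaml
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  # Enable the multi-repo (RootSync/RepoSync) mode of Config Sync.
  enableMultiRepo: true
  # Assumption: replace with your cluster's name, e.g. $GKE_NAME-1.
  clusterName: gke-config-demo-1
```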

Upon success, note that the cluster name specified in the YAML needs to match the metadata.name of the kind: Cluster resource in the Git repository’s clusterregistry folder, which is what the ClusterSelectors select against.

Configuration: RootSync for cluster resources

Next, apply the following RootSync to synchronize cluster-scoped resources from the Git repository. The repository name is gke-config; the branch is master; the directory is the repository root. Notice the secretRef is commented out, as we are about to use GKE Workload Identity for the Git authentication. Set the correct repository URL for your project.
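
A sketch of the RootSync, assuming the gke-config repository is hosted in Cloud Source Repositories and that gcenode auth plus the Workload Identity annotation applied below is how Git access is granted (newer Config Sync releases use the v1beta1 API and also offer auth: gcpserviceaccount):

```yaml
apiVersion: configsync.gke.io/v1alpha1   # newer releases: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  git:
    # Assumption: the repo lives in Cloud Source Repositories under your project.
    repo: https://source.developers.google.com/p/PROJECT_ID/r/gke-config
    branch: master
    dir: "."
    # gcenode fetches a token from the metadata server; with the Workload
    # Identity annotation applied later, that token belongs to gke-git-sync.
    auth: gcenode
    # secretRef is commented out because no Git credentials secret is used.
    # secretRef:
    #   name: git-creds
```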

The root-reconciler deployment’s git-sync container immediately reports errors in Cloud Logging.

The solution is to create a Google service account such as gke-git-sync, bind the Source Repository Reader IAM role on the gke-config repository, and execute the following commands to enable GKE Workload Identity. Observe that the root-reconciler deployment has serviceAccountName: root-reconciler. If you are configuring more than one cluster in the same project, the roles/iam.workloadIdentityUser binding was already created by the gcloud iam command during the first cluster’s configuration; subsequent clusters only need the kubectl annotate command, although re-executing the gcloud iam command does not hurt.
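
A sketch of those commands, assuming PROJECT_ID is set and the Google service account is named gke-git-sync:

```bash
# Allow the root-reconciler Kubernetes service account to impersonate gke-git-sync.
gcloud iam service-accounts add-iam-policy-binding \
  gke-git-sync@${PROJECT_ID}.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[config-management-system/root-reconciler]"

# Annotate the Kubernetes service account with the Google service account.
kubectl annotate serviceaccount root-reconciler \
  -n config-management-system \
  iam.gke.io/gcp-service-account=gke-git-sync@${PROJECT_ID}.iam.gserviceaccount.com
```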

Within 2 minutes, observe the following log from the root-reconciler deployment’s git-sync container as a sign of success.

If the Git repository has any cluster-scoped resources, such as a ClusterRole or a Namespace, verify that they were created.
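
For example (the resource names here are assumptions taken from the examples later in this article):

```bash
kubectl get clusterrole namespace-reader
kubectl get namespace hil
```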

Configuration: RepoSync for namespace scoped resources

Following the instructions in Configuring syncing from namespace repositories, create a namespace directory in the namespaces folder containing the following three files. In the example below, the namespace is hil. The RepoSync has git pointing to the same repository, gke-config, but to a different branch’s HEAD.
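
A sketch of the three files under namespaces/hil/, assuming Cloud Source Repositories and a hypothetical branch name for the namespace configs:

```yaml
# namespaces/hil/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: hil
---
# namespaces/hil/repo-sync.yaml
apiVersion: configsync.gke.io/v1alpha1   # newer releases: configsync.gke.io/v1beta1
kind: RepoSync
metadata:
  name: repo-sync
  namespace: hil
spec:
  git:
    repo: https://source.developers.google.com/p/PROJECT_ID/r/gke-config
    branch: hil-config   # assumption: a branch dedicated to this namespace's configs
    dir: "."
    auth: gcenode
---
# namespaces/hil/sync-rolebinding.yaml
# Grants the namespace reconciler permission to apply resources in hil.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: syncs-repo
  namespace: hil
subjects:
- kind: ServiceAccount
  name: ns-reconciler-hil
  namespace: config-management-system
roleRef:
  kind: ClusterRole
  name: admin
  apiGroup: rbac.authorization.k8s.io
```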

Upon pushing the commit to the master branch, a deployment named in the format ns-reconciler-&lt;namespace&gt; gets created; its git-sync container shows the following error in Cloud Logging every 15 seconds, very similar to the earlier error from the root-reconciler deployment’s git-sync container.

The reason is the lack of GKE Workload Identity configuration on the Kubernetes service account created for the namespace RepoSync. Execute the following commands to enable it. Again, subsequent clusters’ configurations can just execute the kubectl annotate command, although re-executing the gcloud iam command does not hurt.
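
The same pattern as for the root reconciler, assuming the namespace is hil and the Google service account is gke-git-sync:

```bash
# Allow the namespace reconciler's Kubernetes service account to impersonate gke-git-sync.
gcloud iam service-accounts add-iam-policy-binding \
  gke-git-sync@${PROJECT_ID}.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[config-management-system/ns-reconciler-hil]"

# Annotate the namespace reconciler's Kubernetes service account.
kubectl annotate serviceaccount ns-reconciler-hil \
  -n config-management-system \
  iam.gke.io/gcp-service-account=gke-git-sync@${PROJECT_ID}.iam.gserviceaccount.com
```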

Within two minutes, observe that the errors stop and info-level logs appear in Cloud Logging.

If you observe the following error in the root-reconciler-* pod’s reconciler container, some constraint templates have not been installed.

See the Open Policy Agent Gatekeeper section below to install them. Then verify that any namespace-scoped resources, such as a Kubernetes service account (KSA), were created.

ClusterSelector works only in the root repository

According to the current limitations, ClusterSelectors only work in the root repository. The namespace-reader ClusterRole is an example of creating a cluster resource only in selected clusters. Replace $GKE_NAME-1 with the ConfigManagement’s clusterName. The ClusterSelector selects clusters with the labels num: “1” AND customer: hil, which matches the $GKE_NAME-1 cluster. To effectively test ClusterSelectors, create another cluster, $GKE_NAME-2, and configure it with the steps above. Observe that the second cluster does not have the namespace-reader ClusterRole but does have the namespace hil. The reason is that the namespace hil does not carry the configmanagement.gke.io/cluster-selector annotation.
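
A sketch of what this looks like in the root repository; the selector name and file paths are assumptions, while the labels follow the example above:

```yaml
# clusterregistry/cluster-1.yaml
apiVersion: clusterregistry.k8s.io/v1alpha1
kind: Cluster
metadata:
  name: $GKE_NAME-1          # must match the ConfigManagement clusterName
  labels:
    num: "1"
    customer: hil
---
# clusterregistry/selector-customer-hil.yaml
apiVersion: configmanagement.gke.io/v1
kind: ClusterSelector
metadata:
  name: selector-customer-hil
spec:
  selector:
    matchLabels:
      num: "1"
      customer: hil
---
# cluster/namespace-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-reader
  annotations:
    # Only clusters matched by the selector receive this ClusterRole.
    configmanagement.gke.io/cluster-selector: selector-customer-hil
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list", "watch"]
```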

Open Policy Agent gatekeeper as a Policy Controller

Anthos has Policy Controller, which lets cluster administrators define constraints based on constraint templates Google provides. The open-source Open Policy Agent Gatekeeper achieves a similar goal. With a project editor IAM role binding, I executed the command from the Gatekeeper README to install Gatekeeper. Then I installed the container resource limits constraint template.
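
Roughly the following, assuming the Gatekeeper release branch and the gatekeeper-library path shown here (both are examples; check the respective repositories for current versions and paths):

```bash
# Install Gatekeeper from its release manifest (pin a release branch of your choice).
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.3/deploy/gatekeeper.yaml

# Install a container-limits constraint template from the gatekeeper-library.
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper-library/master/library/general/containerlimits/template.yaml
```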

Ideally, push a commit of the constraint to the RootSync’s branch and directory, following Google’s constraint format, and expect the constraint to be created in the selected cluster. Administrators can also execute a kubectl apply -f command to install the constraints for experimental purposes. Verify the constraints with the commands:
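
For example (k8scontainerlimits is the kind created by the container-limits template above; constraints is a kubectl category Gatekeeper registers):

```bash
# List installed constraint templates and all constraints.
kubectl get constrainttemplates
kubectl get constraints

# Or query the specific constraint kind created by the container-limits template.
kubectl get k8scontainerlimits
```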

Creating the following pod in the hil namespace will be rejected, but not in the default namespace:
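
A sketch of such a pod, assuming the constraint enforces container limits in the hil namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-no-limits
  namespace: hil
spec:
  containers:
  - name: nginx
    image: nginx:1.19
    # No resources.limits are set, which violates the container-limits constraint.
```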

Expect an error from the Gatekeeper admission webhook rejecting the pod.

Multi-cluster service mesh on GKE with shared control-plane, single-VPC architecture

Similar to multi-cluster Ingress in GKE, it’s possible to create a new GKE cluster that connects to an existing GKE cluster’s Istio control plane, as described in Building a multi-cluster service mesh on GKE with shared control-plane, single-VPC architecture for Istio 1.4. While the Kubernetes services can be deployed to different clusters, an Istio virtual service can route to hostnames in different clusters.

Problems with the Istio 1.4 multi-cluster service mesh

Let’s call the GKE cluster with the Istio control plane the control cluster and the other GKE cluster the remote cluster. There are two problems:

  1. A deployment in the remote cluster can’t resolve a Kubernetes service in the control cluster at $SVC_NAME.$NAMESPACE.svc.cluster.local. This arguably makes sense, as both clusters could have namespaces with identical names.
  2. As the pod IP addresses of istio-pilot, istio-policy, and istio-telemetry change during cluster autoscaling, the remote cluster’s istio-proxy containers log the following error:

Anthos Service Mesh, following the latest Istio multi-cluster installation doc, appears to have solved this problem. However, GKE’s Istio feature is still using Istio 1.4.10 at the time of this writing. To use Istio 1.8 with multi-cluster support, continue with the following section.

Install Anthos service mesh without Anthos entitlement on 2 GKE clusters

The latest Anthos Service Mesh 1.8 does not require an Anthos entitlement to install on GKE clusters. It supports a multi-cluster service mesh across projects and on private IP clusters. The example below uses two public clusters in different zones to demonstrate a multi-cluster setup.

  • Create 2 GKE clusters in different zones.
  • Download the Istio release binary to create the SSL certificates. You’ll need the CA certificate, CA key, root certificate, and certificate chain files.
  • [Preview (Mesh CA with environ)] Skip this bullet if you don’t care about using Mesh CA. Two months after publishing this article, I tested ASM 1.9 with private clusters with public master IPs, Mesh CA with environ (fleet) in preview, ingress allowed from all internal IP ranges, and the implied egress-to-all rule; the installation succeeded. I had to struggle because the Istiod pods’ lack of connectivity caused the cross-cluster load balancing to fail; the errors appear in the istiod deployment pods’ logs at error severity. Create Cloud NATs in the clusters’ regions. Don’t test against Google APIs (which return 200 when Private Google Access is enabled in the subnet); instead, execute the check from any pod in the cluster to simulate the required egress from the cluster’s Istiod pods to the other cluster’s master IP. Verify that the implied egress-allow-all firewall rule is effective. Create a firewall ingress rule to allow all possible internal IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) to all instances in the network.
  • Repeat the installation steps on another GKE cluster in the same project with the same CA certificates. Make sure you later put the certs folder in Google Cloud Storage and secure it.
  • Execute the curl commands against the istio-ingressgateway’s IP to test (a sketch follows this list). Investigate and fix any errors; do not execute the next command until the current one succeeds. Observe v1 and v2 pods from different clusters in the HTTP responses.
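
A minimal sketch of that test, assuming the Istio helloworld sample is deployed and exposed through the ingress gateway (the /hello path and the helloworld-v1/helloworld-v2 alternation are assumptions based on that sample):

```bash
# Grab the ingress gateway's external IP.
GATEWAY_IP=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Repeated calls should return responses from helloworld-v1 and helloworld-v2,
# which run in different clusters, showing cross-cluster load balancing works.
for i in $(seq 1 10); do
  curl -s "http://${GATEWAY_IP}/hello"
done
```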
