Workload identity is the Google-recommended IAM authentication on GKE

Hil Liao
3 min read · Mar 26, 2020


Many of my large enterprise clients in the cloud architecture and DevOps space have asked me a common question: what is the Google-recommended way to configure IAM authentication on GKE so that GCP service accounts can be used to manage services such as Spanner or BigQuery?

GKE authentication done right

The old way

Store the service account's JSON key file in a Kubernetes secret.

$ kubectl create secret generic pubsub-key --from-file=key.json=PATH-TO-KEY-FILE.json
----------- pod spec -----------
spec:
  volumes:
  - name: google-cloud-key
    secret:
      secretName: pubsub-key
  containers:
  - name: subscriber
    image: gcr.io/google-samples/pubsub-sample:v1
    volumeMounts:
    - name: google-cloud-key
      mountPath: /var/secrets/google
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /var/secrets/google/key.json

The approach above was once the recommended solution, but it has been replaced.

The new way is GKE workload identity.
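
At a high level, the migration means setting a workload pool on the cluster and switching each node pool to the GKE metadata server. Below is a minimal sketch with placeholder cluster, zone, and node pool names; exact flag names can differ across gcloud SDK releases (older versions used --identity-namespace and --workload-metadata-from-node):

# Enable workload identity on an existing cluster (placeholder names).
gcloud container clusters update my-cluster \
    --zone us-central1-a \
    --workload-pool=PROJECT_ID.svc.id.goog

# Migrate an existing node pool to the GKE metadata server.
gcloud container node-pools update my-node-pool \
    --cluster my-cluster \
    --zone us-central1-a \
    --workload-metadata=GKE_METADATA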

Enabling workload identity on an existing cluster may take hours if the node pools have more than 10 nodes, so plan accordingly. The downside of the new way is that blindly granting the workload identity binding on the cluster’s default namespace and default service account [default/default] lets every pod running as [default/default] use the Google service account’s credentials. Below are a few steps you may take to debug if following the steps in GKE workload identity isn’t working. Suppose you are using the default namespace and a Kubernetes service account named ksa.

  • Verify the pods are using the bound Google service account (GSA).
    One common misstep is failing to execute the kubectl annotate command on the Kubernetes service account (KSA) to refer to the GSA; see the sketch below.
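A minimal sketch of the two setup commands, where gsa-name, PROJECT_ID, and ksa are placeholders for your own Google service account, project, and Kubernetes service account:
# Allow the KSA [default/ksa] to impersonate the GSA.
gcloud iam service-accounts add-iam-policy-binding gsa-name@PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[default/ksa]"
# Annotate the KSA so pods running as it authenticate as the GSA.
kubectl annotate serviceaccount ksa --namespace default \
    iam.gke.io/gcp-service-account=gsa-name@PROJECT_ID.iam.gserviceaccount.com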
  • Create a pod from the gcloud container image to test
apiVersion: v1
kind: Pod
metadata:
  name: gcloud
  labels:
    app: gcloud-cmd
spec:
  serviceAccountName: ksa
  containers:
  - name: gcloud
    image: gcr.io/google.com/cloudsdktool/cloud-sdk:latest
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  • Start an interactive shell in the test pod
$ kubectl exec -it gcloud -- /bin/bash
# gcloud auth list # expect to see the bound GSA, not $PROJECT_ID.svc.id.goog
  • If gcloud auth list returns $PROJECT_ID.svc.id.goog, check the kubectl annotate command step.
  • If gcloud compute zones list returns an error or forbidden, check whether the IAM Service Account Credentials API has been enabled and is receiving traffic. Add --log-http to the command to debug further. See the commands below.
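A quick way to check and enable the API from the command line, with PROJECT_ID as a placeholder:
# Check whether the IAM Service Account Credentials API is enabled.
gcloud services list --enabled --project PROJECT_ID | grep iamcredentials
# Enable it if it is missing.
gcloud services enable iamcredentials.googleapis.com --project PROJECT_ID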
  • Verify the GSA has the correct workload identity policy bindings: gcloud iam service-accounts get-iam-policy ; verify the KSA has the annotation that refers to the GSA: kubectl describe sa ksa. The full commands are sketched below.
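The full verification commands look roughly like this, with gsa-name and PROJECT_ID as placeholders:
# Expect a roles/iam.workloadIdentityUser binding for serviceAccount:PROJECT_ID.svc.id.goog[default/ksa].
gcloud iam service-accounts get-iam-policy gsa-name@PROJECT_ID.iam.gserviceaccount.com
# Expect the annotation iam.gke.io/gcp-service-account: gsa-name@PROJECT_ID.iam.gserviceaccount.com
kubectl describe sa ksa --namespace default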
  • Verify the GKE metadata server is hijacking calls to the Compute Engine metadata server: kubectl get daemonsets/gke-metadata-server --namespace kube-system ; if you see no pods running or not found, it’s likely that workload identity has not been enabled on the node pool or not enabled in the cluster at all. Refer to the migration guide.
  • Refer to further troubleshooting steps. Sometimes the GKE node pools don’t have the right GKE metadata server configuration. Execute the following command in the pod to check whether pods can connect to the GKE metadata server.
curl -v -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
{"access_token":"ya29.d.__REDACTED__BQZL8gA","expires_in":3599,"token_type":"Bearer"}

Clean up

When you destroy the GKE cluster, workload identity policy bindings are not automatically deleted. Use the following command to remove the role binding:

gcloud iam service-accounts remove-iam-policy-binding gsa_name@gsa_project.iam.gserviceaccount.com --role=roles/iam.workloadIdentityUser --member "serviceAccount:cluster_project.svc.id.goog[gke-namespace/ksa]"
