Passing the Google Cloud Professional DevOps exam

Hil Liao
Dec 16, 2023

This is the 3rd time I have passed the DevOps exam. The exam is new this year, but I believe it has the same difficulty level as before: 50 questions in 2 hours.

  1. You need to ensure only authorized container images are deployed to GKE clusters. The security team wants to shift left with DevSecOps. What’s the Google-recommended method? Configure Binary Authorization, not IAM on Artifact Registry or least privilege on the GKE clusters.
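  2. You noticed the performance of the application in Google Cloud has degraded. Some requests to downstream dependencies took longer to complete. What can you do with the application? I was debating between Cloud Profiler and Cloud Trace, and I knew the other 2 options were wrong. Cloud Trace follows request latency across service calls, while Cloud Profiler shows where CPU and memory are spent inside a process, so the choice hinges on the dependency. I wasn’t sure whether the downstream dependency was a microservice or a library.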
  3. Your HTTP application is deployed in europe-west2 (London, UK). Your users are only in the UK. What’s the most cost-effective option for network design? Global vs. regional HTTP load balancer? Premium vs. Standard network tier? Choose the regional HTTP load balancer and the Standard network tier.
  4. You want to monitor CPU utilization of a microservice deployed to Cloud Run. What’s the approach? Service health monitoring.
  5. Your Cloud Run service needs an API key to call a 3rd party API. What’s the Google-recommended method to store the secret? Encrypt the key with Cloud KMS? Store the plain-text key in an environment variable? No. Store the key in Secret Manager and reference the secret name in the Cloud Run environment variables. There was also an option to reference the secret in Secret Manager through a path for the Cloud Run service to access. I did not know that option existed. I hope that wasn’t the right answer because I did not choose it.
  6. You want to configure a blue-green deployment. You are shown a blue deployment with the label version:blue, a green deployment with the label version:green, and a service that references version:green. The green deployment has the newer version of the code. After you deploy the newer version to production, users complain about errors. You want to roll back and allow the development team to debug. What’s the solution? Update or delete the deployment’s container image? Scale the deployment to 0? No. Change the service to reference version:blue, which leaves the green deployment running for debugging.
  7. You have 2 GKE Standard clusters: A and B. You notice network-related configurations are causing network access from A’s node to B’s node to be denied. You don’t have execute access on the nodes or any pods. What’s the solution? Start pods to execute traceroute? Enable VPC flow logs? No. Run a connectivity test in Network Intelligence Center from A’s node to B’s node.
  8. You have a legacy application deployed to a Compute Engine instance. You noticed that when the application crashed, you had to delete and recreate the instance. How do you automate the toil? Create a managed instance group (MIG) with a health check for autohealing and set the size to 1. Be sure to study the difference between autohealing and autoscaling in a MIG.
  9. Be aware of the types of artifacts that can be stored in Artifact Registry. The question asked about storing public and private Helm charts. The security team started to disallow access to public Helm charts and to enforce VPC Service Controls. The answer needed a native way to access Helm charts. Store the Helm charts in OCI format in Artifact Registry rather than in GitHub Enterprise or Cloud Storage.
  10. You need to design infrastructure resources for different teams’ production and development environments that host containerized workloads. Teams can’t access each other’s infrastructure resources. You want to optimize cost between the prod and dev environments. What’s the Google-recommended method? Create a separate project for each team. In each project, create one GKE cluster with prod and dev namespaces. Other options, like creating one dev project and one prod project and separating teams by GKE namespaces with RBAC, were wrong. Creating separate dev and prod GKE clusters was also wrong because that is more expensive than a single cluster.
  11. Your Git repository is the source of truth. You want to apply network policies and a DaemonSet for node monitoring to GKE clusters in dev, test, and prod. What’s the Google-recommended method? Policy Controller, or Cloud Deploy to push the network policies or DaemonSet to the clusters? No. Configure Config Sync (part of Anthos Config Management).
  12. How do you right-size a monolithic application to cost-optimize CPU usage? CPU usage metrics don’t require the Ops Agent, which is what provides memory utilization metrics. Use the VM machine type rightsizing recommendation. Don’t choose the options that involve the Ops Agent.
  13. How can you design a microservice’s canary deployment in Anthos Service Mesh so that Android clients call Android-specific microservices in GKE? Create a virtual service that matches on the user agent header and routes to client-specific deployments.
  14. The application you are monitoring uses a memory cache. When there’s a cache miss, a log entry is generated. How can you visualize the cache misses in a chart with the least code change? Create a log-based metric and use a monitoring dashboard to visualize it. Don’t choose to modify the application code to generate metrics.
  15. You have a dev and a prod folder containing different projects. You need to route logs from the projects in the dev and prod folders to 2 different BigQuery datasets. You want the setup to be future-proof for projects created in the folders later. Create aggregated log sinks at the folder level: 1 for dev and 1 for prod. Don’t create individual log sinks in projects or aggregated log sinks at the organization level with project ID filters.
  16. You want to create monitoring dashboards for the existing GCP organization structure: the dev, test, and prod folders contain 2 projects each. You want each monitoring scoping project to exclude the other environments’ metrics. What’s the Google-recommended method? Create a new scoping project for each folder. Don’t use existing projects as the scoping project or create a single new scoping project for all 6 projects.
  17. You have an SLI at 99.9% for a service that’s generating $1,000,000. You estimate it costs $2,000 to increase the SLI to 99.99%. How do you determine if it’s worth it? 99.99% - 99.9% = 0.09%; 0.09% * $1,000,000 = $900. Choose the option saying it will generate $900 in revenue, which is less than $2,000, so it’s not worth it. Don’t choose the $1,000 or $2,000 options.
  18. I was not sure of the right answer to the following question: An online shopping application deployed to GKE accepts orders and publishes the placed orders to a Pub/Sub topic. The inventory application processes the Pub/Sub messages to reduce the stock at the warehouse. During a sales event, the application’s deployment showed CPU utilization increasing from 20% to 30%, while the pods’ memory utilization stayed unchanged at 10%. The Pub/Sub subscription showed the undelivered or unacknowledged message count going from 100 to 8450 and the oldest unacknowledged message age going from 130 ms to 8544 ms. What’s the solution? Options: [increase the subscription deadline, increase the deployment’s replicas, put a vertical queue in the ordering service to limit order acceptance, increase the pod’s CPU limit]
  19. I almost got the following question wrong. You have enabled Data Access audit logs. You need to configure an IAM role for the security team to review the logs. What’s the IAM role to bind? Not roles/logging.viewer, and don’t grant the role to individual users. Bind roles/logging.privateLogViewer to the security team’s Google Group. Quote: The Logs Viewer role (roles/logging.viewer) gives you read-only access to Admin Activity, Policy Denied, and System Event audit logs. If you have just this role, you cannot view Data Access audit logs that are in the _Default bucket.
  20. What’s the method to share a custom monitoring dashboard with a partner team? Share the URL of the custom dashboard rather than exporting MQL or JSON files to the partner team.
  21. You review the error budget of a microservice. In the past 6 months, it has used less than 5% of the error budget. You reviewed the SLO with the business stakeholders and it was correct. You want to reduce technical debt and deliver new features quicker. It’s the end of the month and 97% of the error budget is unused. What can you do? Assign engineers to work on the backlog and deploy to production. Don’t schedule downtime at the customers’ expense or add replicas to the instances or pods.
  22. You work in a highly regulated industry. All the logs must be stored for 7 years with no risk of human error or deletion. What’s the solution? Configure a log sink for the projects with a Cloud Storage bucket as the destination. Configure a 7-year retention policy on the bucket and lock it.
  23. You were working production support. Suddenly, you were overwhelmed with alerts. You found out the alerts were caused by a failed service that restarted within a minute. How can you reduce burnout among personnel? Configure the alerting policy with a proper burn rate. Don’t distribute the alerts to support teams in different regions or investigate each alert.
  24. Your application needs to process a large amount of data. The performance of the data processing depends on the number of CPUs on the VM. The data comes from external sources. You want to design a cloud-native method that lets external sources provide data and starts processing in a cost-effective way. What’s the solution? Choose Cloud Storage over SFTP. Configure the object finalized event to trigger a Cloud Function. Create an autoscaling managed instance group to start processing the data. The group’s image has prebuilt software for data processing. The software terminates the instances at the end of data processing. I hope that was the right answer.
  25. The question asked about using Cloud Build, Git hooks, acceptance testing, and integration testing according to DevOps best practices. I can’t remember the exact question, but the right answer should be using Git hooks to run unit tests instead of running unit tests in a Cloud Build step on the built container image. If the unit tests pass, run acceptance and integration tests in staging, not production. If those tests pass, deploy to production. Don’t run such tests in production. Don’t proceed if any tests fail.
  26. There were at least 3 questions about Terraform. I got the following wrong. You are creating a new instance template for a running managed instance group in Terraform. When the CD pipeline executed, you observed an error showing the instance template couldn’t be deleted because a managed instance group was using it. The right option should be to update the instance group to use the new template. I chose the wrong one: delete the instance group, create the new template, create a new instance group.
  27. You are implementing a Terraform GitOps-style deployment strategy for IaC. The CD pipelines need to execute on a Compute Engine instance. What’s the Google-recommended method to grant the instance permissions to create infrastructure resources? Create a service account. Assign it the IAM roles needed for IaC. Attach the service account to the instance at creation. Don’t choose generating service account keys or putting the key in environment variables.
  28. You need to generate a service account key for a 3rd party application to act as the service account. When you tried to generate the JSON key, you received an error stating that the iam.disableServiceAccountKeyCreation organization policy was preventing key generation. You identified that the policy constraint was enforced at the organization level. What’s the solution? Keep the constraint at the organization level and customize the enforcement in the project where you generate the key. Don’t just change the constraint at the folder or organization level to allow key generation.
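
The arithmetic in question 17 can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes revenue scales linearly with the SLI, which is the same simplification the exam answer relies on.

```python
from fractions import Fraction


def incremental_revenue(revenue, current_sli, target_sli):
    """Extra revenue from the availability gain.

    Assumption: revenue scales linearly with the SLI.
    SLIs are passed as decimal strings so Fraction keeps them exact
    and avoids floating-point rounding noise.
    """
    gain = Fraction(target_sli) - Fraction(current_sli)
    return revenue * gain


extra = incremental_revenue(1_000_000, "0.999", "0.9999")
cost = 2_000  # estimated cost of the improvement, from the question
print(f"extra revenue: ${extra}, cost: ${cost}, worth it: {extra > cost}")
```

With the numbers from the question, the extra revenue comes out to 900, which is below the 2,000 cost.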
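
For question 13, the virtual service could look roughly like the manifest built below. It is a sketch, not the exam's exact answer: the host name `orders`, the subset names, and the helper function are hypothetical, and the subsets would also need a matching DestinationRule that is not shown. The manifest is built as a Python dict and printed as JSON, which kubectl accepts as well as YAML.

```python
import json


def android_virtual_service(host, android_subset, default_subset):
    """Build an Istio VirtualService manifest that routes by user agent.

    All names passed in are hypothetical placeholders.
    """
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-by-client"},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    # Requests whose User-Agent header mentions Android
                    # go to the Android-specific deployment.
                    "match": [
                        {"headers": {"user-agent": {"regex": ".*Android.*"}}}
                    ],
                    "route": [
                        {"destination": {"host": host, "subset": android_subset}}
                    ],
                },
                {
                    # Everything else falls through to the default subset.
                    "route": [
                        {"destination": {"host": host, "subset": default_subset}}
                    ],
                },
            ],
        },
    }


manifest = android_virtual_service("orders", "android", "default")
print(json.dumps(manifest, indent=2))
```

The rule order matters: the first matching HTTP route wins, so the catch-all route without a match block goes last.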
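
The rollback in question 6 comes down to changing one label in the service's selector. The sketch below models that with plain Python dicts instead of calling a cluster; the service name, the `app: web` label, and both helper functions are assumptions made for illustration.

```python
import copy

# A Service manifest that currently sends traffic to the green deployment.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web"},
    "spec": {
        "selector": {"app": "web", "version": "green"},
        "ports": [{"port": 80, "targetPort": 8080}],
    },
}


def selects(svc, pod_labels):
    """Return True if every selector label matches the pod's labels."""
    selector = svc["spec"]["selector"]
    return all(pod_labels.get(key) == value for key, value in selector.items())


def switch_version(svc, version):
    """Return a copy of the service pointing at another version label."""
    updated = copy.deepcopy(svc)
    updated["spec"]["selector"]["version"] = version
    return updated


blue_pod = {"app": "web", "version": "blue"}
green_pod = {"app": "web", "version": "green"}

rolled_back = switch_version(service, "blue")
print("blue receives traffic:", selects(rolled_back, blue_pod))
print("green receives traffic:", selects(rolled_back, green_pod))
```

Because only the selector changes, the green deployment and its pods stay in place for the development team to debug.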

Useful resources
