Google Cloud Professional Machine Learning Engineer Certification Preparation Guide

Hil Liao
4 min read · Dec 9, 2020

I started working on AI and ML related tasks in my current project sometime in May. Before that, I had limited exposure to machine learning, and even now my tasks in the project are more ML ops related than anything a data scientist or model developer would do. Sure, I am on the infrastructure team interfacing with data scientists and ML engineers, but I did not understand how to write Python code to build any sophisticated models in TensorFlow 2.x.

I found two great study guides: one by Dmitri Lerko, which has the same title as my article, and another called 20 Days to Google Cloud Professional Machine Learning Engineer Exam (BETA). The problem is that both guides are based on the beta exam. I took the general exam and barely passed, and not because I underprepared; I felt lucky that many of the topics from the study guides appeared in the exam. I barely passed because I was stupid enough to assume there would be 50 questions like all the other exams I had taken. Every general exam I remembered had 50 questions, so 60 questions was a big surprise and caught me off guard. I did not check the total number of questions when I started the exam, so I had to fast-forward through the last 8 questions and had only 9 seconds to pick random answers on the last 3.

  1. I’d agree that most questions are at a higher architectural level. Avoid spending too much time reading through the Python code.
  2. Some Python Keras model development experience is required for a few questions. Study the format of passing JSON to AI Platform Prediction as {"instances": [{example1}, {example2}]} (see the prediction request sketch after this list).
  3. If you have a data engineering or CI/CD cloud development background, there are quite a few questions about how to build good ML pipelines with cloud-native GCP components such as Cloud Build, Cloud Source Repositories, and BigQuery.
  4. Understand what a serverless solution design to process data means. What would be valid serverless options? BigQuery and Dataflow are definitely better choices than Dataproc or Kubeflow.
  5. Learn when to use Cloud Data Fusion and Dataprep in different scenarios.
  6. Learn which analytics data warehouse is better at interfacing with a third-party analytics and visualization tool that requires ANSI SQL 2011.
  7. There are 1 or 2 questions on Edge TPU training. Understand which of the following model types to use: low latency (mobile-low-latency-1), general purpose (mobile-versatile-1), higher prediction quality (mobile-high-accuracy-1).
  8. Understand that higher accuracy most likely comes at the cost of lower speed in training or prediction.
  9. Learn some basic gcloud commands to submit AI Platform training jobs, especially the --scale-tier parameter.
  10. Know how and where to choose machine types for online prediction: during model version creation.
  11. I believe MirroredStrategy or TPUStrategy in TensorFlow distributed training should definitely appear in every exam; they were actually quite useful in ML ops performance troubleshooting (see the distribution strategy sketch after this list). Learn when to scale up the machine type in training or prediction versus scaling out: in most cases, to avoid out-of-memory exceptions.
  12. I did not see any scenario-based questions like those in the Cloud Architect exam. As with all other exams, don’t get stuck on one question and run out of time, which was the mistake I made. 60 questions are 10 more than the typical exam, so check your time at 30-minute intervals: if you are past the 20th question in the first 30 minutes, you are OK; if you are on your 17th question, you really need to speed up.
  13. Understand how to use AI Platform Prediction continuous evaluation to monitor model performance.
  14. Learn how to fix model degradation over time in production, i.e., detect and mitigate model drift with Google’s recommended practices.
  15. Study which attribution methods in AI Explanations to use for non-differentiable models such as ensembles of trees versus differentiable models, and for models with large feature spaces versus image data.
  16. Study how to use Recommendations AI to maximize a retailer’s revenue: recording user events and importing the catalog. Study which recommendation type optimizes click-through rate versus revenue per order.
  17. Use Kubeflow runs in an experiment to compare ROC AUC across different models and pick the optimal one.
  18. Know when and how to submit a distributed training job with custom containers: most likely when there are critical dependencies in the ML framework that you can’t replace with TensorFlow.
  19. Learn that the most efficient way to execute BigQuery queries or copy objects in Cloud Storage from Kubeflow components is to use the existing open source Kubeflow Google Cloud pipeline components (see the component-loading sketch after this list), definitely not writing Python code against the BigQuery client library or building your own container with @kfp.dsl.component def my_component(my_param): return kfp.dsl.ContainerOp(name='component name', image='gcr.io/path/to/container/image')
  20. Learn how to fix a model’s input bottleneck by using TFRecords instead of a raw Python input function that reads CSV files: use interleave in TFRecordDataset to process many input files concurrently, and use prefetch to improve latency and throughput at the cost of additional memory to store prefetched elements (see the tf.data sketch after this list).
  21. Learn how to submit an AI Platform hyperparameter tuning job.
  22. Know when to use L1 or L2 regularization to prevent overfitting (see the regularization sketch after this list).
  23. Learn the technique of giving the model more images of defective products to improve its accuracy when it can’t effectively detect defective products because there are too few bad examples (a bad-to-good ratio of 1:100); see the rebalancing sketch after this list.
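
To make item 2 concrete, here is a minimal prediction request sketch: build the {"instances": [...]} body and send it to AI Platform Prediction with the Google API client library. The project, model, version, and feature names are hypothetical placeholders.

```python
from googleapiclient import discovery

# Each element of "instances" is one example, shaped the way the model's
# serving signature expects (here a dict of named features; hypothetical names).
body = {
    "instances": [
        {"age": 42, "income": 55000.0},
        {"age": 31, "income": 72000.0},
    ]
}

# Hypothetical project, model, and version identifiers.
name = "projects/my-project/models/my_model/versions/v1"

service = discovery.build("ml", "v1")
response = service.projects().predict(name=name, body=body).execute()
print(response["predictions"])
```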
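
For item 11, a minimal distribution strategy sketch with tf.distribute.MirroredStrategy (swap in TPUStrategy when training on TPUs). The model and the training data are toy placeholders.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model across all GPUs on one machine
# and aggregates gradients across replicas; use TPUStrategy on TPUs.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Model variables must be created inside the strategy scope.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Toy data; in a real training job this would be a tf.data pipeline.
x = np.random.rand(1024, 10).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(x, y, epochs=2, batch_size=64)
```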
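
For item 19, a component-loading sketch: load a prebuilt Google Cloud pipeline component instead of hand-rolling a ContainerOp. The component YAML file name, project id, and input names are assumptions for illustration; the actual inputs come from whichever component definition your Kubeflow Pipelines release publishes.

```python
import kfp
from kfp import components, dsl

# Load a prebuilt component definition instead of writing a custom ContainerOp.
# "bigquery_query_component.yaml" is a hypothetical local copy of the
# published component's component.yaml.
bigquery_query_op = components.load_component_from_file(
    "bigquery_query_component.yaml")

@dsl.pipeline(name="bq-example",
              description="Run a BigQuery query with a prebuilt component")
def pipeline(project_id: str = "my-project"):  # hypothetical project id
    # Input names are defined by the component YAML; these are illustrative.
    bigquery_query_op(
        query="SELECT 1 AS one",
        project_id=project_id,
    )
```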
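
For item 20, a tf.data sketch reading sharded TFRecord files with interleave and prefetch. The Cloud Storage path and the feature spec are hypothetical; they would have to match how the TFRecords were written.

```python
import tensorflow as tf

# Hypothetical feature spec matching how the TFRecords were serialized.
FEATURES = {
    "feature": tf.io.FixedLenFeature([10], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.float32),
}

def parse_fn(serialized):
    example = tf.io.parse_single_example(serialized, FEATURES)
    return example["feature"], example["label"]

# Hypothetical sharded TFRecord files on Cloud Storage.
files = tf.data.Dataset.list_files("gs://my-bucket/train-*.tfrecord")

dataset = (
    files
    # Read several shards concurrently instead of one file at a time.
    .interleave(tf.data.TFRecordDataset,
                cycle_length=8,
                num_parallel_calls=tf.data.AUTOTUNE)
    .map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(64)
    # Overlap input processing with training, at the cost of extra memory.
    .prefetch(tf.data.AUTOTUNE)
)
```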
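
For item 22, a regularization sketch adding L1 and L2 penalties to Keras layers through kernel_regularizer. The layer sizes and penalty weights are arbitrary choices for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    # L2 (ridge) penalty shrinks weights smoothly toward zero.
    layers.Dense(64, activation="relu", input_shape=(10,),
                 kernel_regularizer=regularizers.l2(1e-4)),
    # L1 (lasso) penalty drives some weights exactly to zero (sparsity).
    layers.Dense(32, activation="relu",
                 kernel_regularizer=regularizers.l1(1e-5)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```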
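
For item 23, a rebalancing sketch: besides collecting or augmenting more defective images, one way to handle a 1:100 class ratio is to oversample the minority class in the input pipeline. The datasets below are toy stand-ins for the real image data.

```python
import tensorflow as tf

# Toy stand-ins for the real image datasets: a small "defective" set and a
# much larger "good" set (ratio roughly 1:100), with hypothetical shapes.
defective_ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([10, 32, 32, 3]), tf.ones([10], tf.int32)))
good_ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([1000, 32, 32, 3]), tf.zeros([1000], tf.int32)))

# Oversample the minority class: repeat both datasets and sample them with
# equal probability so each batch is roughly balanced.
balanced_ds = tf.data.experimental.sample_from_datasets(
    [defective_ds.repeat(), good_ds.repeat()], weights=[0.5, 0.5])

# The result is an infinite dataset, so set steps_per_epoch when fitting.
train_ds = balanced_ds.batch(32).prefetch(tf.data.AUTOTUNE)
```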

If there were only 50 questions, I would think the exam wasn’t that hard; 60 questions really made it hard, and I provided that feedback at the end of the exam. Personally, I found the machine learning exam as hard as the Network Engineer exam, but for different reasons. The Network Engineer exam was hard for me because I lacked an enterprise network solution design background, while the machine learning exam’s difficulty was the time crunch. Here’s my certification.
