Google Cloud Trace, Debug, Error Reporting

Hil Liao
Jul 14, 2020

Recently, I had the chance to build two Cloud Run services that use Google Cloud Trace, Debug, Error Reporting, and Logging. Logging comes easily in Cloud Run: writing to stdout is enough, without adding google-cloud-logging to requirements.txt.
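Cloud Run forwards what the container writes to stdout to Cloud Logging, and a line that is a single JSON object is parsed as a structured entry, with special fields such as severity and message. Below is a minimal sketch of that pattern with only the standard library. The helper names, the project ID and the header value are hypothetical; the logging.googleapis.com/trace field and the X-Cloud-Trace-Context format (TRACE_ID/SPAN_ID;o=1) are what lets Cloud Logging link an entry to a trace in Cloud Trace.

```python
import json
import sys


def trace_field(project_id, trace_header):
    """Turn an X-Cloud-Trace-Context value ('TRACE_ID/SPAN_ID;o=1') into the
    value Cloud Logging expects in logging.googleapis.com/trace."""
    trace_id = trace_header.split("/", 1)[0]
    return f"projects/{project_id}/traces/{trace_id}"


def log(message, severity="INFO", trace=None, stream=None):
    """Write one structured log entry as a single JSON line (stdout by default)."""
    entry = {"severity": severity, "message": message}
    if trace:
        # Special field that links the entry to a trace in Cloud Trace.
        entry["logging.googleapis.com/trace"] = trace
    out = sys.stdout if stream is None else stream
    out.write(json.dumps(entry) + "\n")
    out.flush()


if __name__ == "__main__":
    log("service started")
    # In a request handler, the header would come from the incoming request;
    # the project ID and header value here are placeholders.
    log("handled request",
        trace=trace_field("my-project", "105445aa7843bc8bf206b12000100000/1;o=1"))
```

Writing one JSON object per line keeps each entry intact; a multi-line JSON dump would be split into several plain-text entries.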

Cloud Trace

Setting up Cloud Trace for Python or Java isn't hard, but not knowing a few tricks may cause you to stumble. I recommend using a dedicated service account for development and granting the Cloud Trace Agent role to that service account. This lets you publish trace data from your local machine to a Google Cloud project that has the Trace API enabled. If the Cloud Trace API isn't enabled, running the Python or Java application throws errors that include a link to enable the Trace API in the project. The same goes for the Cloud Debugger API.

Generate a service account key for local debugging and set GOOGLE_APPLICATION_CREDENTIALS to the absolute path of the downloaded JSON key file. Bind the following roles in the project to the Google service account:

- roles/cloudtrace.agent
- roles/logging.logWriter
- roles/errorreporting.writer
- roles/clouddebugger.agent
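The role bindings and the key-file check above can be scripted. This is a minimal sketch, assuming the gcloud CLI is installed and authenticated; it only builds `gcloud projects add-iam-policy-binding` argument lists (you could pass each one to subprocess.run), and it fails early when GOOGLE_APPLICATION_CREDENTIALS is unset, relative, or points to a missing file. The project ID and service account email are hypothetical placeholders.

```python
import os
import shlex

# The four roles listed above.
ROLES = [
    "roles/cloudtrace.agent",
    "roles/logging.logWriter",
    "roles/errorreporting.writer",
    "roles/clouddebugger.agent",
]


def binding_commands(project_id, sa_email, roles=ROLES):
    """Return one `gcloud projects add-iam-policy-binding` argv list per role."""
    return [
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{sa_email}", f"--role={role}"]
        for role in roles
    ]


def check_credentials(env=None):
    """Fail early if GOOGLE_APPLICATION_CREDENTIALS is unset, relative, or missing."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isabs(path):
        raise RuntimeError(f"GOOGLE_APPLICATION_CREDENTIALS must be absolute: {path}")
    if not os.path.isfile(path):
        raise RuntimeError(f"key file not found: {path}")
    return path


if __name__ == "__main__":
    # Hypothetical project and service account; replace with your own.
    for argv in binding_commands("my-dev-project",
                                 "trace-dev@my-dev-project.iam.gserviceaccount.com"):
        print(shlex.join(argv))
```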

Below is an example of nested trace spans, where get_projects(…) and get_metrics() are inner spans of monitoring_gke_node_cpu:get_gke_cpu. If there are many other traces, filter by SpanName using the prefix monitoring.
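To make the parent/child relationship concrete, here is a stdlib-only toy tracer, not the Cloud Trace client: each `with tracer.span(name)` block records its name, its enclosing span, and its duration, and a prefix filter mimics filtering by SpanName. OpenCensus-style Python code has a similar shape (`with tracer.span(name=...)`), but the class, function bodies and returned data below are hypothetical.

```python
import time
from contextlib import contextmanager


class ToyTracer:
    """Records finished spans with their parent so nesting can be inspected."""

    def __init__(self):
        self.spans = []    # finished spans: dicts with name, parent, duration
        self._stack = []   # names of the spans currently open

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            self._stack.pop()
            self.spans.append({"name": name, "parent": parent,
                               "duration": time.perf_counter() - start})

    def filter_by_prefix(self, prefix):
        """Like filtering by SpanName prefix in the Trace console."""
        return [s for s in self.spans if s["name"].startswith(prefix)]


tracer = ToyTracer()


def get_projects():
    with tracer.span("get_projects"):
        return ["project-a", "project-b"]  # hypothetical data


def get_metrics(projects):
    with tracer.span("get_metrics"):
        return {p: 0.0 for p in projects}  # hypothetical CPU values


def get_gke_cpu():
    # Outer span; both calls below open inner spans while it is still open.
    with tracer.span("monitoring_gke_node_cpu:get_gke_cpu"):
        return get_metrics(get_projects())
```

Inner spans finish first, so they are recorded before the outer span, which is the same order a trace waterfall reads from the bottom up.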

I had to put the gcp_tracer variable in the global scope; creating it inside the main method did not log any traces. When debugging locally, a single trace sometimes showed up in as little as 10 seconds. When I hit the Cloud Run service continuously for 30 seconds, the traces showed up only after 8 minutes, with auto-refresh toggled on. I suppose Cloud Trace batched multiple traces and published them later.
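One plausible explanation for both observations, which I have not verified against the library's source, is a buffering exporter: finished spans are held and published in batches, and a tracer that goes out of scope when main returns may never get to publish its buffer. The toy BatchingExporter below (a hypothetical class, not a Cloud Trace API) publishes when the batch is full or the oldest span is too old, and a module-level instance registers a final flush with atexit.

```python
import atexit
import threading
import time


class BatchingExporter:
    """Toy exporter that buffers finished spans and publishes them in batches.

    Real exporters usually flush from a background thread; this sketch only
    checks the size and age limits when a new span arrives, or on flush().
    """

    def __init__(self, publish, max_batch_size=50, max_delay_s=60.0, clock=time.monotonic):
        self._publish = publish            # callable that receives a list of spans
        self._max_batch_size = max_batch_size
        self._max_delay_s = max_delay_s
        self._clock = clock
        self._buffer = []
        self._oldest = None                # clock time of the oldest buffered span
        self._lock = threading.Lock()

    def export(self, span):
        with self._lock:
            if not self._buffer:
                self._oldest = self._clock()
            self._buffer.append(span)
            too_many = len(self._buffer) >= self._max_batch_size
            too_old = self._clock() - self._oldest >= self._max_delay_s
            if too_many or too_old:
                self._flush_locked()

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._publish(batch)


# Module-level (global) exporter: it lives as long as the process, so its
# buffered spans are still around to publish, and atexit gives a last flush.
PUBLISHED_BATCHES = []  # stand-in for the Cloud Trace API
EXPORTER = BatchingExporter(PUBLISHED_BATCHES.append)
atexit.register(EXPORTER.flush)
```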

Cloud Debug

The Python documentation for Cloud Debugger only catches ImportError, which I never encountered. I recommend catching all exceptions raised when Cloud Debugger can't be enabled and printing the errors. That is how I caught and saw the error the first time I executed the code, and realized I had not enabled the Cloud Debugger API. The error in the PyCharm debug console showed me a link to enable the Cloud Debugger API in the project.

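import sys

try:
    import googleclouddebugger
    googleclouddebugger.enable(
        breakpoint_enable_canary=True
    )
# The documented sample only handles ImportError:
# except ImportError:
#     pass
except Exception:
    # Print the exception type, value, and traceback object so the error
    # (e.g. the link to enable the API) shows up in the console.
    for e in sys.exc_info():
        print(e)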

Error Reporting

My understanding of Error Reporting in Cloud Run is that uncaught errors are logged automatically; in addition, following Setting Up Error Reporting for Python lets your code report additional errors. Unfortunately, the reporting method is quite limited: report_exception() reports the exception currently being handled. I found the following sample code from 2018 that passes two parameters to report_exception(); regardless, the method logs the most recently raised exception, with no way to specify which exception or error to report. So, it's best called inside the except block.

# Options in method report_exception(): http_context and user
try:
    ...  # Google API call that refreshes the user's OAuth token
except client.HttpAccessTokenRefreshError as err:
    # httplib is Python 2; on Python 3 use http.client.UNAUTHORIZED
    http_context.responseStatusCode = httplib.UNAUTHORIZED  # 401
    user_token_err = '{} has invalid refresh token'.format(username)
    error_reporting_client.report_exception(
        http_context=http_context,
        user='Google API HttpError for user {}'.format(username))
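Why placement matters: as far as I can tell, the google-cloud-error-reporting client builds the report from traceback.format_exc(), which only sees the exception currently being handled. This stdlib-only sketch uses a hypothetical stand-in, report_current_exception(), to show the difference between calling it inside and after the except block.

```python
import traceback


def report_current_exception():
    """Toy stand-in for report_exception(): format the exception being handled."""
    return traceback.format_exc()


def demo():
    try:
        {}["missing"]  # raises KeyError
    except KeyError:
        inside = report_current_exception()  # sees the KeyError and its traceback
    # Once the except block has finished, no exception is being handled,
    # so the same call no longer knows about the KeyError.
    outside = report_current_exception()
    return inside, outside
```

On Python 3, the "outside" call returns a placeholder such as "NoneType: None" instead of the KeyError traceback.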

Example of error reporting on uncaught KeyError.

Java 11 Application for Trace and Debug

For some reason, I struggled to build and package a Java 11 Spring Boot REST application with Maven: docker run during local testing always failed with an error about a missing application manifest. Luckily, Gradle worked much better, with simpler syntax (see The Cloud Run service to create an AI Platform notebook). Surprisingly, no Java code is needed to enable Cloud Debugger; instead, the Dockerfile downloads the cdbg Java agent, a native .so file, which runs alongside the main .jar file.
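Running a native agent "alongside" the jar means loading it with the JVM's -agentpath option, which, like every JVM option, must come before -jar. The Python sketch below builds such a command line and renders it as a Dockerfile CMD in exec (JSON array) form. The agent path, module and version values are placeholders, and the com.google.cdbg.* property names are as I recall them from the Cloud Debugger Java setup docs, so verify them before use.

```python
import json


def java_command(jar_path, agent_path="/opt/cdbg/cdbg_java_agent.so",
                 module="my-service", version="v1"):
    """Build a JVM command line that loads the Cloud Debugger native agent.

    The agent path, module and version defaults are placeholders.
    """
    return [
        "java",
        # -agentpath loads a native (.so) JVMTI agent; JVM options must come before -jar.
        f"-agentpath:{agent_path}",
        # Property names as recalled from the Cloud Debugger Java docs; verify.
        f"-Dcom.google.cdbg.module={module}",
        f"-Dcom.google.cdbg.version={version}",
        "-jar", jar_path,
    ]


def dockerfile_cmd(argv):
    """Render argv as a Dockerfile CMD instruction in exec (JSON array) form."""
    return "CMD " + json.dumps(argv)


if __name__ == "__main__":
    print(dockerfile_cmd(java_command("/app/app.jar")))
```

Anything placed after `-jar app.jar` would be passed to the application's main method instead of the JVM, which is why the agent option goes first.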

Setting up Cloud Trace has one trick: you need to set the Google Cloud project ID when the code runs locally. I used LOCAL_DEBUG_GCP_PROJECT as the environment variable in IntelliJ, alongside GOOGLE_APPLICATION_CREDENTIALS pointing to the key of a Google service account granted the Cloud Debugger Agent and Cloud Trace Agent roles in the project. Be sure to pass Samplers.alwaysSample() if you want every trace to be logged; by default, OpenCensus samples only about 1 in 10,000 requests. Somehow, I did not have to pass alwaysSample() in the Python code.

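// Force sampling of this span instead of relying on the default probability sampler
try (Scope ss = tracer.spanBuilder("sleep").setSampler(Samplers.alwaysSample()).startScopedSpan()) { /* traced work */ }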
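The two local-setup details above, the project ID override and the sampler, can be sketched in Python. resolve_project_id() reads the LOCAL_DEBUG_GCP_PROJECT variable used above and returns None when it is unset, leaving detection to the library. The samplers are toys, similar in spirit to OpenCensus's always-on and probability samplers but not its implementation: the decision depends only on the trace ID, so every span that shares a trace ID gets the same answer.

```python
import os


def resolve_project_id(env=None):
    """Project ID for local runs, read from LOCAL_DEBUG_GCP_PROJECT.

    Returns None when unset, so the tracing library can detect the project
    itself (for example from the metadata server on Cloud Run).
    """
    env = os.environ if env is None else env
    return env.get("LOCAL_DEBUG_GCP_PROJECT") or None


class AlwaysSampler:
    """Counterpart of Samplers.alwaysSample(): keep every trace."""

    def should_sample(self, trace_id):
        return True


class ProbabilitySampler:
    """Keep roughly `rate` of traces, decided from the trace ID alone."""

    def __init__(self, rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be in [0, 1]")
        self.rate = rate

    def should_sample(self, trace_id):
        # Compare the lower 64 bits of the 128-bit trace ID against rate * 2**64,
        # so every service that sees the same trace ID makes the same decision.
        lower = trace_id & (2**64 - 1)
        return lower < self.rate * 2**64
```

With a small rate, a short burst of local test requests can easily produce zero sampled traces, which matches the "nothing shows up" symptom the always-on sampler avoids.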

Open issue with Cloud Profiler for a Java 11 application in Anthos Service Mesh 1.6.x

I tried to use Cloud Profiler with a Java 11 application in an Anthos GKE cluster and hit a problem with the outgoing HTTP request that creates a profile when using GKE Workload Identity. The Istio Envoy proxy in the pod intercepted the outgoing request and returned HTTP 426 because httpcli used HTTP/1.0. Google support and I found the root cause and created this issue. The symptom is that the Java profiler agent gets stuck creating a profile and no profile is ever collected.
