Prometheus Metrics for On-prem Users
BuildBuddy exposes Prometheus metrics that allow monitoring the four golden signals: latency, traffic, errors, and saturation.
Prometheus metrics are exposed under the path /metrics on port 9090 by default.
To view these metrics in a live-updating dashboard, we recommend using a tool like Grafana.
Invocation build event metrics
All invocation metrics are recorded at the end of each invocation.
buildbuddy_invocation_count (Counter)
The total number of invocations whose logs were uploaded to BuildBuddy.
Labels
- invocation_status: Invocation status: success, failure, disconnected, or unknown.
- bazel_exit_code: Exit code of a completed Bazel command.
- bazel_command: Command provided to the Bazel daemon: run, test, build, coverage, mobile-install, ...
Examples
# Number of invocations per second by invocation status
sum by (invocation_status) (rate(buildbuddy_invocation_count[5m]))
# Invocation success rate
sum(rate(buildbuddy_invocation_count{invocation_status="success"}[5m]))
/
sum(rate(buildbuddy_invocation_count[5m]))
buildbuddy_invocation_duration_usec (Histogram)
The total duration of each invocation, in microseconds.
Labels
- invocation_status: Invocation status: success, failure, disconnected, or unknown.
- bazel_command: Command provided to the Bazel daemon: run, test, build, coverage, mobile-install, ...
Examples
# Median invocation duration in the past 5 minutes
histogram_quantile(
0.5,
sum(rate(buildbuddy_invocation_duration_usec_bucket[5m])) by (le)
)
buildbuddy_invocation_open_streams (Gauge)
Number of build event streams currently being handled by the server.
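As a gauge, this metric can be read directly rather than through rate(). The queries below are a minimal sketch, assuming one time series per BuildBuddy app instance; the 1h window is an arbitrary choice.

```promql
# Build event streams currently open across all scraped instances
sum(buildbuddy_invocation_open_streams)

# Peak number of open streams per series over the past hour
max_over_time(buildbuddy_invocation_open_streams[1h])
```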
buildbuddy_invocation_build_event_count (Counter)
Number of build events uploaded to BuildBuddy.
Labels
- status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
Examples
# Build events uploaded per second
sum(rate(buildbuddy_invocation_build_event_count[5m]))
# Approximate error rate of build event upload handler
sum(rate(buildbuddy_invocation_build_event_count{status!="0"}[5m]))
/
sum(rate(buildbuddy_invocation_build_event_count[5m]))
buildbuddy_invocation_stats_recorder_workers (Gauge)
Number of invocation stats recorder workers currently running.
buildbuddy_invocation_stats_recorder_duration_usec (Histogram)
How long it took to finalize an invocation's stats, in microseconds.
This includes the time required to wait for all BuildBuddy apps to flush their local metrics to Redis (if applicable) and then record the metrics to the DB.
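Because this is a histogram, tail latency can be estimated with histogram_quantile over the _bucket series. The query below is a sketch: the 0.95 quantile and the 5m window are illustrative choices, and the division converts microseconds to seconds.

```promql
# Approximate 95th percentile time to finalize invocation stats, in seconds
histogram_quantile(
  0.95,
  sum(rate(buildbuddy_invocation_stats_recorder_duration_usec_bucket[5m])) by (le)
) / 1e6
```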
buildbuddy_invocation_webhook_invocation_lookup_workers (Gauge)
Number of webhook invocation lookup workers currently running.
buildbuddy_invocation_webhook_invocation_lookup_duration_usec (Histogram)
How long it took to look up an invocation before posting to the webhook, in microseconds.
buildbuddy_invocation_webhook_notify_workers (Gauge)
Number of webhook notify workers currently running.
buildbuddy_invocation_webhook_notify_duration_usec (Histogram)
How long it took to post an invocation proto to the webhook, in microseconds.
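A mean duration can be derived from the histogram's _sum and _count series. This is a sketch with an arbitrary 5m window; it returns no data while no webhook posts are being recorded.

```promql
# Approximate mean webhook post duration, in microseconds
sum(rate(buildbuddy_invocation_webhook_notify_duration_usec_sum[5m]))
/
sum(rate(buildbuddy_invocation_webhook_notify_duration_usec_count[5m]))
```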
Remote cache metrics
NOTE: Cache metrics are recorded at the end of each invocation, so they lag behind live cache activity and provide only approximate real-time signals.
buildbuddy_remote_cache_events (Counter)
Number of cache events handled.
Labels
- cache_type: Cache type: action for action cache, cas for content-addressable storage.
- cache_event_type: Cache event type: hit, miss, or upload.
- group_id: Group (organization) ID associated with the request.
- tracked: Whether or not billable usage was recorded for this request ("true", "false").
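One way to use the labels above is to estimate a cache hit rate. The query below is a sketch that uses the label values listed for cache_type and cache_event_type; the 5m window is an arbitrary choice, and uploads are excluded from the denominator by design.

```promql
# Approximate action cache hit rate (hits / (hits + misses))
sum(rate(buildbuddy_remote_cache_events{cache_type="action", cache_event_type="hit"}[5m]))
/
sum(rate(buildbuddy_remote_cache_events{cache_type="action", cache_event_type=~"hit|miss"}[5m]))
```

Swapping cache_type="action" for cache_type="cas" gives the same ratio for the content-addressable store.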
buildbuddy_remote_cache_download_size_bytes (Histogram)
Number of bytes downloaded from the remote cache in each download.
Use the _sum suffix to get the total downloaded bytes and the _count suffix to get the number of downloaded files.
Labels
- cache_type: Cache type: action for action cache, cas for content-addressable storage.
- server_name: Name of the server that handles a client request, such as "byte_stream_server" or "cas_server".
- tracked: Whether or not billable usage was recorded for this request ("true", "false").
Examples
# Cache download rate (bytes per second)
sum(rate(buildbuddy_remote_cache_download_size_bytes_sum[5m]))
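Combining the _sum and _count suffixes described above gives an approximate mean size per download. This is a sketch; the 5m window and the grouping by cache_type are illustrative choices.

```promql
# Approximate mean bytes per download, by cache type
sum by (cache_type) (rate(buildbuddy_remote_cache_download_size_bytes_sum[5m]))
/
sum by (cache_type) (rate(buildbuddy_remote_cache_download_size_bytes_count[5m]))
```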
buildbuddy_remote_cache_download_duration_usec (Histogram)
Download duration for each file downloaded from the remote cache, in microseconds.
Labels
- cache_type: Cache type: action for action cache, cas for content-addressable storage.
- tracked: Whether or not billable usage was recorded for this request ("true", "false").
Examples
# Median download duration for content-addressable store (CAS)
histogram_quantile(
0.5,
sum(rate(buildbuddy_remote_cache_download_duration_usec_bucket{cache_type="cas"}[5m])) by (le)
)
buildbuddy_remote_cache_upload_size_bytes (Histogram)
Number of bytes uploaded to the remote cache in each upload.
Use the _sum suffix to get the total uploaded bytes and the _count suffix to get the number of uploaded files.
Labels
- cache_type: Cache type: action for action cache, cas for content-addressable storage.
- server_name: Name of the server that handles a client request, such as "byte_stream_server" or "cas_server".
- tracked: Whether or not billable usage was recorded for this request ("true", "false").
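The upload histogram can be queried the same way as the download histogram. The queries below are a sketch, with an arbitrary 5m window and a grouping by server_name chosen for illustration.

```promql
# Cache upload rate (bytes per second)
sum(rate(buildbuddy_remote_cache_upload_size_bytes_sum[5m]))

# Uploaded files per second, by handling server
sum by (server_name) (rate(buildbuddy_remote_cache_upload_size_bytes_count[5m]))
```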