Prometheus Metrics for On-prem Users

BuildBuddy exposes Prometheus metrics that allow monitoring the four golden signals: latency, traffic, errors, and saturation.

Prometheus metrics are exposed under the path /metrics on port 9090 by default.

To view these metrics in a live-updating dashboard, we recommend using a tool like Grafana.

Invocation build event metrics

All invocation metrics are recorded at the end of each invocation.

buildbuddy_invocation_count (Counter)

The total number of invocations whose logs were uploaded to BuildBuddy.

Labels

  • invocation_status: Invocation status: success, failure, disconnected, or unknown.
  • bazel_exit_code: Exit code of a completed bazel command
  • bazel_command: Command provided to the Bazel daemon: run, test, build, coverage, mobile-install, ...

Examples

# Number of invocations per second by invocation status
sum by (invocation_status) (rate(buildbuddy_invocation_count[5m]))

# Invocation success rate
sum(rate(buildbuddy_invocation_count{invocation_status="success"}[5m]))
/
sum(rate(buildbuddy_invocation_count[5m]))

buildbuddy_invocation_duration_usec (Histogram)

The total duration of each invocation, in microseconds.

Labels

  • invocation_status: Invocation status: success, failure, disconnected, or unknown.
  • bazel_command: Command provided to the Bazel daemon: run, test, build, coverage, mobile-install, ...

Examples

# Median invocation duration in the past 5 minutes
histogram_quantile(
0.5,
sum(rate(buildbuddy_invocation_duration_usec_bucket[5m])) by (le)
)
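
# p99 invocation duration in seconds, per Bazel command.
# Sketch: divides the microsecond quantile by 1e6; the 5m window is an arbitrary choice.
histogram_quantile(
0.99,
sum by (le, bazel_command) (rate(buildbuddy_invocation_duration_usec_bucket[5m]))
) / 1e6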

buildbuddy_invocation_build_event_count (Counter)

Number of build events uploaded to BuildBuddy.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.

Examples

# Build events uploaded per second
sum(rate(buildbuddy_invocation_build_event_count[5m]))

# Approximate error rate of build event upload handler
sum(rate(buildbuddy_invocation_build_event_count{status!="0"}[5m]))
/
sum(rate(buildbuddy_invocation_build_event_count[5m]))

buildbuddy_invocation_stats_recorder_workers (Gauge)

Number of invocation stats recorder workers currently running.

buildbuddy_invocation_stats_recorder_duration_usec (Histogram)

How long it took to finalize an invocation's stats, in microseconds.

This includes the time required to wait for all BuildBuddy apps to flush their local metrics to Redis (if applicable) and then record the metrics to the DB.

buildbuddy_invocation_webhook_invocation_lookup_workers (Gauge)

Number of webhook invocation lookup workers currently running.

buildbuddy_invocation_webhook_invocation_lookup_duration_usec (Histogram)

How long it took to look up an invocation before posting to the webhook, in microseconds.

buildbuddy_invocation_webhook_notify_workers (Gauge)

Number of webhook notify workers currently running.

buildbuddy_invocation_webhook_notify_duration_usec (Histogram)

How long it took to post an invocation proto to the webhook, in microseconds.

Remote cache metrics

NOTE: Cache metrics are recorded at the end of each invocation, which means that these metrics provide approximate real-time signals.

buildbuddy_remote_cache_events (Counter)

Number of cache events handled.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • cache_event_type: Cache event type: hit, miss, or upload.
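
Examples

# Action cache hit rate: hits / (hits + misses).
# Sketch: assumes upload events should be excluded from the denominator.
sum(rate(buildbuddy_remote_cache_events{cache_type="action", cache_event_type="hit"}[5m]))
/
sum(rate(buildbuddy_remote_cache_events{cache_type="action", cache_event_type=~"hit|miss"}[5m]))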

buildbuddy_remote_cache_download_size_bytes (Histogram)

Number of bytes downloaded from the remote cache in each download.

Use the _sum suffix to get the total downloaded bytes and the _count suffix to get the number of downloaded files.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server"

Examples

# Cache download rate (bytes per second)
sum(rate(buildbuddy_remote_cache_download_size_bytes_sum[5m]))
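
# Average size of each download, in bytes (total bytes / number of downloads).
# Sketch: combines the histogram's _sum and _count series described above.
sum(rate(buildbuddy_remote_cache_download_size_bytes_sum[5m]))
/
sum(rate(buildbuddy_remote_cache_download_size_bytes_count[5m]))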

buildbuddy_remote_cache_download_duration_usec (Histogram)

Download duration for each file downloaded from the remote cache, in microseconds.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.

Examples

# Median download duration for content-addressable store (CAS)
histogram_quantile(
0.5,
sum(rate(buildbuddy_remote_cache_download_duration_usec_bucket{cache_type="cas"}[5m])) by (le)
)

buildbuddy_remote_cache_upload_size_bytes (Histogram)

Number of bytes uploaded to the remote cache in each upload.

Use the _sum suffix to get the total uploaded bytes and the _count suffix to get the number of uploaded files.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server"

Examples

# Cache upload rate (bytes per second)
sum(rate(buildbuddy_remote_cache_upload_size_bytes_sum[5m]))

buildbuddy_remote_cache_upload_duration_usec (Histogram)

Upload duration for each file uploaded to the remote cache, in microseconds.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.

Examples

# Median upload duration for content-addressable store (CAS)
histogram_quantile(
0.5,
sum(rate(buildbuddy_remote_cache_upload_duration_usec_bucket{cache_type="cas"}[5m])) by (le)
)

buildbuddy_remote_cache_disk_cache_last_eviction_age_usec (Gauge)

The age of the item most recently evicted from the cache, in microseconds.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_disk_cache_eviction_age_msec (Histogram)

Age of items evicted from the cache, in milliseconds.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_disk_cache_num_evictions (Counter)

Number of items evicted.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_disk_cache_partition_size_bytes_evicted (Counter)

Number of bytes in the partition evicted.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_disk_cache_partition_size_bytes (Gauge)

Number of bytes in the partition.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_disk_cache_partition_capacity_bytes (Gauge)

Maximum number of bytes the partition can hold.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
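
Examples

# Fraction of each partition's capacity currently in use.
# Sketch: assumes the size and capacity gauges are exported with matching
# label sets (partition_id, cache_name, plus target labels such as instance).
buildbuddy_remote_cache_disk_cache_partition_size_bytes
/
buildbuddy_remote_cache_disk_cache_partition_capacity_bytes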

buildbuddy_remote_cache_disk_cache_partition_num_items (Gauge)

Number of items in the partition.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
  • cache_type: Cache type: action for action cache, cas for content-addressable storage.

buildbuddy_remote_cache_disk_cache_duplicate_writes (Counter)

Number of writes for digests that already exist.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_disk_cache_added_file_size_bytes (Histogram)

Size of artifacts added to the file cache, in bytes.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_disk_cache_filesystem_total_bytes (Gauge)

Total size of the underlying filesystem.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_disk_cache_filesystem_avail_bytes (Gauge)

Available bytes in the underlying filesystem.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

Examples

# Total number of duplicate writes.
sum(buildbuddy_remote_cache_disk_cache_duplicate_writes)
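
# Fraction of the underlying filesystem that is in use, per series.
# Sketch: assumes the avail and total gauges share the same label sets.
1 - (
buildbuddy_remote_cache_disk_cache_filesystem_avail_bytes
/
buildbuddy_remote_cache_disk_cache_filesystem_total_bytes
)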

buildbuddy_remote_cache_disk_cache_duplicate_writes_bytes (Counter)

Number of bytes written that already existed in the cache.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_distributed_cache_peer_lookups (Histogram)

Number of peers consulted (including the 'local peer') for a distributed cache read before returning a response.

For batch requests, one observation is recorded for each digest in the request.

Labels

  • op: Distributed cache operation name, such as "FindMissing" or "Get".
  • status: Cache lookup result - "hit," "miss," or "partial" (for batched, proxied RPCs where part of the response is served out of the local cache).
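
Examples

# Average number of peers consulted per read, by operation.
# Sketch: divides the histogram's _sum by its _count over a 5m window.
sum by (op) (rate(buildbuddy_remote_cache_distributed_cache_peer_lookups_sum[5m]))
/
sum by (op) (rate(buildbuddy_remote_cache_distributed_cache_peer_lookups_count[5m]))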

buildbuddy_remote_cache_migration_not_found_error_count (Counter)

Number of not found errors from the destination cache during a cache migration.

Labels

  • type: Describes the type of cache request

buildbuddy_remote_cache_migration_double_read_hit_count (Counter)

Number of double reads where the source and destination caches hold the same digests during a cache migration.

Labels

  • type: Describes the type of cache request

buildbuddy_remote_cache_migration_copy_chan_size (Gauge)

Number of digests queued to be copied during a cache migration.

buildbuddy_remote_cache_migration_bytes_copied (Counter)

Number of bytes copied from the source to destination cache during a cache migration.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.

buildbuddy_remote_cache_migration_blobs_copied (Counter)

Number of blobs copied from the source to destination cache during a cache migration.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.

buildbuddy_remote_cache_tree_cache_lookup_count (Counter)

Total number of TreeCache lookups.

Labels

  • status: The TreeCache status: hit/miss/invalid_entry.
  • level: TreeCache directory depth: 0 for the root dir, 1 for a direct child of the root dir, and so on.

buildbuddy_remote_cache_tree_cache_split_lookup_count (Counter)

Total number of TreeCache split lookups.

Labels

  • status: The TreeCache split lookup status: hit/miss/failure

buildbuddy_remote_cache_tree_cache_split_write_count (Counter)

Total number of splits written to TreeCache.

buildbuddy_remote_cache_tree_cache_set_count (Counter)

Total number of TreeCache sets.

Labels

  • status: The TreeCache set status: success/deadline_exceeded/other_error

buildbuddy_remote_cache_tree_cache_bytes_transferred (Counter)

Number of bytes written to or read from the tree cache.

Labels

  • op: TreeCache operation "read" or "write"

buildbuddy_remote_cache_lookaside_cache_lookup_count (Counter)

Total number of Lookaside Cache lookups.

Labels

  • status: The Lookaside cache status: hit/miss.

buildbuddy_remote_cache_lookaside_cache_eviction_age_msec (Histogram)

Age of items evicted from the cache, in milliseconds.

Labels

  • eviction_reason: The reason an item was evicted from the lookaside cache. One of: "expired" or "size"

Remote execution metrics

buildbuddy_remote_execution_count (Counter)

Number of actions executed remotely.

This only includes actions which reached the execution phase. If an action fails before execution (for example, if it fails authentication) then this metric is not incremented.

Labels

  • exit_code: Process exit code of an executed action.
  • status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".
  • isolation: Effective workload isolation type used for an executed task, such as "docker", "podman", "firecracker", or "none".

Examples

# Total number of actions executed per second
sum(rate(buildbuddy_remote_execution_count[5m]))
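
# Fraction of executed actions that finished with a non-OK gRPC status.
# Sketch: returns no data while no non-OK series exist.
sum(rate(buildbuddy_remote_execution_count{status!="OK"}[5m]))
/
sum(rate(buildbuddy_remote_execution_count[5m]))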

buildbuddy_remote_execution_tasks_started_count (Counter)

Number of tasks started remotely, but not necessarily completed.

Includes retry attempts of the same task.

buildbuddy_remote_execution_executed_action_metadata_durations_usec (Histogram)

Time spent in each stage of action execution, in microseconds.

Queries should filter or group by the stage label, taking care not to aggregate different stages.

Labels

  • stage: Executed action stage. Action execution is split into stages corresponding to the timestamps defined in ExecutedActionMetadata: queued, input_fetch, execution, and output_upload. An additional stage, worker, includes all stages during which a worker is handling the action, which is all stages except the queued stage.
  • group_id: Group (organization) ID associated with the request.

Examples

# Median duration of all command stages
histogram_quantile(
0.5,
sum(rate(buildbuddy_remote_execution_executed_action_metadata_durations_usec_bucket[5m])) by (le, stage)
)

# p90 duration of just the command execution stage
histogram_quantile(
0.9,
sum(rate(buildbuddy_remote_execution_executed_action_metadata_durations_usec_bucket{stage="execution"}[5m])) by (le)
)

buildbuddy_remote_execution_task_pressure_stall_duration_fraction (Histogram)

Linux PSI stall time as a fraction of each action's execution duration (0-1).

Labels

  • resource: System resource: "cpu", "memory", or "io".
  • stall_type: Pressure stall type: "some" (task is partially stalled on the resource) or "full" (task is completely stalled on the resource).
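
Examples

# p90 fraction of execution time that actions spent fully stalled on memory.
# Sketch: the quantile and the 5m window are arbitrary choices.
histogram_quantile(
0.9,
sum by (le) (rate(buildbuddy_remote_execution_task_pressure_stall_duration_fraction_bucket{resource="memory", stall_type="full"}[5m]))
)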

buildbuddy_remote_execution_task_size_read_requests (Counter)

Number of read requests to the task sizer, which estimates action resource usage based on historical execution stats.

Labels

  • status: Status of the task size read request: hit, miss, or error.
  • isolation: Effective workload isolation type used for an executed task, such as "docker", "podman", "firecracker", or "none".
  • os: OS associated with the request.
  • arch: CPU architecture associated with the request.
  • group_id: Group (organization) ID associated with the request.

buildbuddy_remote_execution_task_size_write_requests (Counter)

Number of write requests to the task sizer, which estimates action resource usage based on historical execution stats.

Labels

  • status: Status of the task size write request: ok, missing_stats or error.
  • isolation: Effective workload isolation type used for an executed task, such as "docker", "podman", "firecracker", or "none".
  • os: OS associated with the request.
  • arch: CPU architecture associated with the request.
  • group_id: Group (organization) ID associated with the request.

buildbuddy_remote_execution_task_size_prediction_duration_usec (Histogram)

Task size prediction model request duration in microseconds.

Labels

  • status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".

buildbuddy_remote_execution_enqueued_task_milli_cpu (Histogram)

Milli-CPU prediction of enqueued tasks.

buildbuddy_remote_execution_enqueued_task_memory_bytes (Histogram)

Memory prediction of enqueued tasks.

buildbuddy_remote_execution_waiting_execution_result (Gauge)

Number of execution requests for which the client is actively waiting for results.

Labels

  • group_id: Group (organization) ID associated with the request.

Examples

# Total number of execution requests with client waiting for result.
sum(buildbuddy_remote_execution_waiting_execution_result)

buildbuddy_remote_execution_requests (Counter)

Number of execution requests received.

Labels

  • group_id: Group (organization) ID associated with the request.
  • os: OS associated with the request.
  • arch: CPU architecture associated with the request.

buildbuddy_remote_execution_executor_registration_count (Counter)

Number of executor registrations on the scheduler.

Labels

  • version: Binary version. Example: v2.0.0.

Examples

# Rate of new execution requests by OS/Arch.
sum(rate(buildbuddy_remote_execution_requests[1m])) by (os, arch)

buildbuddy_remote_execution_merged_actions (Counter)

Number of identical execution requests that have been merged.

Labels

  • group_id: Group (organization) ID associated with the request.

buildbuddy_remote_execution_hedged_actions (Counter)

Number of identical execution requests that were merged and for which a hedged execution was run in the background.

Labels

  • group_id: Group (organization) ID associated with the request.

buildbuddy_remote_execution_merged_actions_per_execution (Histogram)

Distribution of how many actions were submitted and merged against a single, canonical execution over the lifetime of that canonical execution.

Note that this metric is recorded once per merged-action, so distribution values are cumulative, or recorded n-times per canonical execution.

Labels

  • group_id: Group (organization) ID associated with the request.

buildbuddy_remote_execution_merged_action_submit_time_offset_usec (Histogram)

The offset, in microseconds of wall-time, between the time when a merged action was submitted to the execution server and when the original action was submitted to the execution server.

Labels

  • group_id: Group (organization) ID associated with the request.

Examples

# Rate of merged actions by group.
sum(rate(buildbuddy_remote_execution_merged_actions[1m])) by (group_id)

buildbuddy_remote_execution_queue_length (Gauge)

Number of actions currently waiting in the executor queue.

Labels

  • group_id: Group (organization) ID associated with the request.

Examples

# Median queue length across all executors
quantile(0.5, buildbuddy_remote_execution_queue_length)

buildbuddy_remote_execution_tasks_executing (Gauge)

Number of tasks currently being executed by the executor.

Labels

  • stage: Executed action stage. Action execution is split into stages corresponding to the timestamps defined in ExecutedActionMetadata: queued, input_fetch, execution, and output_upload. An additional stage, worker, includes all stages during which a worker is handling the action, which is all stages except the queued stage.

Examples

# Fraction of idle executors
count(buildbuddy_remote_execution_tasks_executing == 0)
/
count(buildbuddy_remote_execution_tasks_executing)

buildbuddy_remote_execution_assigned_ram_bytes (Gauge)

Estimated RAM on the executor that is currently allocated for task execution, in bytes.

buildbuddy_remote_execution_assigned_and_queued_estimated_ram_bytes (Gauge)

Estimated RAM on the executor that is currently allocated for queued or executing tasks, in bytes.

Note that this is a fuzzy estimate because there's no guarantee that tasks queued on a machine will be handled by that machine.

buildbuddy_remote_execution_assignable_ram_bytes (Gauge)

Maximum total RAM that can be allocated for task execution, in bytes.
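
Examples

# Fraction of assignable RAM currently assigned to executing tasks, per series.
# Sketch: assumes the assigned and assignable gauges are reported by the
# same targets with matching label sets.
buildbuddy_remote_execution_assigned_ram_bytes
/
buildbuddy_remote_execution_assignable_ram_bytes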

buildbuddy_remote_execution_assigned_milli_cpu (Gauge)

Estimated CPU time on the executor that is currently allocated for task execution, in milliCPU (CPU-milliseconds per second).

buildbuddy_remote_execution_assigned_and_queued_estimated_milli_cpu (Gauge)

Estimated CPU time on the executor that is currently allocated for queued or executing tasks, in milliCPU (CPU-milliseconds per second).

Note that this is a fuzzy estimate because there's no guarantee that tasks queued on a machine will be handled by that machine.

buildbuddy_remote_execution_assignable_milli_cpu (Gauge)

Maximum total CPU time on the executor that can be allocated for task execution, in milliCPU (CPU-milliseconds per second).

buildbuddy_remote_execution_cpu_utilization_milli_cpu (Gauge)

Approximate current CPU utilization of tasks executing, in milli-CPU (CPU-milliseconds per second).

This allows for much higher granularity than using rate() on the used_milli_cpu metric.

buildbuddy_remote_execution_file_download_count (Histogram)

Number of files downloaded during remote execution.

buildbuddy_remote_execution_file_download_size_bytes (Histogram)

Total number of bytes downloaded during remote execution.

buildbuddy_remote_execution_file_download_duration_usec (Histogram)

Per-file download duration during remote execution, in microseconds.

buildbuddy_remote_execution_file_upload_count (Histogram)

Number of files uploaded during remote execution.

buildbuddy_remote_execution_file_upload_size_bytes (Histogram)

Total number of bytes uploaded during remote execution.

buildbuddy_remote_execution_skipped_output_bytes (Counter)

Total number of output bytes that weren't uploaded after remote execution.

buildbuddy_remote_execution_file_upload_duration_usec (Histogram)

Per-file upload duration during remote execution, in microseconds.

buildbuddy_firecracker_stage_duration_usec (Histogram)

The total duration of each firecracker stage, in microseconds.

Labels

  • stage: Generic label to describe the stage the metric is capturing

Stage label values

  • "init": Time for the VM to start up (either a new VM or from a snapshot)
  • "exec": Time to run the command inside the container
  • "task_lifecycle": Time from when the task is first assigned to the VM (beginning of init) until it has finished executing. This roughly represents how long a customer waits for the task to complete after it has been scheduled to a firecracker runner.
  • "pause": Time to pause the VM, save a snapshot, and cleanup resources

Examples

# P95 workflow lifecycle duration in the past 5 minutes, grouped by group_id
histogram_quantile(
0.95,
sum by(le, group_id) (
rate(buildbuddy_firecracker_stage_duration_usec_bucket{job="executor-workflows", stage="task_lifecycle"}[5m])
)
)

buildbuddy_firecracker_exec_dial_duration_usec (Histogram)

Time taken to dial the VM guest execution server after it has been started or resumed, in microseconds.

buildbuddy_firecracker_snapshot_remote_cache_upload_size_bytes (Counter)

After a copy-on-write snapshot has been used, the total count of bytes dirtied.

buildbuddy_firecracker_cow_snapshot_dirty_chunk_ratio (Histogram)

After a copy-on-write snapshot has been used, the ratio of dirty/total chunks.

Labels

  • file_name: Name of a file.

Examples

# To view how many elements fall into each bucket
# Visualize with the Bar Gauge type
# Legend: {{le}}
# Format: Heatmap
sum(increase(buildbuddy_firecracker_cow_snapshot_dirty_chunk_ratio_bucket[5m])) by(le)

buildbuddy_firecracker_cow_snapshot_dirty_bytes (Counter)

After a copy-on-write snapshot has been used, the total count of bytes dirtied.

Labels

  • file_name: Name of a file.

buildbuddy_firecracker_cow_snapshot_chunk_source_ratio (Histogram)

After a copy-on-write snapshot has been used, the percentage of chunks that were initialized by the given source.

Labels

  • file_name: Name of a file.
  • chunk_source: For chunked snapshot files, describes the initialization source of the chunk (Ex. remote_cache or local_filecache)

buildbuddy_firecracker_cow_snapshot_memory_mapped_bytes (Gauge)

Total number of bytes currently memory-mapped.

Labels

  • file_name: Name of a file.

buildbuddy_firecracker_cow_snapshot_page_fault_total_duration_usec (Histogram)

For a snapshotted VM, total time spent fulfilling page faults.

Labels

  • stage: Generic label to describe the stage the metric is capturing

buildbuddy_firecracker_cow_snapshot_chunk_operation_duration_usec (Histogram)

For a COW snapshot, cumulative time spent on an operation type.

Labels

  • file_name: Name of a file.
  • name: The name used to identify the type of an unexpected event.
  • stage: Generic label to describe the stage the metric is capturing

buildbuddy_remote_execution_max_recyclable_resource_usage_event (Counter)

Counter for firecracker runners that reach max disk/memory usage and won't get recycled.

Labels

  • group_id: Group (organization) ID associated with the request.
  • name: The name used to identify the type of an unexpected event.
  • recycled_runner_status: For remote execution runners, describes the recycling status (Ex. 'clean' if the runner is not recycled or 'recycled')

buildbuddy_remote_execution_recycle_runner_requests (Counter)

Number of execution requests with runner recycling enabled (via the platform property recycle-runner=true).

Labels

  • status: Status of the recycle runner request: hit if the executor assigned a recycled runner to the action; miss otherwise.
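
Examples

# Fraction of recycle-runner requests that were assigned a recycled runner.
# Sketch: relies on the hit and miss status values listed above.
sum(rate(buildbuddy_remote_execution_recycle_runner_requests{status="hit"}[5m]))
/
sum(rate(buildbuddy_remote_execution_recycle_runner_requests[5m]))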

buildbuddy_remote_execution_runner_pool_count (Gauge)

Number of command runners that are currently pooled (and available for recycling).

buildbuddy_remote_execution_runner_pool_evictions (Counter)

Number of command runners removed from the pool to make room for other runners.

buildbuddy_remote_execution_runner_pool_failed_recycle_attempts (Counter)

Number of failed attempts to add runners to the pool.

Labels

  • reason: Reason for a runner not being added to the runner pool.

buildbuddy_remote_execution_runner_pool_memory_usage_bytes (Gauge)

Total memory usage of pooled command runners, in bytes.

Currently only supported for Docker-based executors.

buildbuddy_remote_execution_runner_pool_disk_usage_bytes (Gauge)

Total disk usage of pooled command runners, in bytes.

buildbuddy_remote_execution_file_cache_requests (Counter)

Number of local executor file cache requests.

Labels

  • status: Status of the file cache request: hit if found in cache, miss otherwise.

Latency of individual file cache link operations.

buildbuddy_remote_execution_file_cache_last_eviction_age_usec (Gauge)

Age of the last entry evicted from the executor's local file cache (relative to when it was added to the cache), in microseconds.

buildbuddy_remote_execution_file_cache_added_file_size_bytes (Histogram)

Size of artifacts added to the file cache, in bytes.

Blobstore metrics

"Blobstore" refers to the backing storage that BuildBuddy uses to store objects in the cache, as well as certain pieces of temporary data (such as invocation events while an invocation is in progress).

buildbuddy_blobstore_read_count (Counter)

Number of files read from the blobstore.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
  • blobstore_type: gcs (Google Cloud Storage), aws_s3, or disk.
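
Examples

# Blobstore read error ratio, by backend (any non-zero status is an error).
# Sketch: returns no data for a backend while it has no error series.
sum by (blobstore_type) (rate(buildbuddy_blobstore_read_count{status!="0"}[5m]))
/
sum by (blobstore_type) (rate(buildbuddy_blobstore_read_count[5m]))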

buildbuddy_blobstore_read_size_bytes (Histogram)

Number of bytes read from the blobstore per file.

Labels

  • blobstore_type: gcs (Google Cloud Storage), aws_s3, or disk.

Examples

# Bytes downloaded per second
sum(rate(buildbuddy_blobstore_read_size_bytes_sum[5m]))


buildbuddy_blobstore_read_duration_usec (Histogram)

Duration per blobstore file read, in microseconds.

Labels

  • blobstore_type: gcs (Google Cloud Storage), aws_s3, or disk.

buildbuddy_blobstore_write_count (Counter)

Number of files written to the blobstore.

Labels

  • status: Status code as defined by grpc/codes (https://godoc.org/google.golang.org/grpc/codes#Code). This is a numeric value; any non-zero code indicates an error.
  • blobstore_type: gcs (Google Cloud Storage), aws_s3, or disk.

Examples

# Bytes uploaded per second (uses the _sum series of buildbuddy_blobstore_write_size_bytes)
sum(rate(buildbuddy_blobstore_write_size_bytes_sum[5m]))

buildbuddy_blobstore_write_size_bytes (Histogram)

Number of bytes written to the blobstore per file.

Labels

  • blobstore_type: gcs (Google Cloud Storage), aws_s3, or disk.

buildbuddy_blobstore_write_duration_usec (Histogram)

Duration per blobstore file write, in microseconds.

Labels

  • blobstore_type: gcs (Google Cloud Storage), aws_s3, or disk.

buildbuddy_blobstore_delete_count (Counter)

Number of files deleted from the blobstore.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
  • blobstore_type: gcs (Google Cloud Storage), aws_s3, or disk.

buildbuddy_blobstore_delete_duration_usec (Histogram)

Delete duration per blobstore file deletion, in microseconds.

Labels

  • blobstore_type: gcs (Google Cloud Storage), aws_s3, or disk.

SQL metrics

The following metrics are for monitoring the SQL database configured for BuildBuddy.

If you'd like to see an up-to-date catalog of what BuildBuddy stores in its SQL database, see the table definitions here.

Query / error rate metrics

buildbuddy_sql_query_count (Counter)

Number of SQL queries executed.

Labels

  • sql_query_template: SQL query before substituting template parameters.

Examples

# SQL queries per second (by query template).
sum by (sql_query_template) (rate(buildbuddy_sql_query_count[5m]))

buildbuddy_sql_query_duration_usec (Histogram)

SQL query duration, in microseconds.

Labels

  • sql_query_template: SQL query before substituting template parameters.

Examples

# Median SQL query duration
histogram_quantile(
0.5,
sum(rate(buildbuddy_sql_query_duration_usec_bucket[5m])) by (le)
)

buildbuddy_sql_error_count (Counter)

Number of SQL queries that resulted in an error.

Labels

  • sql_query_template: SQL query before substituting template parameters.

Examples

# SQL error rate
sum(rate(buildbuddy_sql_error_count[5m]))
/
sum(rate(buildbuddy_sql_query_count[5m]))

database/sql metrics

The following metrics directly expose DBStats from the database/sql Go package.

buildbuddy_sql_max_open_connections (Gauge)

Maximum number of open connections to the database.

Labels

  • sql_db_role: SQL DB replica role: primary for read+write replicas, or read_replica for read-only DB replicas.

buildbuddy_sql_open_connections (Gauge)

The number of established connections to the database.

Labels

  • connection_status: Status of the database connection: in_use or idle
  • sql_db_role: SQL DB replica role: primary for read+write replicas, or read_replica for read-only DB replicas.
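
Examples

# Fraction of the connection limit currently in use, by DB role.
# Sketch: in database/sql a limit of 0 means "unlimited", so the ratio
# is not meaningful when no maximum has been configured.
sum by (sql_db_role) (buildbuddy_sql_open_connections{connection_status="in_use"})
/
sum by (sql_db_role) (buildbuddy_sql_max_open_connections)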

buildbuddy_sql_wait_count (Counter)

The total number of connections waited for.

Labels

  • sql_db_role: SQL DB replica role: primary for read+write replicas, or read_replica for read-only DB replicas.

buildbuddy_sql_wait_duration_usec (Counter)

The total time blocked waiting for a new connection, in microseconds.

Labels

  • sql_db_role: SQL DB replica role: primary for read+write replicas, or read_replica for read-only DB replicas.
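
Examples

# Average time spent waiting for a connection, in microseconds.
# Sketch: divides the wait-duration counter by buildbuddy_sql_wait_count;
# returns NaN for windows in which no connection was waited for.
sum(rate(buildbuddy_sql_wait_duration_usec[5m]))
/
sum(rate(buildbuddy_sql_wait_count[5m]))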

buildbuddy_sql_max_idle_closed (Counter)

The total number of connections closed due to SetMaxIdleConns.

Labels

  • sql_db_role: SQL DB replica role: primary for read+write replicas, or read_replica for read-only DB replicas.

buildbuddy_sql_max_idle_time_closed (Counter)

The total number of connections closed due to SetConnMaxIdleTime.

Labels

  • sql_db_role: SQL DB replica role: primary for read+write replicas, or read_replica for read-only DB replicas.

buildbuddy_sql_max_lifetime_closed (Counter)

The total number of connections closed due to SetConnMaxLifetime.

Labels

  • sql_db_role: SQL DB replica role: primary for read+write replicas, or read_replica for read-only DB replicas.

HTTP metrics

buildbuddy_http_request_count (Counter)

HTTP request count.

Labels

  • route: HTTP route before substituting path parameters (/invocation/:id, /settings, ...)
  • method: HTTP method: GET, POST, ...

Examples

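# Requests per second, by status code.
# buildbuddy_http_request_count does not list a code label, so these queries use
# the _count series of buildbuddy_http_request_handler_duration_usec, which does.
sum by (code) (rate(buildbuddy_http_request_handler_duration_usec_count[5m]))

# 5xx error ratio
sum(rate(buildbuddy_http_request_handler_duration_usec_count{code=~"5.."}[5m]))
/
sum(rate(buildbuddy_http_request_handler_duration_usec_count[5m]))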

buildbuddy_http_request_handler_duration_usec (Histogram)

Time taken to handle each HTTP request in microseconds.

Labels

  • route: HTTP route before substituting path parameters (/invocation/:id, /settings, ...)
  • method: HTTP method: GET, POST, ...
  • code: HTTP response code: 200, 302, 401, 404, 500, ...

Examples

# Median request duration for successfully processed (2xx) requests.
# Other status codes may be associated with early exits and are
# likely to add too much noise.
histogram_quantile(
0.5,
sum by (le) (rate(buildbuddy_http_request_handler_duration_usec_bucket{code=~"2.."}[5m]))
)

buildbuddy_http_response_size_bytes (Histogram)

Response size of each HTTP response in bytes.

Labels

  • route: HTTP route before substituting path parameters (/invocation/:id, /settings, ...)
  • method: HTTP method: GET, POST, ...
  • code: HTTP response code: 200, 302, 401, 404, 500, ...

Examples

# Median HTTP response size
histogram_quantile(
0.5,
sum by (le) (rate(buildbuddy_http_response_size_bytes_bucket[5m]))
)

Internal metrics

These metrics are for monitoring lower-level subsystems of BuildBuddy.

Build event handler

The build event handler logs all build events uploaded to BuildBuddy as part of the Build Event Protocol.

buildbuddy_build_event_handler_duration_usec (Histogram)

The time spent handling each build event in microseconds.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
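
The queries that follow are a sketch of how this histogram might be used, assuming the standard Prometheus `_bucket` and `_count` series are exported for it. The quantile and time window are illustrative choices.

```promql
# 95th percentile time spent handling a build event, in microseconds
histogram_quantile(
  0.95,
  sum by (le) (rate(buildbuddy_build_event_handler_duration_usec_bucket[5m]))
)

# Fraction of build events handled with a non-OK (non-zero) status.
# Returns no data while there are no non-OK series at all.
sum(rate(buildbuddy_build_event_handler_duration_usec_count{status!="0"}[5m]))
/
sum(rate(buildbuddy_build_event_handler_duration_usec_count[5m]))
```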

Webhooks

Webhooks are HTTP endpoints exposed by BuildBuddy server which allow it to respond to repository events. These URLs are created as part of BuildBuddy workflows.

buildbuddy_webhook_handler_workflows_started (Counter)

The number of workflows triggered by the webhook handler.

Labels

  • event: Type of event sent to BuildBuddy's webhook handler: push or pull_request.
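
One way to chart this counter, as a minimal sketch with an arbitrary 1h window:

```promql
# Approximate number of workflows started in the last hour, by event type
sum by (event) (increase(buildbuddy_webhook_handler_workflows_started[1h]))
```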

Cache

"Cache" refers to the cache backend(s) that BuildBuddy uses to accelerate file IO operations, which are common in different subsystems such as the remote cache and the fetch server (for downloading invocation artifacts).

BuildBuddy can be configured to use multiple layers of caching (an in-memory layer, coupled with a cloud storage layer).

get metrics

get metrics track non-streamed cache reads (all data is fetched from the cache in a single request).

buildbuddy_cache_get_count (Counter)

Number of cache get requests.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_get_duration_usec (Histogram)

The time spent retrieving each entry from the cache, in microseconds.

This is recorded only for successful gets.

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_get_size_bytes (Histogram)

Size of each entry retrieved from the cache, in bytes.

This is recorded only for successful gets.

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.
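
The three get metrics can be combined roughly as follows. This is a sketch, not an official example: it assumes the standard `_bucket` and `_sum` histogram series, and non-OK statuses may include expected outcomes (such as lookups for missing entries), depending on how the server reports them.

```promql
# Share of get requests that finished with a non-OK status, by cache tier
sum by (tier) (rate(buildbuddy_cache_get_count{status!="0"}[5m]))
/
sum by (tier) (rate(buildbuddy_cache_get_count[5m]))

# Median latency of successful gets, by cache tier (microseconds)
histogram_quantile(
  0.5,
  sum by (le, tier) (rate(buildbuddy_cache_get_duration_usec_bucket[5m]))
)

# Bytes per second returned by successful gets, by cache tier
sum by (tier) (rate(buildbuddy_cache_get_size_bytes_sum[5m]))
```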

read metrics

read metrics track streamed cache reads.

buildbuddy_cache_read_count (Counter)

Number of streamed cache reads started.

This is incremented once for each started stream, not for each chunk in the stream.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_read_duration_usec (Histogram)

The total time spent for each read stream, in microseconds.

This is recorded only for successful reads, and measures the entire read stream (not just individual chunks).

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_read_size_bytes (Histogram)

Total size of each entry retrieved from the cache via streaming, in bytes.

This is recorded only on success, and measures the entire stream (not just individual chunks).

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.
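
Because both the duration and size histograms are recorded only for successful streams, their `_sum` series can be divided to estimate per-stream throughput. The following is a sketch under that assumption:

```promql
# Average size of a successfully streamed read, by backend (bytes)
sum by (backend) (rate(buildbuddy_cache_read_size_bytes_sum[5m]))
/
sum by (backend) (rate(buildbuddy_cache_read_size_bytes_count[5m]))

# Approximate per-stream read throughput, in bytes per microsecond
# (numerically equal to millions of bytes per second)
sum(rate(buildbuddy_cache_read_size_bytes_sum[5m]))
/
sum(rate(buildbuddy_cache_read_duration_usec_sum[5m]))
```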

set metrics

set metrics track non-streamed cache writes (all data is written in a single request).

buildbuddy_cache_set_count (Counter)

Number of cache set requests.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_set_duration_usec (Histogram)

The time spent writing each entry to the cache, in microseconds.

This is recorded only for successful sets.

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_set_size_bytes (Histogram)

Size of the value stored in each set operation, in bytes.

This is recorded only for successful sets.

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_set_retries (Histogram)

Number of retries required to fulfill the set request (an observed value of 0 means the transfer succeeded on the first try).

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.
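
A histogram of retry counts can be reduced to an average by dividing its `_sum` by its `_count`. A minimal sketch, assuming those standard series are exported:

```promql
# Average number of retries per set request, by backend.
# A value of 0 means every set succeeded on the first try.
sum by (backend) (rate(buildbuddy_cache_set_retries_sum[5m]))
/
sum by (backend) (rate(buildbuddy_cache_set_retries_count[5m]))
```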

write metrics

write metrics track streamed cache writes.

buildbuddy_cache_write_count (Counter)

Number of streamed cache writes started.

This is incremented once for each started stream, not for each chunk in the stream.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_write_duration_usec (Histogram)

The time spent for each streamed write to the cache, in microseconds.

This is recorded only on success, and measures the entire stream (not just individual chunks).

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_write_size_bytes (Histogram)

Size of each entry written to the cache via streaming, in bytes.

This is recorded only on success, and measures the entire stream (not just individual chunks).

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_write_retries (Histogram)

Number of retries required to write each chunk in the stream (an observed value of 0 means the transfer succeeded on the first try).

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

Other cache metrics

buildbuddy_cache_delete_count (Counter)

Number of deletes from the cache.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_delete_duration_usec (Histogram)

Duration of each cache deletion, in microseconds.

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_contains_count (Counter)

Number of contains(key) requests made to the cache.

Labels

  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.
  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_contains_duration_usec (Histogram)

Duration of each contains(key) request, in microseconds.

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

buildbuddy_cache_contains_retry_count (Histogram)

Number of retries required to fulfill each contains(key) request to the cache (an observed value of 0 means the request succeeded on the first try).

Labels

  • tier: Cache tier: memory or cloud. This label can be used to write Prometheus queries that don't break if the cache backend is swapped out for a different backend.
  • backend: Cache backend: gcs (Google Cloud Storage), aws_s3, or redis.

Misc metrics

buildbuddy_version (Gauge)

Binary version of the running instance.

Always reports a value of 1, similar to the up metric, but carries labels identifying the version.

Labels

  • version: Binary version. Example: v2.0.0.
  • commit: Binary git commit SHA. Example: 4bd7046417608d785094aa5ec7aa009a9ae53753
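
Since the value is always 1, the useful information is in the labels. The following sketch counts instances per version and flags a mixed-version deployment:

```promql
# Number of running instances per binary version
count by (version) (buildbuddy_version)

# Returns a value only while more than one version is running,
# for example during a rollout
count(count by (version) (buildbuddy_version)) > 1
```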

buildbuddy_unexpected_event (Counter)

Counter for unexpected events.

Labels

  • name: The name used to identify the type of an unexpected event.

buildbuddy_health_check_status (Gauge)

Health check status.

Labels

  • health_check_name: Name of the service the health check is running for, such as "distributed_cache" or "sql_primary".

buildbuddy_quota_rpcs_handled_total_by_quota_key (Counter)

Total number of RPCs completed on the server by quota_key, regardless of success or failure.

Labels

  • grpc_full_method: The full name of the grpc method: /<service>/<method>
  • quota_key: The key used for quota accounting, either a group ID or an IP address.
  • quota_allowed: Whether the request was allowed by quota manager.
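
As an illustration, the queries here group by the quota_allowed label instead of filtering on it, because the exact label values are not listed in this document. The topk limit of 10 is an arbitrary choice.

```promql
# RPC rate by method, split by whether the quota manager allowed the request
sum by (grpc_full_method, quota_allowed) (
  rate(buildbuddy_quota_rpcs_handled_total_by_quota_key[5m])
)

# The 10 quota keys (group IDs or IP addresses) with the highest RPC rate
topk(10, sum by (quota_key) (rate(buildbuddy_quota_rpcs_handled_total_by_quota_key[5m])))
```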

buildbuddy_registry_blob_range_latency_usec (Histogram)

Latency of serving layer blob ranges.

buildbuddy_clickhouse_insert_count (Counter)

Number of rows inserted into ClickHouse.

Labels

  • clickhouse_table_name: The name of the table in ClickHouse.
  • status: Status of the ClickHouse operation: ok, error.

buildbuddy_clickhouse_query_duration_usec (Histogram)

ClickHouse query duration, in microseconds.

Labels

  • sql_query_template: SQL query before substituting template parameters.

buildbuddy_clickhouse_error_count (Counter)

Number of ClickHouse SQL queries that resulted in an error.

Labels

  • sql_query_template: SQL query before substituting template parameters.

buildbuddy_clickhouse_query_count (Counter)

Number of ClickHouse SQL queries executed.

Labels

  • sql_query_template: SQL query before substituting template parameters.
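
The ClickHouse counters can be combined into error ratios. This is a sketch that assumes the error and query counters share the same sql_query_template values:

```promql
# Share of ClickHouse queries that resulted in an error, per query template
sum by (sql_query_template) (rate(buildbuddy_clickhouse_error_count[5m]))
/
sum by (sql_query_template) (rate(buildbuddy_clickhouse_query_count[5m]))

# Rows per second that failed to insert, by table
sum by (clickhouse_table_name) (
  rate(buildbuddy_clickhouse_insert_count{status="error"}[5m])
)
```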

buildbuddy_compressor_bytes_compressed (Counter)

The number of decompressed bytes passed into compressors

Labels

  • compression: Describes the type of compression

buildbuddy_compressor_bytes_decompressed (Counter)

The number of decompressed bytes passed out of decompressors

Labels

  • compression: Describes the type of compression

buildbuddy_pebble_compression_ratio (Histogram)

The aggregate compression ratio (compressed / decompressed bytes) for a stream of data

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
  • compression: Describes the type of compression

buildbuddy_compressor_compressed_blob_size_write (Counter)

The number of compressed bytes in all blobs

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
  • compression: Describes the type of compression

buildbuddy_compressor_decompressed_blob_size_write (Counter)

The number of decompressed bytes in all blobs

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
  • compression: Describes the type of compression

Examples

# Histogram buckets with the count of elements in each compression ratio bucket
# Visualize with the Bar Gauge type
# Legend: {{le}}
# Format: Heatmap
sum(increase(buildbuddy_pebble_compression_ratio_bucket[5m])) by(le)

# Percentage of elements that increased in size when compressed (compression ratio > 1)
# Visualize with the Stat type
(sum(buildbuddy_pebble_compression_ratio_count) - sum(buildbuddy_pebble_compression_ratio_bucket{le="1.0"})) / sum(buildbuddy_pebble_compression_ratio_count)

buildbuddy_server_upload_size_bytes (Histogram)

Number of bytes uploaded to the server in each upload.

Use the _sum suffix to get the total uploaded bytes and the _count suffix to get the number of uploaded files.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server"
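
The `_sum` and `_count` suffixes mentioned above might be used as follows (a sketch with an arbitrary 5m window):

```promql
# Bytes uploaded per second, by server
sum by (server_name) (rate(buildbuddy_server_upload_size_bytes_sum[5m]))

# Uploads per second, by server
sum by (server_name) (rate(buildbuddy_server_upload_size_bytes_count[5m]))

# Average upload size in bytes, by cache type
sum by (cache_type) (rate(buildbuddy_server_upload_size_bytes_sum[5m]))
/
sum by (cache_type) (rate(buildbuddy_server_upload_size_bytes_count[5m]))
```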

buildbuddy_server_uncompressed_upload_bytes_count (Counter)

Number of uncompressed bytes uploaded to the server.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server"
  • group_id: Group (organization) ID associated with the request.

buildbuddy_server_download_size_bytes (Histogram)

Number of bytes downloaded from the server in each download.

Use the _sum suffix to get the total downloaded bytes and the _count suffix to get the number of downloaded files.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server"

buildbuddy_server_uncompressed_download_bytes_count (Counter)

Number of uncompressed bytes downloaded from the server.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server"
  • group_id: Group (organization) ID associated with the request.

buildbuddy_server_digest_upload_size_bytes (Histogram)

Digest size uploaded to the server in each upload.

This does not always match the actual size uploaded to the server, if the client sends compressed bytes. Use the _sum suffix to get the total uploaded bytes and the _count suffix to get the number of uploaded files.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server"

buildbuddy_server_digest_download_size_bytes (Histogram)

Digest size downloaded from the server in each download.

This does not always match the actual size downloaded from the server, if the client requests compressed bytes. Use the _sum suffix to get the total downloaded bytes and the _count suffix to get the number of downloaded files.

Labels

  • cache_type: Cache type: action for action cache, cas for content-addressable storage.
  • server_name: Describes the name of the server that handles a client request, such as "byte_stream_server" or "cas_server"
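
Comparing the digest-size histograms with the transfer-size histograms gives a rough view of how much compression reduces bytes on the wire. This sketch assumes both metrics are recorded for the same set of requests, which this document does not state explicitly:

```promql
# Ratio of bytes actually downloaded to digest (uncompressed) bytes, by server.
# Values below 1 suggest clients are receiving compressed data.
sum by (server_name) (rate(buildbuddy_server_download_size_bytes_sum[5m]))
/
sum by (server_name) (rate(buildbuddy_server_digest_download_size_bytes_sum[5m]))
```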

buildbuddy_logger_log_count (Counter)

The number of logs

Labels

  • status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".

Raft cache metrics

buildbuddy_raft_ranges (Gauge)

Number of raft ranges on each nodehost.

Labels

  • node_host_id: The ID of a raft nodehost.

buildbuddy_raft_leases (Gauge)

Number of raft leases on each nodehost.

Labels

  • range_id: The range ID of a raft region.

buildbuddy_raft_leaders (Gauge)

Number of raft leaders on each nodehost.

Labels

  • range_id: The range ID of a raft region.

buildbuddy_raft_records (Gauge)

Number of raft records in each range.

Labels

  • range_id: The range ID of a raft region.

buildbuddy_raft_bytes (Gauge)

Size (in bytes) of each range.

Labels

  • range_id: The range ID of a raft region.

buildbuddy_raft_proposals (Counter)

The total number of statemachine proposals on each range.

Labels

  • range_id: The range ID of a raft region.

buildbuddy_raft_splits (Counter)

The total number of splits per nodehost.

Labels

  • node_host_id: The ID of a raft nodehost.
  • status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".

buildbuddy_raft_moves (Counter)

The total number of moves per nodehost.

Labels

  • node_host_id: The ID of a raft nodehost.
  • move_type: The type of raft move: add or remove.
  • status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".

buildbuddy_raft_rangecache_lookups (Counter)

The total number of rangecache lookups per nodehost.

Labels

  • rangecache_event_type: Raft RangeCache event type: hit, miss, or update.
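
A hit ratio can be derived from the documented label values. A minimal sketch that excludes update events from the denominator:

```promql
# RangeCache hit ratio across all nodehosts
sum(rate(buildbuddy_raft_rangecache_lookups{rangecache_event_type="hit"}[5m]))
/
sum(rate(buildbuddy_raft_rangecache_lookups{rangecache_event_type=~"hit|miss"}[5m]))
```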

buildbuddy_raft_split_duration_usec (Histogram)

The time spent splitting a range in microseconds.

Labels

  • range_id: The range ID of a raft region.

buildbuddy_raft_replica_update_duration_usec (Histogram)

The time spent on replica.Update in microseconds.

Labels

  • range_id: The range ID of a raft region.

buildbuddy_raft_eviction_errors (Counter)

The total number of eviction errors

buildbuddy_raft_listener_events_dropped (Counter)

The total number of dropped listener events

Labels

  • listener_id: The ID of a raft listener
  • listener_event: Raft Listener Event Type

buildbuddy_raft_lease_action_count (Counter)

The total number of lease actions

Labels

  • range_id: The range ID of a raft region.
  • lease_action: The type of lease action: Acquire or Drop.
  • status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".

buildbuddy_auth_api_key_lookup_count (Counter)

Total number of API key lookups.

Labels

  • status: Outcome of the API key lookup: "cache_hit" or "cache_miss" (whether the lookup hit the in-memory cache), or "invalid_key".
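
Using the status values listed above, the in-memory cache hit ratio and the rate of invalid keys could be queried like this (a sketch, not an official example):

```promql
# In-memory cache hit ratio for API key lookups
sum(rate(buildbuddy_auth_api_key_lookup_count{status="cache_hit"}[5m]))
/
sum(rate(buildbuddy_auth_api_key_lookup_count{status=~"cache_hit|cache_miss"}[5m]))

# Lookups per second that presented an invalid key
sum(rate(buildbuddy_auth_api_key_lookup_count{status="invalid_key"}[5m]))
```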

buildbuddy_auth_ip_rules_check_latency_usec (Histogram)

Latency of IP authorization checks.

Labels

  • status: Status code as defined by grpc/codes in human-readable format, such as "OK" or "NotFound".

buildbuddy_encryption_key_refresh_count (Counter)

Total number of encryption key refresh attempts.

buildbuddy_encryption_key_refresh_failure_count (Counter)

Total number of unsuccessful encryption key refresh attempts.
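
The two refresh counters can be divided to get a failure ratio. The 1h window in this sketch is an assumption chosen because key refreshes are likely to be infrequent:

```promql
# Share of encryption key refresh attempts that failed over the last hour
sum(rate(buildbuddy_encryption_key_refresh_failure_count[1h]))
/
sum(rate(buildbuddy_encryption_key_refresh_count[1h]))
```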

buildbuddy_encryption_encrypted_block_count (Counter)

Total number of blocks encrypted.

buildbuddy_encryption_encrypted_blob_count (Counter)

Total number of blobs encrypted.

buildbuddy_encryption_decrypted_block_count (Counter)

Total number of blocks decrypted.

buildbuddy_encryption_decrypted_blob_count (Counter)

Total number of blobs decrypted.

buildbuddy_encryption_decryption_error_count (Counter)

Total number of decryption errors.

buildbuddy_encryption_key_last_encryption_age_msec (Histogram)

Age of encrypted keys (i.e. how long it has been since the keys were re-encrypted).

buildbuddy_remote_cache_pebble_cache_atime_update_count (Counter)

Count of processed atime updates.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_atime_delta_when_read (Histogram)

Previous atime of items in the cache when they are read, in msec

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_eviction_samples_chan_size (Gauge)

Number of items in the eviction samples channel.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_eviction_resample_latency_usec (Histogram)

Latency of resampling during a single eviction iteration.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_eviction_evict_latency_usec (Histogram)

Latency of evicting a single key.

Labels

  • partition_id: The ID of the disk cache partition this event applied to.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_compact_count (Counter)

Number of compactions performed by the underlying Pebble database.

Labels

  • compaction_type: Pebble DB compaction type.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_compact_estimated_debt_bytes (Gauge)

Estimated number of bytes that need to be compacted for the LSM to reach a stable state.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_compact_in_progress_bytes (Gauge)

Number of bytes present in sstables being written by in-progress compactions.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_compact_in_progress (Gauge)

Number of compactions that are in-progress

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_compact_marked_files (Gauge)

Count of files that are marked for compaction.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_sublevels (Gauge)

Number of sublevels within the level.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_num_files (Gauge)

The total number of files in the level.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_size_bytes (Gauge)

The total size in bytes of the files in the level.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_score (Gauge)

The level's compaction score.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_in_count (Counter)

The number of incoming bytes from other levels read during compactions.

This excludes bytes moved and bytes ingested. For L0 this is the bytes written to the WAL.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_ingested_count (Counter)

The number of bytes ingested.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_moved_count (Counter)

The number of bytes moved into the level by a move compaction.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_read_count (Counter)

The number of bytes read for compactions at the level.

This includes bytes read from other levels (BytesIn), as well as bytes read for the level.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_compacted_count (Counter)

The number of bytes written during compactions.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_flushed_count (Counter)

The number of bytes written during flushes.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
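
The byte counters above can be combined into a per-level write amplification estimate. This sketch assumes the usual LSM definition, (bytes flushed + bytes compacted) / bytes in, and assumes all three counters are exported for every level; levels with no incoming bytes in the window yield NaN or no data.

```promql
# Approximate write amplification per Pebble level over the last hour
(
  sum by (level) (rate(buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_compacted_count[1h]))
  +
  sum by (level) (rate(buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_flushed_count[1h]))
)
/
sum by (level) (rate(buildbuddy_remote_cache_pebble_cache_pebble_level_bytes_in_count[1h]))
```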

buildbuddy_remote_cache_pebble_cache_pebble_level_tables_compacted_count (Counter)

The number of sstables compacted to this level.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_tables_flushed_count (Counter)

The number of sstables flushed to this level.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_tables_ingested_count (Counter)

The number of sstables ingested into this level.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_level_tables_moved_count (Counter)

The number of sstables moved into this level.

Labels

  • level: Pebble DB level number.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_op_count (Counter)

The number of operations performed against the pebble database.

Labels

  • pebble_id: Pebble DB ID
  • pebble_op: Pebble DB operation type.

buildbuddy_remote_cache_pebble_cache_pebble_op_latency_usec (Histogram)

The latency of operations performed against the pebble database, in microseconds.

Labels

  • pebble_id: Pebble DB ID
  • pebble_op: Pebble DB operation type.
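
A possible latency query for this histogram, assuming the standard `_bucket` series; the 0.99 quantile is an illustrative choice:

```promql
# 99th percentile latency of Pebble operations, by operation type (microseconds)
histogram_quantile(
  0.99,
  sum by (le, pebble_op) (
    rate(buildbuddy_remote_cache_pebble_cache_pebble_op_latency_usec_bucket[5m])
  )
)
```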

buildbuddy_remote_cache_pebble_cache_pebble_block_cache_size_bytes (Gauge)

The total size in pebble's block cache.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_write_stall_count (Gauge)

The number of write stalls

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_pebble_write_stall_duration_usec (Histogram)

The duration of write stall in pebble, in microseconds.

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".
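
The `_sum` series of this histogram accumulates total stall time, so its rate approximates how much of each second is spent stalled. A sketch under that assumption:

```promql
# Seconds of Pebble write stall per second, by cache.
# Values approaching 1 mean writes are stalled almost continuously.
sum by (cache_name) (
  rate(buildbuddy_remote_cache_pebble_cache_pebble_write_stall_duration_usec_sum[5m])
) / 1e6
```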

buildbuddy_remote_cache_pebble_cache_zombie_table_count (Gauge)

The number of zombie tables

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_zombie_table_size_bytes (Gauge)

The size of zombie tables in bytes

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_groupid_sample_count (Counter)

The number of times a group has been selected for key sampling.

Labels

  • group_id: Group (organization) ID associated with the request.
  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

buildbuddy_remote_cache_pebble_cache_num_chunks_per_file (Histogram)

Number of chunks per file stored in pebble cache

Labels

  • cache_name: Cache name: Custom name to describe the cache, like "pebble-cache".

Podman metrics

buildbuddy_podman_soci_store_crash_count (Counter)

Total number of times the soci store binary crashed and was restarted.

buildbuddy_podman_get_soci_artifacts_latency_usec (Histogram)

The latency of retrieving SOCI artifacts from the app and storing them locally per image, in microseconds.

Note this is slightly different than the latency of the GetArtifacts RPC as the artifacts must be fetched from the cache and stored locally, which adds some additional time.

Labels

  • container_image_tag: Container image tag.

buildbuddy_podman_get_soci_artifacts_outcome (Counter)

The outcome (cached or reason why not) of SociArtifactStore.GetArtifacts RPCs.

Labels

  • get_soci_artifacts_outcome_tag: SociArtifactStore.GetArtifacts outcome tag.

buildbuddy_podman_image_pull_latency_msec (Histogram)

The latency of 'cold' podman pull requests per image, in milliseconds.

'Cold' means the image hasn't been pulled by this executor previously.

Labels

  • container_image_tag: Container image tag.

Cache Proxy metrics

buildbuddy_proxy_byte_stream_reads (Counter)

The result of serving a byte_stream_proxy.read request out of the byte_stream_server_proxy.

Labels

  • status: Cache lookup result - "hit," "miss," or "partial" (for batched, proxied RPCs where part of the response is served out of the local cache).
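
A local hit ratio for the byte stream proxy could be computed along these lines (a sketch that assumes the label values are exactly hit, miss, and partial):

```promql
# Share of byte stream reads served entirely from the proxy's local cache
sum(rate(buildbuddy_proxy_byte_stream_reads{status="hit"}[5m]))
/
sum(rate(buildbuddy_proxy_byte_stream_reads[5m]))
```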

buildbuddy_proxy_content_addressable_storage_reads (Counter)

The result of serving a content_addressable_storage read request out of the content_addressable_storage_server_proxy.

Labels

  • op: ContentAddressableStorage Server operation: "FindMissingBlobs", "BatchUpdateBlobs", "BatchReadBlobs", or "GetTree".
  • status: Cache lookup result - "hit," "miss," or "partial" (for batched, proxied RPCs where part of the response is served out of the local cache).

buildbuddy_proxy_content_addressable_storage_digest_reads (Counter)

The per-digest result of serving part of a content_addressable_storage read request out of the content_addressable_storage_server_proxy.

This metric differs from buildbuddy_proxy_content_addressable_storage_reads in that it is recorded once per digest rather than once per request (a single request can contain many digests), so the 'partial' status never occurs for this metric.

Labels

  • op: ContentAddressableStorage Server operation: "FindMissingBlobs", "BatchUpdateBlobs", "BatchReadBlobs", or "GetTree".
  • status: Cache lookup result - "hit," "miss," or "partial" (for batched, proxied RPCs where part of the response is served out of the local cache).

Cache Proxy Remote Atime Update Metrics

buildbuddy_proxy_remote_atime_updates (Counter)

The number of remote atime updates enqueued, with the outcome of the enqueue operation.

Labels

  • group_id: Group (organization) ID associated with the request.
  • status: Outcome of attempting to enqueue a remote atime update. One of "enqueued", "duplicate", "dropped_batch_too_large", or "dropped_too_many_batches"
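
Using the status values listed above, the share of atime updates that were dropped rather than enqueued could be tracked like this (a sketch; the regex relies on both drop statuses starting with "dropped_"):

```promql
# Enqueue outcomes per second, by status
sum by (status) (rate(buildbuddy_proxy_remote_atime_updates[5m]))

# Share of remote atime updates that were dropped
sum(rate(buildbuddy_proxy_remote_atime_updates{status=~"dropped_.*"}[5m]))
/
sum(rate(buildbuddy_proxy_remote_atime_updates[5m]))
```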

buildbuddy_proxy_remote_atime_update_requests (Counter)

The number of FindMissingBlobs requests sent to the remote cache to update remote blob atimes.

Labels

  • group_id: Group (organization) ID associated with the request.
  • status: Status code as defined by grpc/codes. This is a numeric value; any non-zero code indicates an error.