
Troubleshooting Bazel with Git Bisect

Son Luong Ngoc, Solution Engineer @ BuildBuddy

Upgrading Bazel and the related dependencies can sometimes lead to unexpected issues. These issues can range from build failures to runtime errors, and generally, they can be hard to troubleshoot.

So today, we will discuss how to narrow down the root cause of build failures after a dependency upgrade using git bisect.

Build failed after upgrading Bazel

If you are like me, you enjoy having the latest and greatest tools in your project. The tool I use the most is Bazel, so when the 8.1.0 release came out, I decided to upgrade our BuildBuddy repository to it.

$ echo '8.1.0' > .bazelversion

$ bazel test --config=remote-minimal //...
...
ERROR: /root/workspace/output-base/external/bazel_tools/tools/build_defs/repo/http.bzl:137:45: An error occurred during the fetch of repository 'rules_cc+':
Traceback (most recent call last):
File "/root/workspace/output-base/external/bazel_tools/tools/build_defs/repo/http.bzl", line 137, column 45, in _http_archive_impl
download_info = ctx.download_and_extract(
Error in download_and_extract: com.google.devtools.build.lib.remote.common.CacheNotFoundException: Missing digest: abc605dd850f813bb37004b77db20106a19311a96b2da1c92b789da529d28fe1/178823
...

Whoops! The build failed after upgrading Bazel to 8.1.0. Switching back to 8.0.1 made the problem go away, which means the issue was introduced somewhere between 8.0.1 and 8.1.0.

What should we do next?

Auto-bisect with bazelisk

Luckily, the Bazel team has given us a handy flag in Bazelisk for exactly this situation.

The --bisect flag automatically lists the commits between two Bazel releases and bisects them for us. It works like this:

alias bazel=bazelisk

# Usage:
# bazelisk --bisect=<good commit hash>..<bad commit hash> test //foo:bar_test
$ BAZELISK_CLEAN=1 bazelisk --bisect=8.0.1..8.1.0 test --config=remote-minimal //...

Explanation: The last known good release for us was 8.0.1, and the bad release was 8.1.0. Bazelisk fetches the list of commits between the two versions from the GitHub API and starts a bisect. For each commit under test, it downloads a pre-built Bazel binary for that commit from a Google Cloud Storage bucket and runs the given command with it. The .bazelversion file is ignored during these runs.
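Bisecting is cheap because every run halves the remaining range. As a rough back-of-the-envelope estimate (not the exact formula git or Bazelisk use), isolating one bad commit among n candidates takes about ceil(log2 n) runs. The helper name bisect_steps below is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: rough number of test runs needed to isolate one bad
# commit among N candidates, i.e. ceil(log2(N)). This is only an estimate,
# not the exact formula that git or Bazelisk use.
bisect_steps() {
  local n=$1 steps=0 span=1
  # Double the span until it covers all N candidates; each doubling is one run.
  while (( span < n )); do
    span=$(( span * 2 ))
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

bisect_steps 95   # e.g. 95 commits between two releases -> 7
```

So even a range of about a hundred commits needs only around seven test runs.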

Because the issue was related to downloading external dependencies, we set BAZELISK_CLEAN=1 for the whole Bazelisk bisect process. This runs bazel clean --expunge between test runs, which wipes Bazel's output base and shuts down the Bazel server (JVM) process, so that every run reproduces the issue in a clean environment. You can find the other environment variables that Bazelisk supports in the Bazelisk documentation.

Since we know that the issue was caused by the external download of @rules_cc, we can narrow the test command down to a bazel query that still triggers the same download. This speeds up each bisect run and reduces the chance of flakiness from unrelated build steps.

$ BAZELISK_CLEAN=1 bazelisk --bisect=8.0.1..8.1.0 query --config=remote-minimal @rules_cc//:all

And here is the result:

--- Getting the list of commits between 8.0.1 and 8.1.0
Found 95 commits between (24cba660a7231786405ac40335c2e7e5bd4d6859, 8.1.0]
--- Verifying if the given good Bazel commit (24cba660a7231786405ac40335c2e7e5bd4d6859) is actually good
--- Start bisecting
--- Testing with Bazel built at c5cf63199b3964b4a188da73a6c24599468131e9, 95 commits remaining...
--- Succeeded at c5cf63199b3964b4a188da73a6c24599468131e9
--- Testing with Bazel built at 040769767613287a0f1b5ceed41b9d7729126983, 47 commits remaining...

2025/02/18 10:56:29 Using unreleased version at commit 040769767613287a0f1b5ceed41b9d7729126983
2025/02/18 10:56:29 Downloading https://storage.googleapis.com/bazel-builds/artifacts/macos_arm64/040769767613287a0f1b5ceed41b9d7729126983/bazel...
2025/02/18 10:56:29 Skipping basic authentication for storage.googleapis.com because no credentials found in /Users/sluongng/.netrc
2025/02/18 10:56:29 could not run Bazel: could not download Bazel: failed to download bazel: failed to download bazel: HTTP GET https://storage.googleapis.com/bazel-builds/artifacts/macos_arm64/040769767613287a0f1b5ceed41b9d7729126983/bazel failed with error 404

Here, Bazelisk resolved 8.0.1 to commit 24cba660a and verified that it was indeed good. It then started bisecting the commits between 24cba660a and 8.1.0. The first commit under test, c5cf63199b, was good. For the second one, 0407697676, however, Bazelisk failed to download the Bazel binary from the Google Cloud Storage (GCS) bucket (HTTP 404). This is a different failure from the one we are investigating, so how do we proceed?
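One way to see this failure coming is to check whether a pre-built binary exists for a commit before testing it. The sketch below reuses the URL pattern visible in the Bazelisk log above; the helper names are hypothetical, and it assumes other platforms follow the same layout with a different platform segment (only macos_arm64 appears in the log):

```shell
#!/bin/bash
# Hypothetical helper: build the GCS URL that Bazelisk requested in the log.
# "macos_arm64" is the platform segment seen there; other platform names are
# an assumption, so pass your own as the second argument.
bazel_build_url() {
  local commit=$1 platform=${2:-macos_arm64}
  echo "https://storage.googleapis.com/bazel-builds/artifacts/${platform}/${commit}/bazel"
}

# Succeeds if a HEAD request for the binary returns a 2xx status; --fail makes
# curl exit non-zero on errors such as the 404 above. Needs curl and network.
has_prebuilt_bazel() {
  curl --silent --head --fail --output /dev/null "$(bazel_build_url "$@")"
}

# Example (network access required):
# has_prebuilt_bazel 040769767613287a0f1b5ceed41b9d7729126983 || echo "no binary, skip"
```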

Manual bisect with git-bisect

If Bazelisk had a flag to mark a few known commits as "skip", it would have been perfect here, but unfortunately it doesn't. git bisect, on the other hand, does support skipping commits, so let's switch to bisecting manually.

# My personal working directory
$ cd ~/work/bazelbuild/

# Clone the Bazel repository
$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel

# Start bisecting
$ git bisect start --no-checkout --first-parent 8.1.0 8.0.1
Bisecting: 47 revisions left to test after this (roughly 6 steps)
[8afe16e0396be93cc5d9bc2108aab2e45dfcb2bd] [8.1.0] Fix docs link rewriting for rules_android (#25018)

Note that thanks to the pre-built binaries in the GCS bucket managed by the Bazel team, we don't need to build Bazel ourselves. Because of this, we can use the --no-checkout flag to speed up the bisect process: git does not update the working tree at each step, and the commit currently under test is recorded in the .git/BISECT_HEAD file instead. --first-parent tells git to follow only the first parent of merge commits, so the bisect stays on the main line of history (typically the merges into the master/main branch). The bazel.git history is very linear, so --first-parent is not really needed here, but I keep the flag in to make copy-pasting easier for readers (including my future self).

$ cat .git/BISECT_HEAD
8afe16e0396be93cc5d9bc2108aab2e45dfcb2bd
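To see what --no-checkout does without cloning Bazel, here is a self-contained toy in a throwaway repository. It shows that HEAD and the working tree stay put while BISECT_HEAD points at the commit under test. git rev-parse BISECT_HEAD reads the same ref as cat .git/BISECT_HEAD with git's default ref storage; the repository contents are made up for illustration:

```shell
#!/bin/bash
# Toy demo in a throwaway repository (not the Bazel repo): with --no-checkout,
# git bisect leaves HEAD and the working tree alone and only records the
# commit under test in the BISECT_HEAD ref.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init --quiet
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Eight linear commits; version.txt records the commit number.
for i in 1 2 3 4 5 6 7 8; do
  echo "$i" > version.txt
  git add version.txt
  git commit --quiet -m "commit $i"
done
head_before=$(git rev-parse HEAD)

# Same argument order as in the Bazel example: <bad> <good>.
git bisect start --no-checkout --first-parent HEAD HEAD~7 > /dev/null

bisect_head=$(git rev-parse BISECT_HEAD)
echo "commit under test:   $(git log -1 --format=%s "$bisect_head")"
echo "version.txt on disk: $(cat version.txt)"   # still 8: nothing was checked out
head_after=$(git rev-parse HEAD)
git bisect reset > /dev/null 2>&1
```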

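So how do we tell Bazelisk to use the pre-built binary from the GCS bucket for this commit? Luckily, the USE_BAZEL_VERSION environment variable exists for exactly this purpose. It accepts a release version, as shown below, as well as a commit hash, which is how Bazelisk picked the unreleased builds in the log above.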

$ USE_BAZEL_VERSION=7.1.0 bazel version
version
Bazelisk version: 1.25.0
Extracting Bazel installation...
Build label: 7.1.0
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Mar 11 17:55:51 2024 (1710179751)
Build timestamp: 1710179751
Build timestamp as int: 1710179751

With this, we can manually alternate between the two directories, ~/work/bazelbuild/bazel and ~/work/buildbuddy-io/buildbuddy, running the test command in one and marking the commit as good, bad, or skip in the other.

# Point Bazelisk at the commit currently under test
$ export USE_BAZEL_VERSION=$(cat .git/BISECT_HEAD)

# Verify the commit
$ cd ~/work/buildbuddy-io/buildbuddy
$ bazel ... query ...
...

$ cd ~/work/bazelbuild/bazel
# Then mark the result with one of:
# $ git bisect good
# $ git bisect bad
# $ git bisect skip

# Repeat
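One pitfall of the manual loop: if .git/BISECT_HEAD cannot be read (for example, because the command runs in the wrong directory), USE_BAZEL_VERSION ends up empty, and Bazelisk may then fall back to .bazelversion and silently test the wrong Bazel (that fallback is an assumption about Bazelisk's behavior). A small sketch of a stricter export; the function name use_bisect_head and its default path are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: export USE_BAZEL_VERSION from the commit that git
# bisect is currently testing in REPO, failing loudly instead of exporting an
# empty value when no bisect is in progress.
use_bisect_head() {
  local repo=${1:-$HOME/work/bazelbuild/bazel}
  local commit
  # --verify --quiet: print the SHA if BISECT_HEAD exists, else just exit non-zero.
  commit=$(git -C "$repo" rev-parse --verify --quiet BISECT_HEAD) || {
    echo "no --no-checkout bisect in progress in $repo" >&2
    return 1
  }
  export USE_BAZEL_VERSION=$commit
  echo "USE_BAZEL_VERSION=$USE_BAZEL_VERSION"
}

# Usage: use_bisect_head ~/work/bazelbuild/bazel && bazel ... query ...
```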

However, that would not make for a very good blog post. So let's get a bit fancier and write a script that automates the process with git bisect run.

$ cat test-buildbuddy.sh
#!/bin/bash

# Sanity check first against the last known good release.
export USE_BAZEL_VERSION='8.0.1'
# export USE_BAZEL_VERSION=$(cat .git/BISECT_HEAD)

# BAZELISK_CLEAN only applies to `bazelisk --bisect`, so clean up ourselves:
# wipe the output base and stop the Bazel server when the script exits.
function cleanup()
{
  (
    cd ~/work/buildbuddy-io/buildbuddy
    bazel clean --expunge
  )
}
trap cleanup EXIT

(
  cd ~/work/buildbuddy-io/buildbuddy
  # Capture stdout and stderr together so the Bazelisk download error is visible.
  STDERROUT=$(bazel query --repository_cache='' --config=remote-minimal @rules_cc//:all 2>&1)
  BAZEL_EXIT_CODE=$?
  # If the output contains 'could not download Bazel', exit with code 125
  # so that git bisect skips the current commit.
  if [[ $BAZEL_EXIT_CODE -ne 0 && $STDERROUT == *"could not download Bazel"* ]]; then
    echo "Bazel download failed. Skipping commit $USE_BAZEL_VERSION."
    exit 125
  fi
  if [[ $BAZEL_EXIT_CODE -ne 0 ]]; then
    echo "Bazel query failed with exit code $BAZEL_EXIT_CODE."
  fi
  exit $BAZEL_EXIT_CODE
)

Explanation:

First, we export the USE_BAZEL_VERSION environment variable set to the commit hash currently being bisected. This tells Bazelisk to use the pre-built binary for that commit from the GCS bucket. However, we don't read BISECT_HEAD just yet because we first want to verify that the script works. Since 8.0.1 is the last known good version, we set USE_BAZEL_VERSION to it as a sanity check.

Next, since we are not using Bazelisk's bisect feature, BAZELISK_CLEAN has no effect. Instead, we clean up manually after each test run with bazel clean --expunge, triggered by a trap on EXIT, to achieve a similar result.
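The trap-based cleanup is safe because, in bash, an EXIT trap that does not call exit itself leaves the script's exit status unchanged, so the 0, 1, or 125 signal still reaches git bisect run. A self-contained check, with echo standing in for bazel clean --expunge (the function name run_with_cleanup is hypothetical):

```shell
#!/bin/bash
# Demonstrate that an EXIT trap runs on every exit path without overwriting
# the exit status, as long as the trap does not call `exit` itself.
# `echo` stands in for `bazel clean --expunge` so this runs anywhere.
run_with_cleanup() {
  bash -c '
    cleanup() { echo "cleanup ran" >&2; }
    trap cleanup EXIT
    exit "$1"
  ' _ "$1"
}

for code in 0 1 125; do
  status=0
  run_with_cleanup "$code" 2>/dev/null || status=$?
  echo "exit $code requested -> $status observed"
done
```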

After that, we need to handle the download failures, as some commits might not have a pre-built binary in the GCS bucket. git bisect run lets us skip the current commit by exiting with code 125. From the git bisect documentation:

"The special exit code 125 should be used when the current source code cannot be tested. If the script exits with this code, the current revision will be skipped."

So we check if the Bazel query failed with the error message could not download Bazel and return with the exit code 125 to skip the current commit. Otherwise, we return with the exit code of the Bazel query command.
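The same decision can be pulled out into a pure function and checked without downloading any Bazel binaries. This is a sketch that mirrors the script's logic; the name bisect_verdict is hypothetical. Per the git bisect documentation, 0 means good, 1 to 127 except 125 means bad, 125 means skip, and anything else aborts the bisect:

```shell
#!/bin/bash
# Hypothetical pure function mirroring test-buildbuddy.sh: map a Bazel exit
# code plus its combined output to the exit code `git bisect run` expects.
#   0                    -> good
#   125                  -> skip (no pre-built Bazel binary could be downloaded)
#   1-127 except 125     -> bad (here the Bazel exit code is passed through)
bisect_verdict() {
  local exit_code=$1 output=$2
  if [[ $exit_code -ne 0 && $output == *"could not download Bazel"* ]]; then
    echo 125
  else
    echo "$exit_code"
  fi
}

bisect_verdict 1 "could not download Bazel: ... failed with error 404"   # -> 125
```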

Now let's verify that our script works on 8.0.1:

$ ./test-buildbuddy.sh
Starting local Bazel server (no_version) and connecting to it...
...
@rules_cc//:empty_lib
@rules_cc//:link_extra_lib
@rules_cc//:link_extra_libs
Loading: 1 packages loaded

Now let's edit the script to read USE_BAZEL_VERSION from the BISECT_HEAD file, then start the bisect run:

$ cat test-buildbuddy.sh
#!/bin/bash

# export USE_BAZEL_VERSION='8.0.1'
export USE_BAZEL_VERSION=$(cat .git/BISECT_HEAD)
...

$ git bisect run ./test-buildbuddy.sh
running './run.sh'
Bisecting: 23 revisions left to test after this (roughly 5 steps)
[11e85a4dc73f93ef25809e7d5f0409aaca7d42f1] [8.1.0] Respect comprehension variable shadowing in Starlark debugger output (#25139)
running './run.sh'
Bazel download failed. Skipping commit 11e85a4dc73f93ef25809e7d5f0409aaca7d42f1.
2025/02/18 13:18:12 Using unreleased version at commit 11e85a4dc73f93ef25809e7d5f0409aaca7d42f1
2025/02/18 13:18:12 Downloading https://storage.googleapis.com/bazel-builds/artifacts/macos_arm64/11e85a4dc73f93ef25809e7d5f0409aaca7d42f1/bazel...
2025/02/18 13:18:12 Skipping basic authentication for storage.googleapis.com because no credentials found in /Users/sluongng/.netrc
2025/02/18 13:18:12 could not download Bazel: failed to download bazel: failed to download bazel: HTTP GET https://storage.googleapis.com/bazel-builds/artifacts/macos_arm64/11e85a4dc73f93ef25809e7d5f0409aaca7d42f1/bazel failed with error 404
Bisecting: 23 revisions left to test after this (roughly 5 steps)
[12a3fc001b6629b52f5b24dce6018884222a0608] [8.1.0] See and use more than 64 CPUs on Windows (#25140)
running './run.sh'
Bisecting: 10 revisions left to test after this (roughly 4 steps)
[af6307bc6832d66cce772c5170961b4ff4521e48] [8.1.0] Configure `--run_under` target for the test exec platform (#25184)
running './run.sh'
Bisecting: 4 revisions left to test after this (roughly 3 steps)
[14219c4698e112a03ebe62eed7cb324f625f13c8] [8.1.0] Use digest function matching the checksum in gRPC remote downloader (#25225)
running './run.sh'
Bazel query failed with exit code 1.
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[5f3a083d5649715dc0bed811ef41f53b91539d1d] [8.1.0] Update to use coverage_output_generator-v2.8 (#25202)
running './run.sh'
Bisecting: 1 revision left to test after this (roughly 1 step)
[a40a0cd9947dd73ec07f2394d108eb7e98745161] [8.1.0] Add version selector buttons to repo rule docs (#25211)
running './run.sh'
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[aa4531d5a2116f85b80a753c53528032ed3cda71] [8.1.0] Don't suggest updates to private repo rule attributes (#25213)
running './run.sh'
14219c4698e112a03ebe62eed7cb324f625f13c8 is the first bad commit
commit 14219c4698e112a03ebe62eed7cb324f625f13c8
Author: bazel.build machine account <ci.bazel@gmail.com>
Date: Fri Feb 7 12:07:48 2025 +0100

[8.1.0] Use digest function matching the checksum in gRPC remote downloader (#25225)

Fixes https://bazelbuild.slack.com/archives/CA31HN1T3/p1738763759125489

Closes #25206.

PiperOrigin-RevId: 724267755
Change-Id: Ia23bdae310231bd0ee5763311b948f3465aa8ed0

Commit
https://github.com/bazelbuild/bazel/commit/ef45e02bfb4af1124bb9ad1ef94f36c70c82ce48

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>

.../remote/downloader/GrpcRemoteDownloader.java | 23 ++++++++++++++--------
.../downloader/GrpcRemoteDownloaderTest.java | 23 +++++++++++++---------
2 files changed, 29 insertions(+), 17 deletions(-)
bisect found first bad commit

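And voilà! We have found the commit that introduced the issue: 14219c4698e112a03ebe62eed7cb324f625f13c8, the 8.1.0 cherry-pick of upstream commit ef45e02bfb4af1124bb9ad1ef94f36c70c82ce48 ("Use digest function matching the checksum in gRPC remote downloader").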

We won't be diving into the details of this specific issue in this blog post, but if you are interested, you can read more about it via the revert PR #25320. We expect this to be fixed in the upcoming Bazel 8.1.1 release.

Build failed after upgrading a dependency

This bisect technique can also be used to troubleshoot issues that arise after upgrading a Bazel dependency. For example, we recently attempted to upgrade @rules_go in our repository from v0.51.0 to v0.53.0, and the build failed.

$ bazel build server
...
ERROR: /private/var/tmp/_bazel_sluongng/06e573a93bc2d6a9cad4ad41f00b4310/external/bazel_gazelle/internal/go_repository_cache.bzl:30:17: An error occurred during the fetch of repository 'bazel_gazelle_go_repository_cache':
Traceback (most recent call last):
File "/private/var/tmp/_bazel_sluongng/06e573a93bc2d6a9cad4ad41f00b4310/external/bazel_gazelle/internal/go_repository_cache.bzl", line 30, column 17, in _go_repository_cache_impl
fail('gazelle found more than one suitable Go SDK ({}). Specify which one to use with gazelle_dependencies(go_sdk = "go_sdk").'.format(", ".join(matches)))
Error in fail: gazelle found more than one suitable Go SDK (go_host_compatible_sdk_label, go_sdk_darwin_arm64). Specify which one to use with gazelle_dependencies(go_sdk = "go_sdk").
ERROR: no such package '@@org_golang_google_grpc//reflection': gazelle found more than one suitable Go SDK (go_host_compatible_sdk_label, go_sdk_darwin_arm64). Specify which one to use with gazelle_dependencies(go_sdk = "go_sdk").
ERROR: /Users/sluongng/work/buildbuddy-io/buildbuddy/cli/cmd/sidecar/BUILD:3:11: //cli/cmd/sidecar:sidecar depends on @@org_golang_google_grpc//reflection:reflection in repository @@org_golang_google_grpc which failed to fetch. no such package '@@org_golang_google_grpc//reflection': gazelle found more than one suitable Go SDK (go_host_compatible_sdk_label, go_sdk_darwin_arm64). Specify which one to use with gazelle_dependencies(go_sdk = "go_sdk").
ERROR: Analysis of target '//cli:cli' failed; build aborted: Analysis failed
...

This time, we can use the same bisect technique to identify the commit that introduced the issue. To make the git bisect a bit less tedious, let's use a local copy of @rules_go instead of the one managed by Bazel.

$ cd ~/work/bazelbuild
$ git clone https://github.com/bazel-contrib/rules_go.git
$ cd rules_go
$ git checkout v0.51.0

With this, we can add the following lines to our .bazelrc to tell Bazel to use our local copy instead of the one managed by Bazel.

$ cd ~/work/buildbuddy-io/buildbuddy
$ tail -n 5 .bazelrc
## BZLMOD
common --override_module=rules_go=/Users/sluongng/work/bazelbuild/rules_go

## WORKSPACE
common --override_repository=io_bazel_rules_go=/Users/sluongng/work/bazelbuild/rules_go

Pro tip: These flags are really handy, so I recommend keeping them in your .bazelrc file as comments for future use.

Now we can start the bisect process in the @rules_go repository.

$ cd ~/work/bazelbuild/rules_go
$ cat test-rules-go.sh
#!/bin/bash

(
cd ~/work/buildbuddy-io/buildbuddy
bazel build server
)
$ chmod +x test-rules-go.sh

Since we are relying on our local copy of @rules_go, we do not need to handle download failures as in the previous script. We also don't expect the build to fail because of external factors, so there is no need to set BAZELISK_CLEAN or do our own cleanup. This also means that every bisect run reuses the same Bazel server (JVM) process and its warm analysis cache, keeping our builds blazingly fast.

Now let's start the bisect process:

$ cd ~/work/bazelbuild/rules_go
$ git bisect start --first-parent v0.53.0 v0.51.0

Note that here we do NOT want to use the --no-checkout flag: the test runs against the working copy of the repo, so git needs to check out each commit under test.
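To see the checkout-based flow end to end without touching rules_go, here is a toy git bisect run in a throwaway repository. The test script reads a file from the working tree, which only works because git checks out each commit. The repository contents and file names are made up for illustration:

```shell
#!/bin/bash
# Toy `git bisect run` in a throwaway repository. The test script reads
# status.txt from the working tree, so git must check out every commit it
# tests, which is why --no-checkout is not used here.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init --quiet
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commits 1-5 are "ok"; the regression lands in commit 6. version.txt changes
# every time so that each commit has something to record.
for i in 1 2 3 4 5 6 7 8; do
  if (( i < 6 )); then echo ok > status.txt; else echo broken > status.txt; fi
  echo "$i" > version.txt
  git add status.txt version.txt
  git commit --quiet -m "commit $i"
done

# Keep the test script outside the repo so checkouts never touch it.
test_script="$demo.test.sh"
cat > "$test_script" <<'EOF'
#!/bin/bash
[ "$(cat status.txt)" = ok ]   # exit 0 = good, 1 = bad
EOF
chmod +x "$test_script"

git bisect start --first-parent HEAD HEAD~7 > /dev/null
git bisect run "$test_script" > "$demo.log" 2>&1
# Once bisection finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad: $first_bad"   # -> first bad: commit 6
```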

$ git bisect run ./test-rules-go.sh
running './test-rules-go.sh'
Bisecting: 8 revisions left to test after this (roughly 3 steps)
[4f5202adf56521b3048536d04eef12690557fa7c] Mention `dev_dependency` in `go_sdk.host` error (#4246)
running './test-rules-go.sh'
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[66477c1b41b2449c8102f4338d011f07e4df04b6] Update documentation reference (#4237)
running './test-rules-go.sh'
Bisecting: 1 revision left to test after this (roughly 1 step)
[5eb06119c49b97f16aa79d53cdcd99f95b1000bf] Allow .so files to have more extensions (#4232)
running './test-rules-go.sh'
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[d25e4e75f0ce8e419593a5c633f852ff1c08e292] Use same Go SDK as Gazelle for `go_bazel_test` (#4231)
running './test-rules-go.sh'
d25e4e75f0ce8e419593a5c633f852ff1c08e292 is the first bad commit
commit d25e4e75f0ce8e419593a5c633f852ff1c08e292
Author: Fabian Meumertzheim <fabian@meumertzhe.im>
Date: Thu Jan 16 08:13:09 2025 +0100

Use same Go SDK as Gazelle for `go_bazel_test` (#4231)

**What type of PR is this?**

Bug fix

**What does this PR do? Why is it needed?**

**Which issues(s) does this PR fix?**

Fixes #4228

**Other notes for review**

MODULE.bazel | 2 +-
go/private/repositories.bzl | 11 +++++++++++
go/tools/bazel_testing/BUILD.bazel | 9 ++++++---
3 files changed, 18 insertions(+), 4 deletions(-)
bisect found first bad commit

And there you have it! The issue was introduced in commit d25e4e75f0ce8e419593a5c633f852ff1c08e292 of @rules_go, which first shipped in v0.53.0. Fabian swiftly fixed it in rules_go PR #4264.

Conclusion

git bisect is a powerful tool for troubleshooting build failures after upgrading Bazel or its dependencies: it narrows the root cause down to the exact commit that introduced it.

This makes the error much more actionable. For example, once we identified that the issue was introduced in rules_go v0.53.0, we could upgrade to v0.52.0 instead and report the issue to the upstream open-source project. In other cases, we can proceed with the upgrade and patch a revert of the offending commit into the external dependency.
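The revert patch itself can be produced with plain git. A sketch, using a throwaway repository as a stand-in for a real clone such as rules_go; the file names and BAD variable are illustrative. For Bzlmod users, single_version_override in MODULE.bazel accepts a patches list, and git-style a/ and b/ paths correspond to patch_strip = 1; how you wire the file in for your setup is left as an assumption:

```shell
#!/bin/bash
# Sketch: turn "revert the first bad commit" into a patch file that can be
# applied on top of an external dependency. A throwaway repo stands in for
# the real dependency; in practice BAD is the commit found by git bisect.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init --quiet
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "good behavior" > lib.txt && git add lib.txt && git commit --quiet -m base
echo "regression" > lib.txt && git commit --quiet -am "bad change"
BAD=$(git rev-parse HEAD)
echo "unrelated" > other.txt && git add other.txt && git commit --quiet -m "later change"

# Stage the inverse of BAD without committing, then export it as a patch.
git revert --no-commit "$BAD"
git diff --cached > "$demo.revert.patch"
git reset --quiet --hard HEAD   # discard the staged revert again

# The patch uses a/ and b/ prefixes, so it applies with -p1 (patch_strip = 1).
git apply --check -p1 "$demo.revert.patch" && echo "revert patch applies cleanly"
```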

I hope this guide was helpful to you and that you can use it to troubleshoot your external dependency upgrades in the future.