Overview
Bazel breaks a build into discrete steps, which are called actions. Each action has inputs, output names, a command line, and environment variables. Required inputs and expected outputs are declared explicitly for each action.
You can set up a server to be a remote cache for build outputs, which are these action outputs. These outputs consist of a list of output file names and the hashes of their contents. With a remote cache, you can reuse build outputs from another user’s build rather than building each new output locally.
To use remote caching:
- Set up a server as the cache’s backend
- Configure the Bazel build to use the remote cache
- Use Bazel version 0.10.0 or later
The remote cache stores two types of data:
- The action cache, which is a map of action hashes to action result metadata.
- A content-addressable store (CAS) of output files.
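The relationship between the action cache and the CAS can be pictured with a minimal in-memory sketch. The class and method names (`RemoteCache`, `put_blob`, `put_action_result`, `lookup`) are illustrative assumptions, not Bazel's API, and the action hash is treated as an opaque string because Bazel computes it.

```python
import hashlib


class RemoteCache:
    """Toy model of a remote cache: an action cache plus a CAS."""

    def __init__(self):
        self.action_cache = {}  # action hash -> {output file name: content hash}
        self.cas = {}           # content hash -> file bytes

    def put_blob(self, data: bytes) -> str:
        """Store bytes in the CAS under the SHA-256 of their content."""
        digest = hashlib.sha256(data).hexdigest()
        self.cas[digest] = data
        return digest

    def put_action_result(self, action_hash: str, outputs: dict) -> None:
        """Record an action's outputs as file names mapped to content hashes."""
        self.action_cache[action_hash] = {
            name: self.put_blob(data) for name, data in outputs.items()
        }

    def lookup(self, action_hash: str):
        """Return {name: bytes} on a cache hit, or None on a miss."""
        result = self.action_cache.get(action_hash)
        if result is None:
            return None
        return {name: self.cas[digest] for name, digest in result.items()}
```

Because the CAS is keyed by content, two output files with identical bytes occupy a single entry, while the action cache keeps one metadata record per action.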
How a build uses remote caching
Once a server is set up as the remote cache, you can use the cache in multiple ways:
- Read from and write to the remote cache
- Read from and/or write to the remote cache except for specific targets
- Only read from the remote cache
- Not use the remote cache at all
When you run a Bazel build that can read from and write to the remote cache, the build follows these steps:
- Bazel creates the graph of targets that need to be built, and then creates a list of required actions. Each of these actions has declared inputs and output filenames.
- Bazel checks your local machine for existing build outputs and reuses any that it finds.
- Bazel checks the cache for existing build outputs. If the output is found, Bazel retrieves the output. This is a cache hit.
- For required actions where the outputs were not found, Bazel executes the actions locally and creates the required build outputs.
- New build outputs are uploaded to the remote cache.
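The lookup order in the steps above can be sketched as follows. This is a simplified model, assuming each action is identified by an opaque key and that local outputs and the remote cache behave like plain dictionaries; `run_build` and `execute` are hypothetical names, not part of Bazel.

```python
def run_build(action_keys, local_outputs, remote_cache, execute):
    """Resolve each action: local reuse, then remote hit, then local execution.

    Returns a dict mapping each action key to how it was satisfied.
    """
    how = {}
    for key in action_keys:
        if key in local_outputs:
            how[key] = "local"                      # reuse an existing local output
        elif key in remote_cache:
            local_outputs[key] = remote_cache[key]  # download: cache hit
            how[key] = "remote hit"
        else:
            output = execute(key)                   # cache miss: build locally
            local_outputs[key] = output
            remote_cache[key] = output              # upload the new output
            how[key] = "executed"
    return how
```

After one such build, an action that was executed locally is present in the remote cache, so a second machine running the same action would get a cache hit.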
Setting up a server as the cache’s backend
You need to set up a server to act as the cache’s backend. An HTTP/1.1 server can treat Bazel’s data as opaque bytes, so many existing servers can be used as a remote caching backend. Bazel’s HTTP caching protocol is what supports remote caching.
You are responsible for choosing, setting up, and maintaining the backend server that will store the cached outputs. When choosing a server, consider:
- Networking speed. For example, if your team is in the same office, you may want to run your own local server.
- Security. The remote cache will have your binaries and so needs to be secure.
- Ease of management. For example, Google Cloud Storage is a fully managed service.
nginx
nginx is an open source web server. With its [WebDAV module], it can be used as a remote cache for Bazel. On Debian and Ubuntu, you can install the nginx-extras package. On macOS, nginx is available via Homebrew.
In the nginx configuration, set the cache root (shown here as the placeholder /path/to/cache/dir) to a valid directory where nginx has permission to read and write. You may need to change the client_max_body_size option to a larger value if you have larger output files. The server will require other configuration, such as authentication.
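Example configuration for the server section in nginx.conf: define a location for the cache whose root is the cache directory, enable uploads with dav_methods PUT, and set create_full_put_path on so that nginx can create the /ac and /cas subdirectories.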
bazel-remote
bazel-remote is an open source remote build cache that you can use on your infrastructure. It has been used successfully in production at several companies since early 2018. Note that the Bazel project does not provide technical support for bazel-remote.
This cache stores contents on disk and also provides garbage collection to enforce an upper storage limit and clean unused artifacts. The cache is available as a [docker image] and its code is available on GitHub. Both the REST and gRPC remote cache APIs are supported. Refer to the GitHub page for instructions on how to use it.
Google Cloud Storage
[Google Cloud Storage] is a fully managed object store which provides an HTTP API that is compatible with Bazel’s remote caching protocol. It requires that you have a Google Cloud account with billing enabled.
To use Cloud Storage as the cache:
- Create a storage bucket. Ensure that you select a bucket location that’s closest to you, as network bandwidth is important for the remote cache.
- Create a service account for Bazel to authenticate to Cloud Storage. See Creating a service account.
- Generate a secret JSON key and then pass it to Bazel for authentication. Store the key securely, as anyone with the key can read and write arbitrary data to/from your GCS bucket.
- Connect to Cloud Storage by adding the following flags to your Bazel command:
  - Pass the following URL to Bazel by using the flag: --remote_cache=https://storage.googleapis.com/bucket-name where bucket-name is the name of your storage bucket.
  - Pass the authentication key using the flag: --google_credentials=/path/to/your/secret-key.json, or --google_default_credentials to use Application Authentication.
- You can configure Cloud Storage to automatically delete old files. To do so, see Managing Object Lifecycles.
Other servers
You can set up any HTTP/1.1 server that supports PUT and GET as the cache’s backend. Users have reported success with caching backends such as Hazelcast, Apache httpd, and AWS S3.
Authentication
Bazel supports HTTP Basic Authentication as of version 0.11.0. You can pass a username and password to Bazel via the remote cache URL. The syntax is https://username:password@hostname.com:port/path. Note that HTTP Basic Authentication transmits the username and password in plaintext over the network, so it is critical to always use it with HTTPS.
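To see why HTTPS matters here, the sketch below derives the `Authorization` header that the standard Basic scheme produces from credentials embedded in a URL, and then reverses it. It illustrates the general HTTP mechanism rather than Bazel's own code; the function name `basic_auth_header`, the credentials `user:pass`, and port `8080` are made up for the example.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(cache_url: str) -> str:
    """Build the Authorization header value from credentials in a URL."""
    parts = urlsplit(cache_url)
    if parts.username is None or parts.password is None:
        raise ValueError("URL does not contain username:password")
    token = f"{parts.username}:{parts.password}".encode("utf-8")
    return "Basic " + base64.b64encode(token).decode("ascii")


header = basic_auth_header("https://user:pass@hostname.com:8080/path")

# Base64 is an encoding, not encryption: anyone who sees the header can reverse it.
recovered = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
```

Here `recovered` is the original `user:pass` string, which is what an eavesdropper on an unencrypted connection would obtain.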
HTTP caching protocol
Bazel supports remote caching via HTTP/1.1. The protocol is conceptually simple: binary data (BLOB) is uploaded via PUT requests and downloaded via GET requests. Action result metadata is stored under the path /ac/ and output files are stored under the path /cas/.
For example, consider a remote cache running under http://localhost:8080/cache. A Bazel request to download action result metadata for an action with the SHA256 hash 01ba4719... is a GET request for the path /cache/ac/01ba4719... on localhost:8080.
A Bazel request to upload an output file with the SHA256 hash 15e2b0d3... to the CAS is a PUT request to the path /cache/cas/15e2b0d3... with the file contents as the request body.
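The PUT/GET exchange described above can be tried out with a small in-memory server and client. This is a sketch for experimentation only, assuming a backend that stores request bodies as opaque bytes keyed by URL path; `CacheHandler`, `start_server`, `put_cas`, and `get` are hypothetical names, and the server has no persistence, authentication, or eviction.

```python
import hashlib
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CacheHandler(BaseHTTPRequestHandler):
    """Stores request bodies as opaque bytes, keyed by URL path."""

    store = {}  # path -> bytes, shared by all requests

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.store[self.path] = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        blob = self.store.get(self.path)
        if blob is None:
            self.send_response(404)  # cache miss
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(blob)))
        self.end_headers()
        self.wfile.write(blob)

    def log_message(self, *args):
        pass  # keep the example quiet


# Talk to localhost directly, ignoring any proxy settings in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def start_server():
    """Start the cache on a free localhost port; returns (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/cache"


def put_cas(base_url, data):
    """Upload bytes to /cas/<sha256 of content> and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    request = urllib.request.Request(
        f"{base_url}/cas/{digest}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )
    _opener.open(request).close()
    return digest


def get(base_url, kind, digest):
    """Download from /ac/ or /cas/; returns None on a cache miss (404)."""
    try:
        with _opener.open(f"{base_url}/{kind}/{digest}") as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

A typical round trip is `server, base = start_server()`, then `digest = put_cas(base, b"123456789")`, then `get(base, "cas", digest)`, which returns the uploaded bytes. A `get` for an unknown hash returns `None`, which is how a client would observe a cache miss.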
Run Bazel using the remote cache
Once a server is set up as the remote cache, you need to add flags to your Bazel command to use it. See the list of configurations and their flags below. You may also need to configure authentication, which is specific to your chosen server.
You may want to add these flags to a .bazelrc file so that you don’t need to specify them every time you run Bazel. Depending on your project and team dynamics, you can add flags to a .bazelrc file that is:
- On your local machine
- In your project’s workspace, shared with the team
- On the CI system
Read from and write to the remote cache
Take care over who has the ability to write to the remote cache. You may want only your CI system to be able to write to the remote cache.
Use the --remote_cache flag to read from and write to the remote cache, for example build --remote_cache=http://your.host:port in a .bazelrc file. Besides HTTP, the following protocols are also supported: HTTPS, grpc, grpcs.
To only read from the remote cache, set --remote_upload_local_results=false in addition to the flag above.
Exclude specific targets from using the remote cache
To exclude specific targets from using the remote cache, tag the target with no-remote-cache. For example, add tags = ["no-remote-cache"] to the target’s rule in its BUILD file.
Delete content from the remote cache
Deleting content from the remote cache is part of managing your server. How you delete content from the remote cache depends on the server you have set up as the cache. When deleting outputs, either delete the entire cache, or delete old outputs.
The cached outputs are stored as a set of names and hashes. When deleting content, there’s no way to distinguish which output belongs to a specific build.
You may want to delete content from the cache to:
- Create a clean cache after a cache was poisoned
- Reduce the amount of storage used by deleting old outputs
Unix sockets
The remote HTTP cache supports connecting over Unix domain sockets. The behavior is similar to curl’s --unix-socket flag. To configure a Unix domain socket, pass --remote_proxy=unix:/path/to/socket alongside the --remote_cache flag.
Disk cache
Bazel can use a directory on the file system as a remote cache. This is useful for sharing build artifacts when switching branches and/or working on multiple workspaces of the same project, such as multiple checkouts. Enable the disk cache with the --disk_cache flag, for example build --disk_cache=path/to/build/cache.
You can pass a user-specific path to the --disk_cache flag using the ~ alias (Bazel will substitute the current user’s home directory). This comes in handy when enabling the disk cache for all developers of a project via the project’s checked-in .bazelrc file.
Garbage collection
Starting with Bazel 7.4, you can use --experimental_disk_cache_gc_max_size and
--experimental_disk_cache_gc_max_age to set a maximum size for the disk cache
or a maximum age for individual cache entries. Bazel will automatically garbage
collect the disk cache while idling between builds; the idle timer can be set
with --experimental_disk_cache_gc_idle_delay (defaulting to 5 minutes).
As an alternative to automatic garbage collection, we also provide a tool to run a
garbage collection on demand.
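To make the two limits concrete, here is a minimal sketch of what age-based and size-based eviction over a cache directory could look like. It is not Bazel's implementation: the function `collect_garbage`, its parameters, and the use of file modification time to decide which entries are oldest are all assumptions made for illustration.

```python
import os
import time


def collect_garbage(cache_dir, max_size_bytes=None, max_age_seconds=None, now=None):
    """Delete entries past the age limit, then the oldest ones until under the size cap.

    Returns the list of removed file paths, in removal order.
    """
    now = time.time() if now is None else now
    entries = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            stat = os.stat(path)
            entries.append((stat.st_mtime, stat.st_size, path))
    entries.sort()  # oldest modification time first

    removed = []
    kept = []
    for mtime, size, path in entries:
        if max_age_seconds is not None and now - mtime > max_age_seconds:
            os.remove(path)  # entry is older than the age limit
            removed.append(path)
        else:
            kept.append((mtime, size, path))

    total = sum(size for _mtime, size, _path in kept)
    for _mtime, size, path in kept:
        if max_size_bytes is None or total <= max_size_bytes:
            break
        os.remove(path)  # still over the size cap: drop the oldest survivor
        removed.append(path)
        total -= size
    return removed
```

Applying the age limit first means the size limit only has to evict entries that are still recent enough to be kept otherwise.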
Known issues
Input file modification during a build
When an input file is modified during a build, Bazel might upload invalid results to the remote cache. You can enable change detection with the --experimental_guard_against_concurrent_changes flag. There are no known issues with the flag, and it will be enabled by default in a future release. See [issue #3360] for updates. Generally, avoid modifying source files during a build.
Environment variables leaking into an action
An action definition contains environment variables. This can be a problem for
sharing remote cache hits across machines. For example, environments with
different $PATH variables won’t share cache hits. Only environment variables
explicitly whitelisted via --action_env are included in an action
definition. Bazel’s Debian/Ubuntu package used to install /etc/bazel.bazelrc
with a whitelist of environment variables including $PATH. If you are getting
fewer cache hits than expected, check that your environment doesn’t have an old
/etc/bazel.bazelrc file.
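The effect of environment variables on cache hits can be illustrated with a simplified cache key. This sketch assumes the key is a SHA-256 over a JSON rendering of the command line, input digests, and environment; Bazel's real action digest is computed differently, and `action_key` is a hypothetical helper.

```python
import hashlib
import json


def action_key(command, input_digests, env):
    """Hash everything that defines the action, including its environment."""
    payload = json.dumps(
        {"command": command, "inputs": input_digests, "env": env},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


command = ["gcc", "-c", "a.c"]
inputs = {"a.c": "abc"}  # placeholder digest for the source file

key_machine_1 = action_key(command, inputs, {"PATH": "/usr/bin"})
key_machine_2 = action_key(command, inputs, {"PATH": "/usr/local/bin:/usr/bin"})
```

The two keys differ even though the command and inputs are identical, so the second machine cannot reuse the first machine's cached result.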
Bazel does not track tools outside a workspace
Bazel currently does not track tools outside a workspace. This can be a
problem if, for example, an action uses a compiler from /usr/bin/. Then,
two users with different compilers installed will wrongly share cache hits
because the outputs are different but they have the same action hash. See
issue #4558 for updates.
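The problem can be reproduced in miniature. The sketch below assumes a cache key built only from the command line and the contents of declared inputs, which is a simplification of Bazel's action hash; `digest`, `action_key`, and the byte strings standing in for compiler binaries are all hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def action_key(command, declared_inputs):
    """Key built only from the command line and declared input contents."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode("utf-8") + b"\0")
    for path in sorted(declared_inputs):
        h.update(path.encode("utf-8") + b"\0")
        h.update(digest(declared_inputs[path]).encode("ascii"))
    return h.hexdigest()


# Two machines run the same action, but /usr/bin/cc differs between them.
command = ["/usr/bin/cc", "-c", "main.c"]
inputs = {"main.c": b"int main(void) { return 0; }\n"}
compiler_on_machine_a = b"compiler build 1"  # stand-in for the binary's bytes
compiler_on_machine_b = b"compiler build 2"

key_a = action_key(command, inputs)
key_b = action_key(command, inputs)
same_key = key_a == key_b  # True: the compiler is not a declared input
same_tool = digest(compiler_on_machine_a) == digest(compiler_on_machine_b)  # False
```

Because the compiler's contents never enter the key, both machines map the action to the same cache entry even though they would produce different outputs.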
Incremental in-memory state is lost when running builds inside docker containers
Bazel uses a server/client architecture even when running in a single Docker container.
On the server side, Bazel maintains an in-memory state which speeds up builds.
When running builds inside Docker containers, such as in CI, the in-memory state is lost
and Bazel must rebuild it before using the remote cache.
External links
- Your Build in a Datacenter: The Bazel team gave a talk about remote caching and execution at FOSDEM 2018.
- Faster Bazel builds with remote caching: a benchmark: Nicolò Valigi wrote a blog post in which he benchmarks remote caching in Bazel.
- Adapting Rules for Remote Execution
- Troubleshooting Remote Execution
- WebDAV module
- Docker image
- bazel-remote
- Google Cloud Storage
- Google Cloud Console
- Bucket locations
- Hazelcast
- Apache httpd
- AWS S3
- issue #3360
- gRPC
- gRPC protocol
- Buildbarn
- Buildfarm
- BuildGrid
- issue #4558
- Application Authentication
- NativeLink