Infinitely faster initial Rust builds with DOCKER_HOST (and BuildKit)

Following my first blog post ever on a similar subject, I found cargo-wharf, a cacheable and efficient Docker image builder for Rust.

It is an alternative Dockerfile frontend for Rust: it turns a Cargo.toml (the manifest listing a Rust package’s Cargo dependencies and metadata) into a Docker-buildable recipe once you add this # syntax line at its top (plus a few caveats):

# syntax = denzp/cargo-wharf-frontend:v0.1.0-alpha.2

[package]
...

Then the following command builds a Docker image from this Cargo.toml:

$ DOCKER_BUILDKIT=1 docker build -t service:latest -f Cargo.toml .

Not demonstrated in that repo, but supposedly supported, is building just the binaries and exporting them to the local filesystem:

$ DOCKER_BUILDKIT=1 docker build --platform=local --output=. --file=Cargo.toml .
# or even:
$ DOCKER_BUILDKIT=1 docker build --platform=local --output=. https://github.com/some/repo.git#master:sub/context

Note that BuildKit doesn’t yet support using a repository subdirectory (sub/context above) as the build context.

As noted in that repo’s README:

Every dependency is built in its isolated environment and cached independently from others.

Rust projects are notoriously slow to build, especially the initial build.

This immediately tickled my DOCKER_HOST-fan ears!

A mutualized build artifacts cache

Say you want to work on some large Rust project for the first time. You clone it, set DOCKER_HOST=ssh://some_machine.com, then run DOCKER_BUILDKIT=1 docker build --platform=local ... on it.

Now your build runs on a beefy machine somewhere and sends you the outputs.

Not only

The Rust community and its backers

Unsure whether your remote-built project was backdoored by a malicious some_machine.com, cache or middle person?

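Unset DOCKER_HOST, re-run the command and compare sha256(remote-built) with sha256(locally-built). This assumes the build is reproducible, i.e. that two builds of the same sources produce bit-for-bit identical outputs.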
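Here is a minimal Rust sketch of that comparison. It assumes both binaries were exported to local files, for example with --output as above. The file paths are up to you. Rust’s standard library has no SHA-256, so the sketch embeds a compact FIPS 180-4 implementation to stay dependency-free. In practice, sha256sum or the sha2 crate would do the same job.

```rust
use std::fs;
use std::io;
use std::path::Path;

// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// SHA-256 digest of `data`, as lowercase hex.
fn sha256_hex(data: &[u8]) -> String {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian u64).
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for chunk in msg.chunks(64) {
        // Message schedule: 16 big-endian words expanded to 64.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            hh = g;
            g = f;
            f = e;
            e = d.wrapping_add(t1);
            d = c;
            c = b;
            b = a;
            a = t1.wrapping_add(t2);
        }
        for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh].iter()) {
            *x = x.wrapping_add(*y);
        }
    }
    h.iter().map(|x| format!("{:08x}", x)).collect()
}

/// True when the remote-built and locally-built artifacts have the same digest.
fn same_artifact(remote: &Path, local: &Path) -> io::Result<bool> {
    Ok(sha256_hex(&fs::read(remote)?) == sha256_hex(&fs::read(local)?))
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: compare <remote-built> <locally-built>");
        return Ok(());
    }
    let same = same_artifact(Path::new(&args[1]), Path::new(&args[2]))?;
    println!("{}", if same { "identical" } else { "DIFFERENT: investigate!" });
    Ok(())
}
```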

In fact, all language communities should be sharing a build cache, provided

I’m obviously not a genius. Here are some other people’s takes on this:

Towards a distributed crate cache

Cf. rust-lang/cargo#1997, which mentions sccache.

Building Rust code with cargo in verbose mode shows the underlying rustc calls, such as:

❯ cargo --verbose install cargo-edit
# ...
rustc \
    --crate-name autocfg $HOME/.cargo/registry/src/github.com-1ecc6299db9ec823/autocfg-1.1.0/src/lib.rs \
    --error-format=json \
    --json=diagnostic-rendered-ansi,future-incompat \
    --crate-type lib \
    --emit=dep-info,metadata,link \
    -C embed-bitcode=no \
    -C debug-assertions=off \
    -C metadata=6e4def821aa49e9d \
    -C extra-filename=-6e4def821aa49e9d \
    --out-dir $PWD \
    -L dependency=$PWD \
    --cap-lints allow
# which produces the files
# -rw-rw-r-- autocfg-6e4def821aa49e9d.d
# -rw-rw-r-- libautocfg-6e4def821aa49e9d.rlib
# -rw-rw-r-- libautocfg-6e4def821aa49e9d.rmeta

This builds one crate. These arguments, together with the rustc version and the contents of the source files they point to, should provide enough information to build a crate’s cache key (humongous caveat: hopefully in most cases).
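Here is a minimal Rust sketch of such a key. It rests on some loudly stated assumptions:

- Only three inputs matter: the rustc version string, the argument list, and the contents of the crate’s source files. The sources would be the files listed in the .d dep-info output.
- The values of --out-dir and -L are machine-local paths, so they are dropped. A real key would fold in the hashes of the dependency artifacts found there instead.
- The crate root’s path is dropped in favour of its contents.

It uses 64-bit FNV-1a only to stay dependency-free and deterministic. A shared cache would want a cryptographic hash. cache_key and its skipping rules are hypothetical; they are not what Cargo or sccache actually do.

```rust
// 64-bit FNV-1a parameters: stable across platforms, but NOT collision-resistant.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds `bytes` into an FNV-1a state.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hypothetical cache key for one rustc invocation.
fn cache_key(rustc_version: &str, args: &[&str], sources: &[&[u8]]) -> u64 {
    let mut h = fnv1a(FNV_OFFSET, rustc_version.as_bytes());
    let mut it = args.iter();
    while let Some(&arg) = it.next() {
        match arg {
            // Machine-local paths: drop the flag together with its value.
            "--out-dir" | "-L" => {
                it.next();
            }
            // Crate root path (contains $HOME): its contents are hashed below instead.
            a if a.ends_with(".rs") => {}
            a => {
                h = fnv1a(h, a.as_bytes());
                h = fnv1a(h, &[0]); // separator so ["ab", "c"] != ["a", "bc"]
            }
        }
    }
    for &src in sources {
        // Length prefix keeps file boundaries unambiguous.
        h = fnv1a(h, &(src.len() as u64).to_le_bytes());
        h = fnv1a(h, src);
    }
    h
}

fn main() {
    let version = "rustc 1.62.0"; // hypothetical; in practice the output of `rustc -vV`
    let lib_rs: &[u8] = b"pub fn probe() {}"; // stand-in for autocfg's source files
    let alice = [
        "--crate-name", "autocfg",
        "/home/alice/.cargo/registry/src/github.com-1ecc6299db9ec823/autocfg-1.1.0/src/lib.rs",
        "--crate-type", "lib", "-C", "debug-assertions=off",
        "--out-dir", "/home/alice/proj/target", "-L", "dependency=/home/alice/proj/target",
    ];
    let bob = [
        "--crate-name", "autocfg",
        "/home/bob/.cargo/registry/src/github.com-1ecc6299db9ec823/autocfg-1.1.0/src/lib.rs",
        "--crate-type", "lib", "-C", "debug-assertions=off",
        "--out-dir", "/tmp/build", "-L", "dependency=/tmp/build",
    ];
    let (ka, kb) = (cache_key(version, &alice, &[lib_rs]), cache_key(version, &bob, &[lib_rs]));
    println!("{:016x} {:016x} shared: {}", ka, kb, ka == kb);
}
```

Alice and Bob build from different home directories and output folders, yet they get the same key. That is exactly what lets one of them reuse the other’s artifact.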

From there it’s hermetic builds, content-addressable storage and distributed compilation turtles all the way down. Easy!

Figuring out a crate’s cache key depends on that project’s specific build plan, which itself might depend on some environment variables, the version of the current shell… In short, once a crate relies on a build.rs file for compilation (meaning some non-cargo/rustc code from the Internet gets executed on your machine!), all cache key bets are off.

Caching should be possible for some (most?) crates, just not for these ones.
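A shared cache would therefore need to spot those crates and fall back to a regular build. Below is a rough Rust heuristic for that. It assumes Cargo’s default behaviour: a build.rs at the package root is picked up as a build script unless the manifest sets a build key (a path, or false to opt out). It scans Cargo.toml line by line instead of parsing TOML. Treat has_build_script as an illustrative, hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Heuristic: does the package in `pkg_dir` run a build script?
fn has_build_script(pkg_dir: &Path) -> io::Result<bool> {
    let manifest = fs::read_to_string(pkg_dir.join("Cargo.toml"))?;
    // Naive scan for a `build = ...` line (no real TOML parsing, ignores tables).
    let build_key = manifest.lines().map(str::trim).find_map(|line| {
        let rest = line.strip_prefix("build")?.trim_start();
        rest.strip_prefix('=').map(|v| v.trim().to_string())
    });
    match build_key.as_deref() {
        Some("false") => Ok(false),                        // explicitly disabled
        Some(_) => Ok(true),                               // custom build script path
        None => Ok(pkg_dir.join("build.rs").exists()),     // Cargo's auto-detection
    }
}

fn main() {
    // Package directory to inspect; defaults to the current directory.
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    match has_build_script(Path::new(&dir)) {
        Ok(true) => println!("{}: runs a build script, do not share its artifacts", dir),
        Ok(false) => println!("{}: no build script, candidate for the shared cache", dir),
        Err(e) => eprintln!("{}: cannot read Cargo.toml: {}", dir, e),
    }
}
```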

Setting RUSTC_WRAPPER should help get there, although sccache’s caveats are worth looking at.
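Concretely, Cargo runs RUSTC_WRAPPER with the path to the real rustc as its first argument, followed by rustc’s own arguments. Below is a minimal pass-through wrapper in Rust showing where a cache lookup would slot in. It performs no caching. The WRAPPER_LOG environment variable is a hypothetical name used only by this sketch.

```rust
use std::env;
use std::fs::OpenOptions;
use std::io::Write;
use std::process::{exit, Command};

/// Splits `wrapper <rustc> <args...>` into the rustc path and its arguments.
fn split_invocation(mut args: Vec<String>) -> Option<(String, Vec<String>)> {
    if args.len() < 2 {
        return None;
    }
    args.remove(0); // the wrapper's own path (argv[0])
    let rustc = args.remove(0);
    Some((rustc, args))
}

fn main() {
    let (rustc, rustc_args) = match split_invocation(env::args().collect()) {
        Some(parts) => parts,
        None => {
            eprintln!("usage: RUSTC_WRAPPER=/path/to/this/wrapper cargo build");
            exit(2);
        }
    };
    // Hypothetical: append each invocation to a log file, i.e. the raw material
    // a cache key would be computed from.
    if let Ok(path) = env::var("WRAPPER_LOG") {
        if let Ok(mut log) = OpenOptions::new().create(true).append(true).open(path) {
            let _ = writeln!(log, "{} {}", rustc, rustc_args.join(" "));
        }
    }
    // A real cache would compute a key here and return early on a hit.
    let status = Command::new(&rustc).args(&rustc_args).status().unwrap_or_else(|e| {
        eprintln!("failed to run {}: {}", rustc, e);
        exit(1)
    });
    exit(status.code().unwrap_or(1));
}
```

Once built, RUSTC_WRAPPER=/path/to/wrapper WRAPPER_LOG=/tmp/rustc.log cargo build should record every rustc call Cargo makes, much like the verbose output above.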

Private code & security

These are the privacy and security concerns I can see from my echo-chamber-slash-comfy-chair:

To address these points, I see:

I’m describing a DHT, a decentralized one. Maybe even IPFS.
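For the lookup side, a Kademlia-style DHT (the design IPFS’s DHT builds on) assigns each key to the nodes whose IDs are closest to it under the XOR metric. Here is a toy Rust sketch of that assignment. It assumes 64-bit node IDs and cache keys, whereas the original Kademlia design uses 160-bit IDs. k_closest is a hypothetical helper, and routing, replication and node churn are left out.

```rust
/// Kademlia's XOR metric: distance(a, b) = a ^ b, compared as an integer.
fn xor_distance(a: u64, b: u64) -> u64 {
    a ^ b
}

/// The `k` node IDs closest to `key`: in a DHT, these nodes would store
/// (and be asked for) the build artifact addressed by `key`.
fn k_closest(nodes: &[u64], key: u64, k: usize) -> Vec<u64> {
    let mut sorted = nodes.to_vec();
    sorted.sort_by_key(|&id| xor_distance(id, key));
    sorted.truncate(k);
    sorted
}

fn main() {
    let nodes = [0x1f, 0xa3, 0x42, 0x77, 0xc0]; // hypothetical node IDs
    let key = 0x40; // e.g. a crate artifact's cache key, truncated for the example
    println!("{:?}", k_closest(&nodes, key, 2)); // prints [66, 119], i.e. 0x42 and 0x77
}
```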