Train reinforcement learning baselines on manycore

Barely trained, with --build-arg ARGS='-n 1' [results]

Fully trained [results]

In conjunction with the solar 3D printer / sand melter machine and the DOCKER_HOST=... docker build ... service I am working on (among many other things!), I need to be able to train an RL policy on an arbitrary yet cheap machine.

Reinforcement learning (RL)’s big picture: an agent evolves within an environment, observing it and taking actions, all the while maximising some objective. The hardest part for a human here is probably coming up with a well-defined objective function…
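To make this concrete, here is that loop in miniature, written against gym’s classic API and the same Hopper-v3 environment the training run below targets. The random policy is just a stand-in for whatever the learner would produce:

import gym

env = gym.make("Hopper-v3")  # needs MuJoCo installed, see the end of this post
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    # A trained policy would map obs -> action; random actions for illustration.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward  # the objective being maximised
print(f"episode return: {total_reward:.1f}")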

Meaning I want to train a robot (made of pulleys, joints, strings and what have you) to, for instance, go as far as possible without falling. All this within a simulation that is both physics-accurate and fast, just so that later I’d be able to finalize training on the actual /r/Outside robot.

This robot’s training environment is still a work in progress. To pick its next action (that is, which actuators to power), the robot relies on the position and shape of the light patch focused on the ground. So I’ll need some geometry to compute the ellipse that the light cone traces where it intersects the sand plane (sketched below). Then I’ll run this model’s outputs through my motors’ drivers and look for speed or power values outside of safety bounds. Machine assembly is the last step, with a webcam and some computer vision pipeline to observe the light patch and to finalize learning. I expect all sorts of noise, cable tension, friction and precision issues that will have escaped simulation; some more learning in real life should sort these out, or at least I hope so!
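As for that geometry, a back-of-the-envelope sketch: put the cone’s apex at height h above the sand, tilt its axis by phi from vertical, give it half-angle alpha, then substitute the ground plane into the cone equation and complete the square. All names here are mine, not the training repo’s:

import math

def light_patch_ellipse(h, alpha, phi):
    """Ellipse traced by a light cone on the sand plane.

    h: apex height above the sand, alpha: cone half-angle (radians),
    phi: axis tilt from vertical (radians). Requires alpha + phi < pi/2;
    beyond that the intersection is a parabola or hyperbola, not an ellipse.
    Returns (center offset along the tilt direction, semi-major, semi-minor).
    """
    c = math.cos(alpha) ** 2 - math.sin(phi) ** 2
    if c <= 0:
        raise ValueError("alpha + phi >= 90 degrees: not an ellipse")
    center = h * math.sin(phi) * math.cos(phi) / c
    a = h * math.sin(alpha) * math.cos(alpha) / c
    b = h * math.sin(alpha) / math.sqrt(c)
    return center, a, b

# Sanity check: a vertical cone (phi=0) lights a circle of radius h*tan(alpha).
print(light_patch_ellipse(1.0, math.radians(10), 0.0))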

I want my laptop not to die, and to generally remain usable, while this training process runs, so it’d be great to run this on some 60-core instance at around $1/hour somewhere. But bringing in the training code and the simulator, and copying out the results, should stay simple.

So check this out, from my back pocket of CLI tools implemented as docker build calls:

export DOCKER_BUILDKIT=1
export DOCKER_HOST=ssh://much.oomphr.dev # A beefy machine
                                         # with many cores
                                         # and much oomph!
# https://oomphr.dev ...love this service! Can't wait for it to completely exist

docker build --output=. \
  --build-arg ALGO_NAME=td3 \
  --build-arg ENV_ID=Hopper-v3 \
  https://github.com/fenollp/dockerhost-tools--rl-gym-sb3.git

Yes, you can docker build Git repositories: docker build accepts a remote context such as https://github.com/user/repo.git#branch:folder. BuildKit’s support for the subfolder part is only recent, though, so I have created a dedicated repo until moby/buildkit #2116 makes it into docker-ce.

This ran on a 6c/12t 32GiB amd64 machine for about 5 hours, thanks to the DOCKER_HOST environment variable: Docker’s ability to execute builds on another (remote) daemon. The --output=. flag then downloads the results to $PWD.

You get TensorBoard graphs along with the RL model:

tensorboard --logdir tensorboard
# ...
# TensorBoard 2.7.0 at http://localhost:6006/ (Press CTRL+C to quit)

Then, if you went through the hoops of installing OpenAI’s gym, DeepMind’s MuJoCo and mujoco_py, and of cloning this repo, you should be able to enjoy a trained robot running in its simulated world with:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
python enjoy.py --algo td3 --env Hopper-v3 --folder . --exp-id 0

This should reproduce the videos above ;-)