Running Pre-commit in Docker

Make your "pre-commit" command run in a Docker container and reduce the attack surface on your development system!

You probably know pre-commit, a super-popular way to automatically install and run the linters that a Git repository requires. It fetches all the required software and runs it from an isolated environment, such as a virtual environment or a node environment.

Only problem: It's not really that isolated. If any of these pre-commit hooks is compromised or has a vulnerability, it can access the secret files in the repository, your SSH keys, or your various tokens for PyPI, GitHub, AWS, npm etc.
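To make that concrete, here's a minimal sketch of what any process running as your user can see, and a hook is just such a process. The paths are common default locations for these tools, and the function name `visible_credentials` is made up for this example; your setup may keep credentials elsewhere.

```python
from pathlib import Path

# Common default credential locations, relative to the home directory.
# This list is an assumption for illustration and far from complete.
CREDENTIAL_PATHS = (
    ".ssh",                  # SSH keys
    ".pypirc",               # PyPI upload tokens
    ".npmrc",                # npm auth tokens
    ".aws/credentials",      # AWS access keys
    ".config/gh/hosts.yml",  # GitHub CLI token
)


def visible_credentials(home=None):
    """Return the credential paths that exist under the given home directory."""
    home = Path(home) if home is not None else Path.home()
    return [rel for rel in CREDENTIAL_PATHS if (home / rel).exists()]
```

Calling `visible_credentials()` on your own machine lists what an unsandboxed hook could reach without asking anyone.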

As you might have noticed, supply-chain attacks are becoming more and more common, so the chance that a pre-commit hook falls victim keeps increasing. Exploiting a pre-commit hook is especially attractive because it gives the attacker access to developers' setups... the types of setups that usually also have access to lots of other exciting things.

I'm not trying to tip off attackers, I'm sure they already know this.

What can we do?

Obviously, if all the hooks were packaged by trusted sources that follow certain standards (such as Debian packages), we could start running linters in a more standardized way. But that seems like wishful thinking right now. Pre-commit hasn't been trying to make it easy to install standardized packages, but rather the opposite: to speed up the distribution of small, convenient, script-like features.

That does seem like something you'd want to run in a less trusted environment.

So the proposition here is to see how far we can get with Docker.

Try it out

I've written a little Docker image with some instructions here:

https://codeberg.org/benjaoming/pre-commit-docker-alias

What you get from that is basically a global alias for pre-commit that will work for all your projects. It contains a Python (PyPI) and an npm environment, so it's ready to handle these types of hooks.
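The general idea behind such an alias can be sketched in a few lines. This is not the project's actual implementation: the image tag `pre-commit-docker:latest` and the function name are placeholders, and mounting the repository at its original path is an assumption made here for simplicity. The linked instructions have the real setup.

```python
from pathlib import Path

# Placeholder image tag; the real name comes from the project's instructions.
IMAGE = "pre-commit-docker:latest"


def build_docker_command(repo_dir, hook_args, image=IMAGE):
    """Build a `docker run` argument list that runs pre-commit inside a container.

    Only the repository is mounted, so the home directory (SSH keys,
    tokens etc.) is not visible from inside the container.
    """
    repo = Path(repo_dir).resolve()
    return [
        "docker", "run", "--rm",
        "--volume", f"{repo}:{repo}",  # mount the repo at the same path
        "--workdir", str(repo),        # run from the repo root
        image,
        "pre-commit", *hook_args,
    ]
```

Passing the result of `build_docker_command(".", ["run", "--all-files"])` to `subprocess.run` would then behave much like calling `pre-commit run --all-files` directly, just from inside the container.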

More can be added, and we can figure out the basic day-to-day needs.

What are the catches?

I found only a small performance catch: roughly 0.4 s of startup overhead on my laptop. This comes from spinning up the Docker container. It's nothing that you'll notice.

We should also be aware that the Git repository is mounted into the Docker container, which means you still have to secure any secrets that are exposed through the repository. I imagine we should stop storing secrets in .env files.
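A quick way to see what the container can still read is to scan the repository for files that typically hold secrets. The sketch below is only a rough check: the filename patterns and the function name `find_exposed_files` are assumptions for illustration, and a match doesn't prove a file contains a secret (nor does a clean result prove there are none).

```python
from fnmatch import fnmatch
from pathlib import Path

# Filename patterns that often indicate secrets; illustrative, not exhaustive.
SUSPECT_PATTERNS = (".env", ".env.*", "*.pem", "*.key", "id_rsa", "id_ed25519")


def find_exposed_files(repo_dir, patterns=SUSPECT_PATTERNS):
    """List files in the repository whose names match a suspect pattern."""
    repo = Path(repo_dir)
    hits = []
    for path in repo.rglob("*"):
        relative = path.relative_to(repo)
        # Skip Git's own metadata directory.
        if ".git" in relative.parts:
            continue
        if path.is_file() and any(fnmatch(path.name, p) for p in patterns):
            hits.append(relative.as_posix())
    return sorted(hits)
```

Anything this returns is mounted into the container along with the rest of the repository, so it's a reasonable shortlist of things to move elsewhere.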

The other catches relate to current limitations of what you can do in the Docker image: It doesn't support running Docker itself, nor some other language environments. But the images can be customized for this.

I encourage you to just fork my project and make it your own. It's not like the global image needs to do everything; we can maintain different images for different projects and come up with a way to manage that.