Tuesday, June 14, 2016

Introducing gdistcc – the easy way to compile in the cloud!

I’ve recently been making a sizeable number of contributions to HHVM, an open source virtual machine for both PHP and Facebook’s Hack language. Given it’s a “robust” code base, a fresh compile can take quite a while on my laptop. Earlier I wrote an entry about using ccache to cache compiled objects for reuse, which is great when moving between branches or between release and debug builds. However, even with make and ccache doing their best, some updates to an active project such as HHVM (especially major header changes) require nearly a full recompile of the code base.

Enter my newest project, gdistcc, which automates distributed compiling on Google Compute Engine with economical preemptible instances. distcc has long provided distributed compiling of C/C++/Objective-C code for those with access to multiple servers. gdistcc makes it easy for everyone to provision, compile on, and shut down any number of preconfigured instances!

Full details on installing and using gdistcc can be found at gdistcc.andrewpeabody.com. Here is a quick example for a ccache-enabled ‘make’ project, once gcloud is set up and gdistcc is installed.

Start 16 gdistcc instances:
    gdistcc start --qty 16
Build the project (in the root of your ‘make’ project):
    gdistcc make
Stop the gdistcc instances:
    gdistcc stop
That’s it! gdistcc is still pretty rough around the edges and currently only works on CentOS 7. I plan to add support for Ubuntu in the near future, and any of Google Compute Engine’s preferred distros should be pretty easy to add.
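
Since the instances are billed while they run, it can be handy to wrap the three commands so that the instances are stopped even when the build fails. The following is only a minimal sketch, not part of gdistcc itself: the function name build_in_cloud and the GDISTCC variable are made up for illustration, and it uses only the start, make, and stop commands shown above.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of gdistcc.
# GDISTCC can be overridden (e.g. GDISTCC=echo) for a dry run.
GDISTCC="${GDISTCC:-gdistcc}"

build_in_cloud() {
    # Number of instances to start; defaults to 16 as in the example above.
    qty="${1:-16}"

    # Only build if the instances started successfully.
    $GDISTCC start --qty "$qty" && $GDISTCC make
    status=$?

    # Always try to stop the instances, whatever happened above.
    $GDISTCC stop

    # Report the start/build result to the caller.
    return "$status"
}
```

Calling build_in_cloud 16 from the root of the project would then run all three steps and return the exit status of the start or build step.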

There are a number of limitations. The largest is that I have (currently) chosen not to make use of distcc’s pump functionality, for a number of reasons:
  • Without pump the headers are processed on the local host. This means system headers are NOT needed on the instances, which significantly speeds up their installation/startup/configuration and reduces the amount of data that needs to be transferred over the internet (gdistcc uses ssh for security reasons, whereas internal distcc clusters make use of the faster TCP mode).
  • Without pump all that is needed on the instances is an identical version of distcc-server and of the C/C++/Objective-C compiler, which means that long term I may develop a more “universal” backend instance.
  • Given that gdistcc currently requires ccache (I think it unlikely I will eliminate that requirement), the headers are frequently pre-processed and cached by ccache anyway, so pump mode is less of an advantage.
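
For readers who have not used distcc directly, here is a rough sketch of what a plain (non-pump) distcc setup over ssh with ccache can look like when done by hand. This is not necessarily what gdistcc does internally. The helper name ssh_hosts and the host names are hypothetical. DISTCC_HOSTS with the @host form (ssh mode) and ccache’s CCACHE_PREFIX are standard settings of those tools.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of host names into a DISTCC_HOSTS value
# in which every host is prefixed with "@", distcc's syntax for ssh mode.
ssh_hosts() {
    hosts=""
    for h in "$@"; do
        # Separate entries with a single space, no leading space.
        hosts="${hosts:+$hosts }@$h"
    done
    printf '%s\n' "$hosts"
}

# Example usage (host names are made up):
#   export DISTCC_HOSTS="$(ssh_hosts gdistcc-1 gdistcc-2)"
#   export CCACHE_PREFIX=distcc   # ccache passes cache misses to distcc
#   make -j16 CC="ccache gcc" CXX="ccache g++"
```

In this arrangement ccache answers what it can from the local cache, and only the misses are pre-processed locally and sent to the remote hosts for compiling.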

I’ll be adding more posts about gdistcc as development progresses.
