Wednesday, June 29, 2016

Developing locally in Linux boxes using Vagrant, vagrant-vbguest, and VirtualBox


If you have read many of my previous entries, I’m sure you are aware that Linux is my primary development environment. For local development projects I use Vagrant and VirtualBox so that I’m able to quickly and easily make use of a variety of Linux distros (CentOS 7, Ubuntu LTS, and sometimes Debian). While I use these tools almost daily on Windows, they are fully supported under OS X and Linux as well.

Vagrant

Vagrant deserves a longer article of its own, so I’ll keep this brief. Simply put, Vagrant creates and configures reproducible development environments using any number of virtualization backends (which Vagrant calls providers), including local hypervisors, cloud services, and even containers. With Vagrant installed I can bring up a new Linux development environment with just two commands.

VirtualBox

Many are probably already familiar with Oracle’s excellent VirtualBox virtualization product. For running local Linux development images, I find that VirtualBox 5 meets all my requirements and provides performance favorable to its competitors, with the added bonus of being free for all purposes. (The separate VirtualBox Extension Pack is only free for non-commercial use; however, I do not normally use its features anyway.)

vagrant-vbguest

The missing “glue” to make using Vagrant and VirtualBox together even easier. The vagrant-vbguest plugin for Vagrant automatically updates the guest additions for Linux guests, making it even simpler to create and update new Linux development boxes and/or update VirtualBox itself. The recent release of vagrant-vbguest version 0.12 was one of the impetuses for this post, and I’m pleased to mention that yours truly is credited with the first fix in the changelog: https://github.com/dotless-de/vagrant-vbguest/blob/v0.12.0/CHANGELOG.md

Getting Started

Installation

Install VirtualBox: https://www.virtualbox.org/wiki/Downloads
Install Vagrant: https://www.vagrantup.com/downloads.html
Install vagrant-vbguest:

vagrant plugin install vagrant-vbguest

Provision and Start your Linux development box

In a new dedicated folder:

vagrant init ubuntu/trusty64
vagrant up --provider virtualbox

While the box is usable now, I normally recommend reloading the box after first boot so that any newly installed kernel or kernel modules (such as the guest additions built by vagrant-vbguest) are loaded.

vagrant reload

Enter your new Linux development box

vagrant ssh

Congratulations!

You are now using your new Linux development box. When you are done, exit as normal. The box can be shut down with ‘vagrant halt’ and restarted/accessed with ‘vagrant up && vagrant ssh’. When you’re done with the image, ‘vagrant destroy’ will remove all traces. Additional options to configure your box can be found in the ‘Vagrantfile’ created in the directory where you initialized the box. I frequently increase the memory allocation and number of virtual CPUs, but customizations to networking or even provisioning scripts are easily available as well.
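To illustrate the kinds of customizations I mention, here is a sketch of a Vagrantfile after ‘vagrant init ubuntu/trusty64’; the memory, CPU, port, and provisioning values are illustrative examples, not recommendations:

```ruby
# Vagrantfile - created by 'vagrant init ubuntu/trusty64', then customized.
# The values below are example figures, not recommendations.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  # Forward a guest port to the host (e.g. for a development web server).
  config.vm.network "forwarded_port", guest: 80, host: 8080

  # VirtualBox-specific tuning: more RAM and more virtual CPUs.
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 2048
    vb.cpus = 2
  end

  # A simple inline shell provisioner that runs on first 'vagrant up'.
  config.vm.provision "shell", inline: "apt-get update -y"
end
```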

The ‘vagrant’ command has a large number of other useful functions, so I encourage you to check them out. I used trusty64 in the example above, but plenty of additional prebuilt box images can be found at https://atlas.hashicorp.com/boxes/search including popular Debian 8, Ubuntu LTS, and CentOS 7 boxes.

Wednesday, June 22, 2016

gdistcc v0.9.x released!


After intensive development I’m pleased to announce gdistcc v0.9.x has just been released!

v0.9.x includes some major new features including:
  • In addition to CentOS 7, gdistcc now supports Ubuntu 16.04 LTS, Ubuntu 14.04 LTS, and Debian 8.  Additional distros should only require proper entries in the new settings.json file and an appropriate startup-script.
  • Instance- and distribution-specific configuration has been moved to its own settings.json file, making additions and customizations easy.
  • ‘gdistcc {status, make}’ now checks whether an instance has been terminated (Google’s preemptible instances) and excludes it as appropriate; ‘gdistcc stop’ removes terminated instances as normal.
  • Pump mode is NOT used; this greatly reduces instance setup time, since system headers are no longer required on the instances.  It also increases compatibility and makes better use of ccache's pre-processed header caching.
  • gdistcc's GitHub repository now uses Travis CI (I might make a future post dedicated to this particular topic).
  • gdistcc is now published on PyPI and can be easily installed with just ‘pip install gdistcc’.
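For illustration only, a distro entry in settings.json might look something like the sketch below; the keys shown here are hypothetical, so check the settings.json file shipped with gdistcc for the real schema:

```json
{
  "_comment": "Hypothetical example entry; see gdistcc's bundled settings.json for the real schema.",
  "ubuntu-1604": {
    "image_project": "ubuntu-os-cloud",
    "image_family": "ubuntu-1604-lts",
    "startup_script": "startup-script-ubuntu.sh"
  }
}
```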

Tuesday, June 14, 2016

Introducing gdistcc – the easy way to compile in the cloud!

I’ve recently been making a sizeable number of contributions to HHVM, an open source virtual machine for both PHP and Facebook’s Hack language. Given its “robust” code base, a fresh compile can take quite a while on my laptop. Earlier I wrote an entry about using ccache to cache compiled objects for reuse, which is great when moving between branches or release vs. debug code. However, even with make and ccache doing their best, with an active project such as HHVM some updates (especially major header changes) require nearly a full recompile of the code base.

Enter my newest project gdistcc, which automates distributed compiling on Google Compute Engine with economical preemptible instances. While distcc has long existed to provide distributed compiling of C/C++/Objective-C code for those with access to multiple servers, gdistcc makes it easy to provision, compile, and shutdown any number of preconfigured instances for everyone!
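To illustrate the core idea behind distcc-style tools (this is a toy sketch, not gdistcc’s actual code): independent compilation units are farmed out across a pool of hosts and compiled in parallel, something like this round-robin scheduler:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

def compile_remote(host, source):
    # Stand-in for "ssh <host> distcc gcc -c <source>"; the real tools
    # stream the preprocessed source out and fetch the object file back.
    return (host, source.replace(".cpp", ".o"))

def distribute(sources, hosts, jobs_per_host=2):
    """Round-robin sources across hosts and compile them in parallel."""
    assignments = list(zip(cycle(hosts), sources))
    with ThreadPoolExecutor(max_workers=len(hosts) * jobs_per_host) as pool:
        return list(pool.map(lambda hs: compile_remote(*hs), assignments))

results = distribute(["a.cpp", "b.cpp", "c.cpp"], ["host1", "host2"])
print(results)  # [('host1', 'a.o'), ('host2', 'b.o'), ('host1', 'c.o')]
```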

Full details on installing and using gdistcc can be found at gdistcc.andrewpeabody.com; however, here is a quick example for a ccache-enabled ‘make’ project once gcloud is set up and gdistcc is installed.

Start 16 gdistcc instances
gdistcc start --qty 16
Build the project (in the root of your ‘make’ project)
gdistcc make
Stop the gdistcc instances
gdistcc stop
That’s it! Currently gdistcc is still pretty rough around the edges and only works on CentOS-7, however I plan to add support for Ubuntu in the near future, and any Google Compute preferred distro should be pretty easy to add.

There are a number of limitations; the largest is that I chose (currently) not to make use of distcc’s pump functionality, for several reasons:
  • Without pump, the headers are processed on the local host. This means system headers are NOT needed on the instances, which significantly speeds their installation/startup/configuration and reduces the amount of data that needs to be transferred over the internet (gdistcc uses ssh for security reasons; internal distcc clusters make use of the faster TCP mode).
  • Without pump, all that is needed on the instances is an identical version of the distcc server and the C/C++/Objective-C compiler, which means long term I may develop a more “universal” backend instance.
  • Given that gdistcc currently requires ccache (I think it unlikely I will eliminate that requirement), the headers are frequently pre-processed and cached by ccache anyway, so pump mode is less of an advantage.

I’ll be adding more posts about gdistcc as development progresses.

Wednesday, June 8, 2016

Transparent Executable Compression

Recently I was curious about the underlying design of 3v4l.org, which is a great website for comparing behavior between different versions of PHP and HHVM. If you do work in PHP, I highly recommend exploring 3v4l.org, as there are almost certainly more behavior differences between the various versions than you expect. As part of this service, 3v4l.org needs to have access to a large number of executables to test: enter the Ultimate Packer for eXecutables (UPX).

UPX compresses the executable with UCL, a specialized algorithm whose decompressor is just a few hundred bytes of code, requires no additional memory, runs at very high speed, and decompresses in place. The best part is the wide support of executable types, including most Linux (ELF), DOS/Win32, and Mac OS X formats.
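UCL itself isn’t in most standard libraries, but the before/after arithmetic UPX reports is easy to reproduce. Here is a sketch using zlib as a stand-in compressor (an assumption for illustration; UPX does not use zlib):

```python
import zlib

def pack_ratio(data, level=9):
    """Compress data and report UPX-style before/after sizes and ratio."""
    packed = zlib.compress(data, level)
    ratio = 100.0 * len(packed) / len(data)
    return len(data), len(packed), ratio

# Highly repetitive input compresses extremely well, much like the
# padding and tables found in large executables.
before, after, ratio = pack_ratio(b"\x00" * 100000)
print(f"{before} -> {after} ({ratio:.2f}%)")
```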

Getting Started with UPX under CentOS 7


UPX is available from RPMforge; if you already have RPMforge enabled (or wish to enable it), installation is as easy as:
sudo yum install upx
Otherwise, if you wish to install it directly:
sudo yum install http://apt.sw.be/redhat/el7/en/x86_64/rpmforge/RPMS/upx-3.91-1.el7.rf.x86_64.rpm http://apt.sw.be/redhat/el7/en/x86_64/rpmforge/RPMS/ucl-1.03-2.el7.rf.x86_64.rpm
Compress an executable:
$ upx hhvm
                       Ultimate Packer for eXecutables
                          Copyright (C) 1996 - 2013
UPX 3.91       Markus Oberhumer, Laszlo Molnar & John Reiser   Sep 30th 2013

        File size         Ratio      Format      Name
   --------------------   ------   -----------   -----------
  84749966 ->  21371364   25.22%   linux/ElfAMD   hhvm

Packed 1 file.
That’s an impressive reduction from 85MB to just 21MB! For those looking for maximum compression (and with time to spare), give the “--best” option a try. The “-l” option can also be used to get details on an already compressed executable. Happy Compressing!

Monday, June 6, 2016

Multithreaded linking with ld.gold, currently minimal benefit

When properly using ‘make’ with a large development tree, I generally find the single largest bottleneck during compilation quickly becomes the linking stage. While most distributions, including CentOS 7, now include ld.gold for a substantial performance boost over the traditional ld, faster performance would always be beneficial. Imagine my excitement when I learned that ld.gold has a multithreaded mode as a possible way to increase speed without new hardware! However, under CentOS 7, ld.gold (part of binutils) is built with thread support disabled by default:
ld.gold --threads
ld.gold: warning: ignoring --threads: ld.gold was compiled without thread support
GNU gold (version 2.23.52.0.1-55.el7 20130226) 1.11
Originally I thought this entry would end up being a recipe on how to rebuild binutils from an srpm with the “--enable-threads” option to enable multithreading. Unfortunately, after testing this myself and some informal benchmarking with 4 threads, the actual speed increase was only about 1-2% - not worth the possible side effects in my opinion. That said, the version of binutils/gold included with CentOS 7 is heavily patched and a few versions behind, so it’s possible a newer/vanilla version of ld.gold might see a greater benefit in threaded mode. In particular, devtoolset-4 from Software Collections comes with a more vanilla ld.gold 2.25, so once I’m ready to move to gcc 5.2 I might conduct this experiment again.  Finally, if you have seen different behavior with multithreaded ld.gold, I would love to hear about it.

Friday, June 3, 2016

Modern Development Tools in CentOS 7 using Software Collections

I currently use CentOS 7 as my Linux Development environment for two major reasons:
  1. Personal - RedHat was the first Linux distribution I installed/used (RedHat 4.2 if you are curious – and I don’t mean RHEL 4), so while I have frequently dabbled with other distributions such as Gentoo and Ubuntu, it’s always been my “Linux Home”. After RedHat “transitioned” into Fedora Core, I started using RHEL 3 professionally before moving to cAos/CentOS.
  2. Life Cycle - Once a server is built (even more so with fleets of servers that need to be binary compatible) it’s very difficult to implement a forklift OS upgrade. Therefore, it is very common to see a physical server run the same distribution version for its entire deployment, and with virtualization often the length of the application deployment! While Ubuntu has more recently tried to address this shortcoming with Long Term Support (LTS) versions, this is one of the major reasons RHEL/CentOS is so popular with its 9-10 year support life cycles.
One major drawback to a long life cycle is (to put it mildly) a rather stale developer toolchain. This is often cited as a major reason for the rise in popularity of Ubuntu with developers. Fortunately, RedHat finally addressed this shortcoming in RHEL a few years back with the introduction of Software Collections (SCL) and the Developer Toolset. SCL enables newer software versions to be installed and used on RHEL/CentOS (and here is the important part) without disturbing the default system tools, thereby preserving compatibility. The great news is SCL is available for CentOS 7, providing access to the modern Developer Toolset!

Getting Started with Software Collections & Developer ToolSet under CentOS 7


SCL is already included in CentOS extras, so it is a single command to install and enable:
sudo yum install centos-release-scl
Next, there are currently two available versions of the Developer Toolset for CentOS 7: v3 and v4. Version 3 includes GCC 4.9 and version 4 includes GCC 5.2. If you decide to “live on the edge” with GCC 5.2, be aware that GCC 5.1+ uses a new ABI that might cause compatibility issues; one possible option is gcc’s -D_GLIBCXX_USE_CXX11_ABI=0 flag if you need to link to system libraries that weren’t compiled with GCC 5.1+. In this example I’m using version 3, as I’m currently working with some source code that isn’t yet gcc 5 compatible.
sudo yum install devtoolset-3-toolchain
Alright, you are ready to use the SCL version of gcc and other tools! To enable and test, just type:
scl enable devtoolset-3 bash
gcc --version
gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
ld.gold --version
GNU gold (version 2.24) 1.11
When you are done, it’s easy to exit the SCL shell and return to the default system tools.
exit
gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
ld.gold --version
GNU gold (version 2.23.52.0.1-55.el7 20130226) 1.11
This is just a small glimpse of the power of Software Collections, so I encourage you to visit their website at https://www.softwarecollections.org to learn even more!


Wednesday, June 1, 2016

Faster Re-Compiling with ccache

Recently I’ve been making a number of contributions to HHVM, an open source virtual machine for both PHP and Facebook’s Hack language. Frequently my work has included squashing bugs such as segfaults or other incompatibilities. This type of work often involves frequent recompiles: after code modifications, when moving between release and debug branches, etc. While “make” is frequently used by projects to limit recompiles to the parts of the program that have changed, it only tracks the most recent build, and often a “make clean” is needed to remove incompatible object files. Additionally, “make” is only able to reuse object files in a single source tree, so builds in other source trees or on other computers can’t take advantage of its selective recompiles.

The good news is there IS a tool that can help address all of these shortcomings: ccache, from the author of Samba. Ccache analyzes and stores your object files during compilation so they can be automatically substituted from the cache in future cases where the same compiler, options, and code are present. The best part is ccache guarantees that the cache will provide the same result as the real compiler, including warnings, and will fall back to the real compiler if there is any ambiguity. While ccache is limited to single-file C, C++, Objective-C, or Objective-C++ compiles with GCC-style compilers, it will transparently pass other languages, multi-file compiles, and linking through to the real compiler.
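Conceptually (a simplified sketch, not ccache’s actual implementation), the cache key is a hash over the compiler, its options, and the preprocessed source, so a cached object file is only reused when the compile would be identical:

```python
import hashlib

def cache_key(compiler_id, options, preprocessed_source):
    """Simplified model of a ccache-style key: any change to the compiler,
    flags, or (preprocessed) code yields a different key."""
    h = hashlib.sha256()
    for part in [compiler_id, *options, preprocessed_source]:
        h.update(part.encode())
        h.update(b"\x00")  # separator so ["-O2", "x"] != ["-O2x"]
    return h.hexdigest()

k1 = cache_key("g++ 4.8.5", ["-O2"], "int main() { return 0; }")
k2 = cache_key("g++ 4.8.5", ["-O2"], "int main() { return 0; }")
k3 = cache_key("g++ 4.8.5", ["-O0"], "int main() { return 0; }")
assert k1 == k2 and k1 != k3  # identical inputs hit; changed flags miss
```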

Installing & Using ccache under CentOS 7

Ccache can be found in the terrific EPEL (Extra Packages for Enterprise Linux) repository; if you don’t already have EPEL enabled, it is just a single command:
sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-6.noarch.rpm
Install ccache
sudo yum install ccache -y
By default the ccache package installs symbolic links such as /usr/lib64/ccache/{gcc,g++,etc.} that point to ccache and can be used directly in place of the various supported compilers. If desired, you can place these symbolic links in your PATH ahead of your real compiler to use ccache automatically; however, for this example we’ll assume you don’t wish to do that.

For small compile jobs you can now directly use /usr/lib64/ccache/{gcc,g++} to compile your code as normal.
/usr/lib64/ccache/g++ mycode.cpp
If you are using configure or make, you can override your C/C++ compilers on the command line:
./configure CC="ccache gcc" CXX="ccache g++"
make CC="ccache gcc" CXX="ccache g++"
Finally, for cmake simply include the following:
cmake -D CMAKE_CXX_COMPILER="/usr/lib64/ccache/g++" -D CMAKE_C_COMPILER="/usr/lib64/ccache/gcc" .

Advanced ccache

Ccache also includes a utility to report cache statistics and configure cache options.

View ccache statistics
ccache -s
Set ccache’s cache size to 5G with no item limit
ccache -M 5G -F 0
Ccache can also share a cache between users, or even between machines with NFS! This is great for a group of developers with a large code base. Of course there are some limitations, so be sure to check the ccache documentation for details.

Finally, cmake can be configured to use ccache (when present) directly in your CMakeLists.txt file.  If you are interested, please see my cmake-for-HHVM commit that was recently accepted by Facebook.
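The general pattern looks something like the sketch below, using CMake’s standard find_program command and RULE_LAUNCH_COMPILE property (a minimal illustration, not HHVM’s exact commit):

```cmake
# Use ccache for compilation when it is available on the build host.
find_program(CCACHE_PROGRAM ccache)
if(CCACHE_PROGRAM)
  set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${CCACHE_PROGRAM}")
endif()
```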

For More Information: