Overriding GNU Make SHELL variable for massive parallel make

If you use GNU Make in your verification environment, maybe you have dreamed of typing make -j 120. It turns out this is possible, and it is a very interesting use of the GNU Make SHELL variable. Recall that make -j N lets GNU Make run up to N recipes in parallel. In the code below, prereq1, prereq2 and prereq3 would be built at the same time if -j 3 were used:


SHELL := ./myshell
target: prereq1 prereq2 prereq3

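This may look like nothing, but it gets more interesting when you define myshell as the following script, as explained on the GNU mailing list. GNU Make runs each recipe line as myshell -c 'command', so $1 is -c and $2 holds the command: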

#!/bin/sh
ssh `avail_host` "cd $PWD || exit; $2"

Now just type make -j 3 and your build is distributed across 3 available hosts, as returned by the avail_host script (writing this script is an exercise left to the reader ;-)).
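For readers who would rather skip the exercise, here is one possible avail_host. It is a minimal sketch under several assumptions: the candidate hosts are listed in a hypothetical AVAIL_HOSTS environment variable, passwordless ssh works to each of them, and they are Linux machines exposing /proc/loadavg. It asks every host for its 1-minute load average and prints the least-loaded one.

```sh
#!/bin/sh
# avail_host: hypothetical sketch that prints the least-loaded host.
# Assumes AVAIL_HOSTS (a made-up variable) holds a space-separated list of
# candidate hosts, that passwordless ssh works to each, and that they are
# Linux machines exposing /proc/loadavg.

# least_loaded: read "host load" lines on stdin and print the host with the
# smallest load; the first host wins on ties.
least_loaded() {
    awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; host = $1 } END { print host }'
}

if [ -n "${AVAIL_HOSTS:-}" ]; then
    for h in $AVAIL_HOSTS; do
        # -n: do not read stdin; BatchMode: fail instead of prompting.
        # The first field of /proc/loadavg is the 1-minute load average.
        load=$(ssh -n -o BatchMode=yes "$h" 'cut -d " " -f 1 /proc/loadavg') || continue
        printf '%s %s\n' "$h" "$load"
    done | least_loaded
fi
```

Hosts that cannot be reached are simply skipped. One caveat of this design: with make -j 3, several jobs may call avail_host at almost the same moment and all pick the same host, so a round-robin counter or a lock would spread the load more evenly.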

Shortly after I came up with this, a bit of searching revealed that it was really nothing new. The article Distributed processing by make explains that by using the GNU Make SHELL variable, the number of jobs you can dispatch with make is no longer limited to how many cores or local CPUs you have.

The article also shows how to extend the makefile syntax to control the dispatching of commands. Once you realize that GNU Make effectively calls $SHELL with “-c” followed by the text of each recipe line, you also realize that your own implementation of $SHELL can intercept anything you want to control job submission, such as the single, double and triple equal sign syntax in the aforementioned article. All without changing anything in the GNU Make source code. Wow!
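To make the interception idea concrete, here is a minimal sketch of such a wrapper. It does not reproduce the article's equal-sign syntax; instead it assumes a made-up convention in which a recipe line starting with "local " runs on the build machine and every other line is sent over ssh to the host printed by the user's avail_host script. It also assumes the default .SHELLFLAGS of -c (in POSIX mode GNU Make passes -ec instead, and the guard would need adjusting).

```sh
#!/bin/sh
# myshell: hypothetical dispatching SHELL for GNU Make (a sketch, not the
# article's syntax). GNU Make runs each recipe line as
#     $SHELL -c '<recipe line>'
# so $1 is "-c" and $2 is the command text.
# Assumed convention: a line starting with "local " runs on this machine;
# every other line goes to the host printed by the user's avail_host.

# local_command CMD: if CMD carries the "local " marker, print it without
# the marker and succeed; otherwise print nothing and fail.
local_command() {
    case $1 in
        "local "*) printf '%s\n' "${1#local }" ;;
        *) return 1 ;;
    esac
}

# Dispatch only when invoked the way make invokes SHELL.
if [ "${1:-}" = "-c" ]; then
    if cmd=$(local_command "$2"); then
        exec /bin/sh -c "$cmd"
    else
        exec ssh "$(avail_host)" "cd '$PWD' || exit; $2"
    fi
fi
```

With SHELL := ./myshell in the makefile, a recipe line such as local touch $@ would then run locally while the others go to remote hosts. GNU Make strips the @, - and + prefixes from a recipe line before handing it to the shell, which is why the marker here is a plain word rather than a special character.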

Exploring the GXP website a bit further, I found that the GXP flow for dispatching jobs to multiple machines relies on daemons spawned in user space with the ssh command. I still don't understand why spawning daemons would be necessary, though, and would be interested to know.


3 Responses to Overriding GNU Make SHELL variable for massive parallel make

  1. inodb says:

    Hi, nice post. I can't really seem to find any information on other people using GXP. Any idea why? Yours is the only post I can find that even mentions it, and it is from 2008. It seems to work quite well for me. Great if you, like me, write bioinformatics pipelines in GNU make. Any thoughts on the reason for using daemons? Handling interrupts perhaps?

  2. Martin d'Anjou says:

    Hi inodb. I have not explored GXP any further. I still use GNU Make every day; I am in the FPGA/ASIC design and verification engineering business. I no longer use the trick described in this post. Instead, I write my GNU Make recipes with the Platform LSF “bsub” command directly, without using the SHELL variable, to dispatch the long jobs. Platform is now owned by IBM.

  3. taurakenjiro says:

    Hi, this may be too late, but let me comment as one of the authors of the article mentioned above.

    You can certainly dispatch jobs without daemons by calling, say, ssh from within the command you specify in SHELL.

    The reasons for our using daemons are partly historical and partly technical.

    (1) Technically, keeping daemons alive makes job dispatch faster than spawning ssh for every job. The benefit is even greater if you are in an environment where jobs must be submitted through a batch scheduler, which may be much slower than ssh in large shared environments.

    (2) GXP make is built on top of the GXP parallel shell, which is meant to support complex multi-site environments. By complex, I mean that (a) one cluster uses batch scheduler X, another uses batch scheduler Y, and yet another is fine with SSH, and/or (b) some connections (especially across sites) are blocked. To invoke commands efficiently in this kind of environment, GXP was designed to bring up daemons (through a separate user operation called “explore”) and keep them alive. GXP make simply builds on this infrastructure. In this sense, daemons are not an absolute requirement (at least for a simple single cluster used only with ssh) but a natural outgrowth of a design optimized for getting good response times from a large number of nodes.

    Finally, as for @inodb's point that no one mentions it, that is probably because it is (unfortunately for us) simply not well known, and as a small research team we did not have many resources to maintain and advertise it. We are no longer actively working on it, but I am still happy to help anyone who may want to use it.
