GNU Make poetry

I use GNU Make a lot. A while back I wrote this:

Three rules for a thousand Vera files
Seven for the Designers in the Valleys of Verilog
Nine for a dynamic Register Abstraction Layer
One for processing the output log
In the Compute Cloud where the CPUs lie.

One target to rule them all, One vpath to find them
One target to bring them all and in the makefile build them
In the land of GNU Make where the DAG shines.

Continuous integration system using parallel make

In software design, a continuous integration system is a system for compiling and testing code on a continuous basis. When a build fails, the code is rejected and the owner notified. When the code is good (compiles okay and sanity tests pass), the code is accepted and is made available for distributing to the development team.

There are many aspects to a continuous integration system. A hypothetical buildbot could implement the following phases:

  1. Receive code submitted to the system from a request queue
  2. Build the code. There are two possible outcomes:
    1. The build fails, reject the code and process the next request in the queue
    2. The build succeeds, move on to the test phase
  3. Test the code. There are two possible outcomes:
    1. One or more tests fail, reject the code and process the next request in the queue
    2. All tests pass, accept the code and make it available for distribution among the team
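The phases above can be sketched as a small shell loop. This is only an illustration under stated assumptions: the function names process_request and run_queue, and the "build_cmd|test_cmd" queue format, are hypothetical and not part of any real buildbot.

```shell
#!/bin/sh
# Hypothetical buildbot loop; every name here is an illustrative assumption.

# process_request BUILD_CMD TEST_CMD
# Runs the build phase, then the test phase, and prints the verdict.
process_request() {
    build_cmd=$1
    test_cmd=$2
    # Phase 2: build. On failure, reject and let the caller move on.
    if ! sh -c "$build_cmd" </dev/null >/dev/null 2>&1; then
        echo "rejected: build"
        return 1
    fi
    # Phase 3: test. Any failing test rejects the request.
    if ! sh -c "$test_cmd" </dev/null >/dev/null 2>&1; then
        echo "rejected: test"
        return 1
    fi
    echo "accepted"
}

# run_queue reads one request per line from stdin, formatted as
# "build_cmd|test_cmd", and processes each one in turn (phase 1).
run_queue() {
    while IFS='|' read -r build_cmd test_cmd; do
        process_request "$build_cmd" "$test_cmd" || true
    done
}

# Three stand-in requests: one good, one broken build, one failing test.
printf '%s\n' 'true|true' 'false|true' 'true|false' | run_queue
```

In a real system the build command would be something like make -j N, and the "accepted" branch is where the source code management tag would be applied.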

The buildbot should be as independent as possible from the source code management tool, but it still needs hooks to extract code from it and to mark the source code management database with some form of tag when code is accepted after passing the tests. I mentioned sending pass/fail notifications to owners, but I won’t cover that here (most likely, sendmail is your friend).

At the heart of the buildbot is the build system, whose purpose is to build (compile) the code. A build system based on GNU Make can take advantage of GNU Make's ability to build prerequisites in parallel to speed up the build. When a job fails, however, parallel make does not stop right away: it waits for the jobs that are already running to finish, and with -k it goes on to make every target it can except those whose dependencies have failed to build. While this might be desirable when you incrementally compile code for yourself, it is detrimental when a buildbot processes independent build requests from a submission queue. The buildbot simply needs to abort at the first failure, notify the owner, clean up, and move on to the next request.

So we want GNU Make to abort at the first failure when we run make -j N. There is nothing in GNU Make allowing that. However, we can edit job.c and add a call to fatal_error_signal(SIGTERM), so that at the first error it encounters GNU Make aborts as if we had hit Ctrl-C on the command line:

/* GNU Make 3.81, job.c, near line 500 */
if (err && block)
  {
    static int printed = 0;
    fflush (stdout);
    if (!printed)
      {
        error (NILF, _("*** Waiting for unfinished jobs...."));
        /* Added line: abort at the first error, as on Ctrl-C. */
        fatal_error_signal (SIGTERM);
      }
    printed = 1;
  }

Save, compile, and try this modification with a simple makefile:

all: t1 t2
t1:
        sleep 60 && echo done || echo failed
t2:
        sleep 2 && exit 1

The long compile time is emulated with sleep 60 and the error with a simple exit 1 statement. Now run this with the modified make:

$ make -j 2
sleep 60 && echo done || echo failed
sleep 2 && exit 1
make: *** [t2] Error 1
make: *** Waiting for unfinished jobs....
make: *** [t1] Terminated
Terminated
$ ps -ef | grep sleep
martin   22343     1  0 18:36 pts/5    00:00:00 sleep 60

That did not go as expected. GNU Make did exit, but the sleep 60 remains on the system! If we were using command substitution, we would want the command to return at the first error, but it does not return until the sleep 60 is done:

$ time t=$(make -j 2)
make: *** [t2] Error 1
make: *** Waiting for unfinished jobs....
make: *** [t1] Terminated

real    1m0.007s
...
$

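Clearly, something is wrong: make returned to the prompt long before sleep 60 finished, yet the command substitution waited until the end. The orphaned sleep inherited the write end of the command substitution's pipe, and the shell keeps reading from that pipe until every process holding it has exited.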
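The waiting can be reproduced without make at all. The sketch below is an illustration, not part of the original experiment: the helper name time_substitution is made up, and sleep 2 stands in for the sleep 60 above. It times a command substitution whose inner shell exits at once while a background child is still alive, first with the child inheriting stdout and then with the child's stdout pointed at /dev/null.

```shell
#!/bin/sh
# A command substitution reads from a pipe until every writer has closed it.
# A child that inherits stdout keeps that pipe open after its parent exits.

# time_substitution CMD (hypothetical helper)
# Runs CMD inside a command substitution and stores the whole seconds
# that took in $elapsed.
time_substitution() {
    start=$(date +%s)
    out=$(sh -c "$1") || true     # the failing exit status is expected
    end=$(date +%s)
    elapsed=$((end - start))
}

# The inner shell exits immediately, but the orphaned sleep holds the pipe.
time_substitution 'sleep 2 & exit 1'
held=$elapsed

# Same command, but the orphan's stdout no longer points at the pipe.
time_substitution 'sleep 2 >/dev/null & exit 1'
released=$elapsed

echo "child inherits the pipe: ${held}s, child detached from it: ${released}s"
```

The first measurement should come out at about two seconds and the second close to zero, which mirrors the one-minute wait seen with make.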

Changing the makefile to the following alters the behavior:

all: t1 t2
t1:
        sleep 60

t2:
        sleep 2 && exit 1

Notice how I simplify the command in the t1 target: sleep 60 is the only command in the chain. The outcome is better:

$ make -j 2
sleep 60
sleep 2 && exit 1
make: *** [t2] Error 1
make: *** Waiting for unfinished jobs....
make: *** [t1] Terminated
Terminated
$ ps -ef | grep sleep
$

This time the sleep 60 is killed and, though not shown here, command substitution returns as soon as the first target fails. Now this is annoying: how can I know which commands will be terminated properly and which ones won’t?

Paul D. Smith explains that GNU Make has two paths for running commands: simple and complex. A simple command is executed directly by GNU Make, so the command itself receives SIGTERM. A complex command contains things like && and ||, in which case GNU Make opens a shell to run the command and it is that shell, not the sleep it started, that receives SIGTERM. Armed with this information I head to #bash where I learn that I simply need to add a trap to the complex commands, so I rewrite the makefile like so:
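The difference between the two paths can be observed from a plain shell. The following is a rough sketch under assumptions, not a trace of what GNU Make does internally: the helper child_survives and the pid file are made up for illustration, and exec is used as a stand-in for the simple path, where no intermediate shell sits between the signal and the command.

```shell
#!/bin/sh
# Sketch: a SIGTERM sent to a wrapper shell does not reach the command the
# shell started. Helper name and pid file are illustrative assumptions.

pidfile=$(mktemp)

# child_survives SCRIPT
# Runs SCRIPT under sh in the background, sends that sh a SIGTERM, then
# prints "yes" if the pid recorded in $pidfile is still alive, else "no".
child_survives() {
    : > "$pidfile"
    sh -c "$1" >/dev/null 2>&1 &
    wrapper=$!
    sleep 1                                  # let the wrapper record the pid
    kill -TERM "$wrapper" 2>/dev/null || true
    wait "$wrapper" 2>/dev/null || true      # reap the wrapper
    child=$(cat "$pidfile")
    if kill -0 "$child" 2>/dev/null; then
        kill "$child" 2>/dev/null || true    # clean up the orphan
        echo yes
    else
        echo no
    fi
}

# Complex-path analogue: the shell forks sleep and waits for it.
complex_result=$(child_survives "sleep 30 & echo \$! > '$pidfile'; wait")

# Simple-path analogue: exec replaces the shell, so sleep gets the signal.
simple_result=$(child_survives "echo \$\$ > '$pidfile'; exec sleep 30")

rm -f "$pidfile"
echo "command outlives SIGTERM: via shell=$complex_result, direct=$simple_result"
```

With a shell in between, the sleep should outlive the signal. When the command is the signalled process itself, it should not.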

all: t1 t2
t1:
        trap 'kill $$(jobs -p)' EXIT; sleep 60 && echo done || echo failed
t2:
        trap 'kill $$(jobs -p)' EXIT; sleep 2 && exit 1

And now make aborts at the first error, command substitution does not hang and no process is left lingering.