I've been working on Unix systems in one form or another for, um, let's just say three decades or so. When it was new to me, not a day went by when I didn't learn something I hadn't seen before. Often, I would learn several new things in a day. Now that I'm older and more experienced, that doesn't happen as much. I still discover new things, but it's more like every couple of weeks now. Seldom does a month go by that I don't discover something. That's one of the things I like about this business: there's always something to learn. Sometimes the new things I encounter are mundane, like a previously unused switch to 'ls' or 'netstat', or maybe an obscure feature of my favorite editor. (See that? I gave you the idea without inviting another battle in the Holy Wars of which editor is better.)
Today I learned that “kill -0” is a thing in Unix.
The Want for a Nonexistent Feature
I was working on a bash script that launched a lot of processes into the background, and I wanted to limit the number of background jobs running simultaneously. I could have written a loop to launch the processes in small groups and waited for each group to finish before starting the next one, but that's inelegant. What I wanted was to start an initial group and then start additional processes one at a time as earlier jobs finished.
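The wait command in bash doesn't really support this use case. Called without arguments, it blocks until every background job has finished; given a list of PIDs, it blocks until each of those has finished. It does a "wait for everything", rather than a "wait for anything" type of operation. (Bash 4.3 and later add wait -n, which returns as soon as any one job finishes, but it still blocks while it waits.)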
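To see the "wait for everything" behavior for yourself, here is a minimal sketch: it starts two background sleeps of different lengths and times how long a bare wait takes to return. The 1- and 2-second durations are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
start=$SECONDS

# Two background jobs that finish at different times.
sleep 1 &
sleep 2 &

# With no arguments, wait blocks until *every* background job has finished,
# so it returns after the slower (2-second) job rather than the first one.
wait
elapsed=$(( SECONDS - start ))
echo "wait returned after about ${elapsed}s"
```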
One possible option would be to use a pipeline of 'jobs' and 'wc', or 'ps' and 'grep' (on systems that don't have pgrep), to figure out when it was time to launch another process, but that would mean creating a bunch more processes. It would work, but it goes against the idea of trying to cut down on the number of processes that I'm starting. I'd rather have a non-blocking 'wait' command.
I didn't find a non-blocking wait, but I did find some folks talking about using "kill -0 <pid>". [It's that dash-zero that's the interesting bit.]
The classic use of the kill command is to tell a Unix process that it should clean up after itself and then terminate. "Kill" is really a misnomer, though, because what the command actually does is send a SIGNAL to a process to request some action by the process. The default signal is "TERM", which asks the process to terminate, but there are many possible signals and most of them aren't intended to tell a process to die. One way to choose a signal is to give its number (an integer) preceded by a minus sign. If you want to see a list of the signals that are available, use "kill -l". (That's an 'ell', not a 'one'.)
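A minimal sketch of choosing a signal by name or by number, using a throwaway sleep as the target. It relies on SIGTERM being signal 15, the standard number on Unix-like systems, and on bash's convention of reporting a child killed by signal N as status 128 + N.

```shell
#!/usr/bin/env bash
# A throwaway background process to send a signal to.
sleep 60 &
pid=$!

# Show the full list of signal names and numbers on this system.
kill -l

# kill -l can also translate a number into a name: 15 is TERM.
echo "signal 15 is $(kill -l 15)"

# Ask the process to terminate. "kill -TERM", "kill -15" and a plain
# "kill" (TERM is the default) all send the same signal.
kill -15 "$pid"

# Reap the child. A process killed by signal N is reported as 128 + N,
# so 143 here means it was terminated by SIGTERM (15).
wait "$pid"
status=$?
echo "exit status: $status"
```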
The list of signals you get starts at one and goes up. The number of signals available varies between operating systems. (The most I've ever seen was 64.) However, it turns out you can also use "-0" as an argument, even though signal 0 doesn't appear in that list and, on some systems, isn't mentioned in the man page for the kill command.
Like most commands, kill does some error checking. When you pass one of the listed signal numbers on the command line, the signal gets sent to the process, assuming there aren't any errors. The less obvious feature is that if you use "-0", the kill command does the error checking but doesn't actually send a signal. This is useful because the error checking tells you whether the process exists. If it doesn't exist, that's an error condition that a script can handle.
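Note: this behavior isn't always spelled out in the documentation for the kill command, but the documentation for the kill(2) system call covers it: with a signal of 0, no signal is sent, but the existence and permission checks are still performed.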
So, if a process exists (and you have permission to signal it), then running kill -0 on it exits with a return code of "0" (no error). Otherwise, the return code will be "1". This gives you the ability to easily check whether a process is still running without having to wait for it. For example, you can do this:
# 14440 is the PID to check; 2>/dev/null hides the "No such process" message
if ! kill -0 14440 2>/dev/null; then
    echo "The process has exited"
fi
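Here is a slightly fuller sketch of the same check wrapped in a helper; the name is_running is hypothetical, not a standard command. It starts a short background job, checks it while it runs, waits for it, and checks again. Two caveats apply to any kill -0 test: a process that has exited but hasn't yet been reaped by its parent (a zombie) still counts as existing, and a PID can eventually be reused by an unrelated process.

```shell
#!/usr/bin/env bash
# is_running (hypothetical helper): succeeds if the PID exists and we are
# allowed to signal it. Signal 0 means no signal is actually delivered.
is_running() {
    kill -0 "$1" 2>/dev/null   # hide "No such process" on failure
}

sleep 1 &
pid=$!

if is_running "$pid"; then
    echo "$pid is still running"
fi

wait "$pid"   # block until it finishes, which also reaps it

if ! is_running "$pid"; then
    echo "$pid has exited"
fi
```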
To me, that looks a lot like a non-blocking wait. It also has the advantage that kill is a bash built-in, so I avoid starting a whole bunch of extra processes in a loop just to find out when a process has exited.
Writing a loop that takes advantage of this is left as an exercise for the reader.
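For anyone who wants to compare notes, here is one possible sketch, not the script from this post. run_throttled and MAX_JOBS are hypothetical names; jobs are passed as command strings and run with eval (so only pass trusted commands); and between checks it sleeps for a fraction of a second. Fractional arguments to sleep work with GNU and BSD sleep but aren't guaranteed by POSIX, and sleep is an external command, so the polling isn't entirely free of extra processes.

```shell
#!/usr/bin/env bash
# Hypothetical limit on how many jobs may run at once.
MAX_JOBS=3

# run_throttled (hypothetical): run each argument as a shell command in the
# background, never letting more than MAX_JOBS run simultaneously.
run_throttled() {
    local -a pids=() alive=()
    local cmd pid

    for cmd in "$@"; do
        # Wait for a free slot, using kill -0 as a non-blocking check.
        while :; do
            alive=()
            for pid in "${pids[@]}"; do
                if kill -0 "$pid" 2>/dev/null; then
                    alive+=("$pid")     # still running, keep tracking it
                fi
            done
            pids=("${alive[@]}")
            (( ${#pids[@]} < MAX_JOBS )) && break
            sleep 0.2                   # poll interval (GNU/BSD sleep)
        done

        eval "$cmd" &                   # start the next job
        pids+=("$!")
    done

    wait   # every job has been started; now block until they all finish
}

# Usage: six one-second jobs, at most three at a time (about 2s total).
run_throttled "sleep 1" "sleep 1" "sleep 1" "sleep 1" "sleep 1" "sleep 1"
```

The design choice here is to poll rather than block: each pass prunes finished PIDs from the list with kill -0 and only launches a new job once the count drops below the limit.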