The Mighty Thread

Written by Bryon Beilman | Aug 13, 2008 8:54:09 AM

An interesting variable and problem I have been working with lately is controlling the number of threads that an application can use at any one time. Some of the compute-intensive simulation software that our clients use has progressed to the point where it can divide the work into multiple threads to process a task in parallel. This is particularly awesome, since I can recall talking to some of these vendors back in 2002 or so, when we were trying to do the work on expensive multi-processor machines. Their only recommendation was to get the fastest clock speed you could, because the expense of message passing between processes ate up the savings of parallelization, so they didn't do it. Now, they all have the ability to divide or parallelize mathematical jobs.
Back in 1996 or so, I worked on a project on an Intel MPP (208-processor) parallel computer that was developing weather models for the NWS. It was a mighty machine and a mighty project, and the programmers all had to program with parallelization in mind. Recently, I was speaking to an intern at one of my clients who is a computer science student. He had just finished his freshman year and was wondering when he would have an opportunity to program multi-threaded applications. Is it just the application of semaphores, he wondered? Perhaps, but it is also a matter of thinking about how to divide a math problem into areas that can be solved in parallel. Perhaps it is just knowing when to divide the matrices.

Anyway, my intent was really to talk about the variable OMP_NUM_THREADS. This variable is the same on Windows as it is on Linux. The default is to use the number of online processors. One of our clients bought a new server with two quad-core CPUs that had, in effect, 8 CPUs. When the engineer ran Agilent ADS 2008, for example, the Momentum jobs would at times use all 8 CPUs. This was great for him but, unfortunately, bad for everyone else using the machine.
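To see what that default would work out to on a given Linux machine, a quick check along these lines may help. This is only a sketch: it assumes `getconf` is available (it is on most Linux systems), and the variable name `online_cpus` is just for illustration.

```shell
#!/bin/sh
# Number of processors currently online (assumes a Linux system with getconf).
online_cpus=$(getconf _NPROCESSORS_ONLN)
echo "Online processors: $online_cpus"

# Show the current cap, or note that none has been set in this shell.
echo "OMP_NUM_THREADS: ${OMP_NUM_THREADS:-unset (runtime default applies)}"
```

If the second line reports that the variable is unset, a threaded job is free to take as many CPUs as the first line reports.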

So, the solution is just to set OMP_NUM_THREADS=N, where N is the maximum number of CPUs you want the application to take. This can be done simply in your shell before you start the application, e.g. in bash:
export OMP_NUM_THREADS=4

The application will then be limited to that number of CPUs for the job. If you have multiple people running similar jobs, they may still collectively beat the machine into submission, but it becomes clear very quickly what is happening, and the limit can be adjusted.
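If you would rather not change the setting for your whole shell session, the cap can be applied to a single command instead. The following is a minimal sketch, not anything the vendors ship: `run_capped` is a hypothetical helper name, and the `sh -c` command at the end merely stands in for the real simulation command.

```shell
#!/bin/sh
# run_capped is a hypothetical helper, not part of any vendor tool.
# Usage: run_capped N command [args...]
run_capped() {
    threads="$1"
    shift
    # env sets the variable only for this one command,
    # so the rest of the session keeps its previous setting.
    env OMP_NUM_THREADS="$threads" "$@"
}

# Demonstration: print the value the child process actually sees.
run_capped 4 sh -c 'echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"'
```

This makes it easy to give different jobs different limits on a shared machine without anyone having to remember to reset the variable afterwards.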

I will end this with a related note. Most of the software that can do this type of work is complex and typically expensive. There may be some open source software that does similar work, but in our case the company spends hundreds of thousands to millions of dollars on the software. When it comes to hardware, they have been so whipped by the cost of the software that I find they under-buy the hardware. In general, you want as much hardware as you need to keep the licenses you just bought fully engaged. Grids and batching can help with this, but if you have 16 simulation licenses, then you need at least 16 cores, plus whatever you need for the design software, so that the $750K worth of simulation licenses can be fully utilized. The cost of an additional server is small compared with the value of fully utilizing the software.
