Queue System

Posted by Administrator on Thursday, 16 June 2016

Important points

Never run anything on MMB itself; use it only as a gateway. ssh to mmb, then ssh to any other node (nd40 to nd78, more or less).
You can get a quick overview of the nodes on the Computer Resources page.
To check the specs of a node, run "cat /proc/cpuinfo" to learn about the CPU and "free -m" to see the available memory (in megabytes).
"w" shows who is logged in and also prints the load average, a measure of how busy the CPUs are. If it's between 0 and 1, the node is free. If it's between 1 and 4, something is running there, and the node will be laggy and slow. If it's higher than 4, the node is overloaded and it could hang or kill random processes to balance itself.
If you have huge amounts of data, store it on one of the raids, not on Pluto (your home dir). If Pluto gets full, mmb can crash at any moment. Be cautious, though: the raids crash more often than we'd like.
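
As a quick illustration of the load check described above, here is a minimal sketch that labels a load average using the same rule-of-thumb thresholds (0 to 1, 1 to 4, above 4). The helper name classify_load is made up for this example, and the script assumes a Linux node where /proc/loadavg is readable.

```shell
#!/bin/sh
# Sketch: label a load average with the rule-of-thumb thresholds from the text.
# classify_load is a hypothetical helper name, not a system command.
classify_load() {
    awk -v avg="$1" 'BEGIN {
        if (avg < 1)       print "free"
        else if (avg <= 4) print "busy"
        else               print "overloaded"
    }'
}

# On Linux, the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    avg=$(cut -d ' ' -f 1 /proc/loadavg)
    echo "load $avg: $(classify_load "$avg")"
fi
```

Run it after ssh-ing to a node to decide whether that node is a reasonable place to work interactively.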

Using the queues

For your own good, use the queues when you need to run many calculations in parallel. Let's say you have to run the program "runme.sh" on every fasta file in the current folder. Instead of typing

$ ./runme.sh 1.fa
$ ./runme.sh 2.fa
$ ./runme.sh 3.fa
...

Use a simple bash loop to iterate through the fastas (WARNING: this syntax does not work if your shell is tcsh; run "echo $SHELL" if unsure):

$ for i in *.fa; do ./runme.sh "$i"; done

But that is very inefficient, as you'll only use one processor on one machine. You can send the calculations to the queuing system instead, where they run in parallel on several nodes, often about 10x faster.

To use the queues:

First of all, import the configuration:

$ source /usr/local/sge-6.1u2/debian-cell-4-intel-xeon/common/settings.sh

Then launch the calculation:

$ for i in *.fa; do qsub -cwd <<< "./runme.sh $i"; done
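
For larger batches, the same submission can be wrapped in a small script. The following is a minimal sketch, not a site-approved tool: DRY_RUN is a hypothetical switch invented for this example (it defaults to preview mode so nothing is queued by accident), and the job text is piped into qsub instead of using a here-string so that the script also works in plain sh.

```shell
#!/bin/sh
# Sketch: queue one job per FASTA file in the current folder.
# DRY_RUN is a hypothetical switch for this example, not an SGE feature.
# It defaults to 1 (preview only); set DRY_RUN=0 to really call qsub.

# Load the SGE environment if the settings file is readable on this node.
SGE_SETTINGS=/usr/local/sge-6.1u2/debian-cell-4-intel-xeon/common/settings.sh
if [ -r "$SGE_SETTINGS" ]; then
    . "$SGE_SETTINGS"
fi

submit_fasta() {
    # Text of the job script; the file name is single-quoted so spaces survive.
    job="./runme.sh '$1'"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$job"
    else
        # qsub reads the job script from standard input when no file is given.
        printf '%s\n' "$job" | qsub -cwd
    fi
}

for f in *.fa; do
    [ -e "$f" ] || continue   # no .fa files: the pattern stays literal, skip it
    submit_fasta "$f"
done
```

Running it as is only prints the command each job would execute; running it with DRY_RUN=0 in the environment submits the jobs.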

"qsub" is the command to send a job to the queues, and "qstat" shows you the status of the jobs. The "-cwd" switch makes the job run in the directory you submitted it from. Read qsub's manual ("man qsub") to learn more about the switches and how to run it.

Also keep in mind that different nodes have different software versions (php and gnuplot, for example), so you might need to specify which nodes the queues should use to avoid conflicts in software versions, shell environment, etc. A trick to find out where a job ran: use a log file that records when the queued command has finished and on which node. Append a "hostname" command to the job:

$ qsub -cwd <<< "./runme.sh $i; hostname > $i.log.txt"

That way, when the calculation is finished, a .log.txt file will appear, containing the name of the node that ran the job. To share data between the nodes, you must use Pluto or the raids, because the nodes' local hard disks aren't shared (that's why the queues work best for many independent parallel jobs).
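
Once several jobs have written their .log.txt files, the logs can be summarized to see how the work was spread over the nodes and which inputs are still waiting. This is a minimal sketch; summarize_nodes and pending_inputs are hypothetical helper names, and it assumes each log holds exactly one hostname, as produced by the command above.

```shell
#!/bin/sh
# Sketch: summarize the *.log.txt files written by finished jobs.
# Both function names are hypothetical helpers for this example.

# Print "node: N job(s)" for every node named in the given log files,
# busiest node first.
summarize_nodes() {
    cat "$@" 2>/dev/null | sort | uniq -c | sort -rn |
        awk '{ print $2 ": " $1 " job(s)" }'
}

# List the .fa files in the current folder that have no log file yet,
# meaning their job is still queued or running.
pending_inputs() {
    for f in *.fa; do
        [ -e "$f" ] || continue
        [ -e "$f.log.txt" ] || echo "pending: $f"
    done
}

summarize_nodes ./*.log.txt
pending_inputs
```

If the summary shows that a job needs a particular node, SGE's qsub accepts a resource request such as "-l hostname=nd40", assuming the queue configuration on this cluster permits it.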