[torqueusers] external submit hosts
glen.beane at gmail.com
Wed Oct 27 10:57:39 MDT 2010
I am using software called Galaxy to do high throughput sequence
analysis, and it has the option to submit jobs to a torque cluster. I
configured my galaxy server as a "submit host" in torque (qmgr -c "s s
submit_hosts += server_name"), and I installed the torque clients on
the galaxy server. Everything is working so far - Galaxy is
submitting nearly every task it does to my cluster as a torque job.
There is a catch though: my cluster uses an internal hostname as the
hostname portion of its job IDs. For example, my galaxy server
submits a job to the pbs_server at externalname.mydomain and gets an
ID back in the form job_num.scyld.localdomain. Now, I can't figure out
a way to DELETE a job from this external submit host. If I
just do a qdel job_num (omit the host), it contacts the default server
(externalname.mydomain) but returns an error
"job_num.externalname.mydomain not found" (because the job id is really
job_num.scyld.localdomain). I can't do something like "qdel
job_num.scyld" because that won't resolve on the external submit host.
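
To make the mismatch concrete, here is a minimal Python sketch of what
appears to happen when the host part is omitted. The function name
expand_job_id and the job number 123 are made up for illustration; the
code only mimics the behavior suggested by the error message and is not
TORQUE's actual client logic.

```python
def expand_job_id(job_id, default_server):
    """Append the default server name when the job ID has no host part.

    Hypothetical helper: it imitates the behavior inferred from the qdel
    error message, not the real TORQUE client code.
    """
    if "." in job_id:
        return job_id  # already qualified, leave untouched
    return "%s.%s" % (job_id, default_server)


# ID as the server actually knows it (internal hostname).
real_id = "123.scyld.localdomain"
# ID as built on the submit host from a bare job number.
guessed_id = expand_job_id("123", "externalname.mydomain")

print(guessed_id)             # 123.externalname.mydomain
print(guessed_id == real_id)  # False, hence the "not found" error
```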
How do other people deal with this? I suppose I could have one of the
sysadmins screw around with the networking and TORQUE config so that
job IDs are in the form "job_num.externalname.mydomain". Are there any
other options? I hacked qdel on the external submit host so it would
connect to the server using the external name but still pass
"job_num.scyld.localdomain" as the job ID, but that doesn't fix this for
Galaxy (it uses pbs_python). I don't know for sure if Galaxy actually
tries to qdel a TORQUE job if a Galaxy user cancels a task, but if it
does it would be nice if it could do it successfully.
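
For reference, the same idea as my qdel hack could be sketched with
pbs_python roughly as follows. This is an untested sketch under several
assumptions: pbs_connect, pbs_deljob and pbs_disconnect are the
pbs_python wrappers around the TORQUE C API, while delete_job and its
pbs_module parameter are hypothetical names of mine. Passing an empty
string as the "extend" argument and treating a negative connection
handle as failure are also assumptions that should be checked against
the installed version.

```python
def delete_job(job_id, server, pbs_module=None):
    """Connect to `server` by its external name and delete `job_id`.

    The job ID is passed through unchanged, so it can keep the internal
    hostname (e.g. job_num.scyld.localdomain) that the server assigned.
    """
    if pbs_module is None:
        import pbs  # pbs_python; assumed to be installed on the submit host
        pbs_module = pbs

    conn = pbs_module.pbs_connect(server)
    if conn < 0:  # assumption: a negative handle signals a failed connect
        raise RuntimeError("could not connect to %s" % server)
    try:
        # Third argument is the "extend" string; empty here (assumption).
        return pbs_module.pbs_deljob(conn, job_id, "")
    finally:
        pbs_module.pbs_disconnect(conn)
```

It would be called as delete_job("job_num.scyld.localdomain",
"externalname.mydomain"), with the real job number in place of job_num.
The pbs_module parameter exists only so the function can be exercised
without a live server.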
I think that newer versions of TORQUE have a "job_suffix_alias"
setting that might do what I want. If this does what it sounds like,
I could set it to "externalname.mydomain" and my job ids would have the
form job_num.externalname.mydomain. Is anyone using this feature? This
would require a TORQUE upgrade for me (we're still running 2.3).