[torqueusers] Torque configuration issue

Jo De Troy jo.de.troy at gmail.com
Tue Jan 10 06:12:51 MST 2006


Hello,

I'm running Torque/Maui on a small cluster (1 headnode + 5 dual CPU nodes)
running RHEL 3 and I have a few problems.
Apparently, when a user submits a bunch of jobs in a row, the ones submitted
last go into the Queued state and disappear soon afterwards.
When I look at these jobs with tracejob, they have exit_status = -2.
Is there a setting that limits the total number of jobs submitted by one
user? Or is something else wrong?

Another problem is that the jobs that do run fine complain via e-mail
about being unable to copy the OU and ER files from the spool directory
on the cluster node back to the home directory of the user who submitted the
job.
The headnode NFS-exports /home to all compute nodes, and the headnode
is dual-homed (2 NICs).
/home is mounted via the internal NIC, while the error states it's trying
to copy the ER and OU files via the external NIC.
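
For reference, the setting that usually governs this copy-back step is the
pbs_mom $usecp directive, which tells the MOM to use a local cp instead of a
remote copy when the destination is on a shared filesystem. The fragment
below is only a sketch under assumptions not stated above: that /home is
mounted at the same path on every compute node, and that the file lives in
mom_priv/config under the TORQUE home directory on each node. The wildcard
host pattern is used because the hostname recorded in the output path is not
shown here.

```
# mom_priv/config on each compute node (location assumed, see above)
# Syntax: $usecp <host pattern>:<path prefix on that host> <local path>
# Output files destined for <any host>:/home/... are copied locally to /home/...
$usecp *:/home /home
```

pbs_mom has to re-read its config (for example via a restart) before a change
like this takes effect.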

Can anybody point me in the right direction?

Thanks in advance,
Jo