[torqueusers] Jobs stuck in Q

Lippert, Kenneth B. Kenneth.Lippert at alcoa.com
Mon Sep 27 06:54:19 MDT 2010



I have a strange problem which I just cannot solve.


My Torque server runs on a virtual machine (Bob80G) under Xen on SuSE
10.3.  I have two physical hosts (Large and Little) where I can run
Bob80G.  When I first constructed Bob80G I made a mistake and made it
larger than will fit on Little.  (Bob80G is an 80 GB disk image;
Little has only 72 GB disks.)  I got Bob80G all built and set up as the
Torque server on the development machine Large.  It runs fine there.
Everyone can submit, jobs run, no problems.  Then when I went to move
Bob80G to the production server Little, I realized it would not fit.  So
I made a smaller Bob40G disk image and rebuilt.  I was very careful to
keep all of the Ethernet and other identifications identical between
Bob80G and Bob40G.  I copied the entire /var/spool/torque directory from
Bob80G and installed it on Bob40G.


I shut down Bob80G on Large and start up Bob40G on Little.  Everything
appears to go as it should.  Submit hosts can submit jobs and run
"qstat" and "pbsnodes" commands, and that all works.  But no job ever
runs.  Submitted jobs sit in the queued state forever.


I have examined the Torque and Maui logs but cannot see anything that
tells me why the jobs are stuck.


I just don't know where to look next.


Thank you for any pointers.


-kenn lippert


