[torqueusers] jobs completing with processes still running

Steve Young chemadm at hamilton.edu
Wed May 7 09:11:36 MDT 2008


Hi Mike,
If it were me, I would try running the program by hand on one of the
nodes that isn't allocated. You could even create a system reservation
so that no jobs get scheduled on the node while you test the
application. By running the program yourself you eliminate PBS/Moab
from the problem. Once you are certain the application works as
expected, I would start using the queue system again to see whether
the problem is introduced there. Hope this helps =).
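
While testing on a reserved or supposedly idle node, it can help to
list what is actually still burning CPU there. The following is only a
minimal sketch, not a Torque or Moab tool: it assumes a Linux node
where "ps -eo pid,user,pcpu,comm" is available, and the function
names, the SYSTEM_USERS set, and the 50% CPU threshold are all
hypothetical choices to adjust for your site.

```python
import subprocess

# Assumption: accounts that are always allowed to run on a node.
# This set is illustrative only; adjust it for your site.
SYSTEM_USERS = {"root", "daemon", "nobody"}


def find_stray_processes(ps_output, expected_users=(), min_cpu=50.0):
    """Return (pid, user, cpu, command) tuples for busy processes whose
    owner is neither a system account nor in expected_users.

    ps_output is the text printed by `ps -eo pid,user,pcpu,comm`.
    """
    allowed = SYSTEM_USERS | set(expected_users)
    strays = []
    # The first line of ps output is the column header, so skip it.
    for line in ps_output.strip().splitlines()[1:]:
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid_text, user, cpu_text, command = fields
        try:
            pid = int(pid_text)
            cpu = float(cpu_text)
        except ValueError:
            continue  # ignore lines that do not parse as expected
        if user not in allowed and cpu >= min_cpu:
            strays.append((pid, user, cpu, command.strip()))
    return strays


def read_ps():
    """Run ps on the local node and return its output as text."""
    result = subprocess.run(
        ["ps", "-eo", "pid,user,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Pass the owners of jobs that are legitimately running on this
    # node as expected_users; here none are expected.
    for pid, user, cpu, command in find_stray_processes(read_ps()):
        print(f"{pid}\t{user}\t{cpu:.1f}\t{command}")
```

Run on a node that the scheduler reports as idle, anything this prints
is a candidate leftover from a job that has already left the queue.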

-Steve



On May 6, 2008, at 11:49 AM, Michael Robbert wrote:

> I am a new Torque user, so be gentle. We are running Moab 5.2.1 and
> Torque 2.3.0, and we have a user who is submitting jobs (user-compiled
> CHARMM, if that matters). Often their jobs return within a few
> seconds, and the only data in their Output/Error file is "Done.".
> The job disappears from the queue as would be expected, but the
> problem is that their code is still running on all cores of all
> nodes that they started it on. I run "mdiag -n" and see these nodes
> show up as idle, but the load is HIGH.
> Are there any ideas of what could cause this to happen? Should I be  
> looking at their code? We only have a few users so far, but so far  
> theirs is the only code doing this. What commands in Moab or Torque  
> should I be using to detect and solve this issue? So far it is just  
> mdiag and communication with the user. Specifically how can I find  
> out the jobid of a job that has completed given that I know what  
> nodes it was running on? And then how can I peer into the guts of  
> that job to find out what it was doing?
> I don't expect to get an answer to the problem, but hope I can find  
> out how to research it. I have searched the list archives and have  
> been trying to read as much documentation as I can, but I'm still  
> stumped. I just signed up for MoabCon so hopefully I'll be an  
> expert after that.
>
> Thanks for any suggestions,
> Mike Robbert
> Colorado School of Mines
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


