[torqueusers] job status C

Glen Beane beaneg at umcs.maine.edu
Thu Sep 9 09:48:01 MDT 2004


It doesn't seem to work 100% correctly.  I get errors in the C scheduler
log file saying that the job is in an unknown state, and sometimes new
jobs won't start on the node until the old job leaves the C state.  (I'm
just testing the Torque setup with one compute node; I'm going to
distribute the mom files to the other 255 nodes shortly.)

All of the jobs below use 2 processors and just run mpiexec --comm=none
hostname

They should all be done in a matter of seconds, but once a few jobs get
into the C state, the others sit in the Q state for a while.

bender:~ beaneg$ /exports/pbs/bin/qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5.bender         qsubTest         beaneg                  0 C default 
6.bender         qsubTest         beaneg                  0 C default 
7.bender         qsubTest         beaneg                  0 Q default 
8.bender         qsubTest         beaneg                  0 Q default 
bender:~ beaneg$ /exports/pbs/bin/qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5.bender         qsubTest         beaneg                  0 C default 
6.bender         qsubTest         beaneg                  0 C default 
7.bender         qsubTest         beaneg                  0 Q default 
8.bender         qsubTest         beaneg                  0 Q default 
bender:~ beaneg$ /exports/pbs/bin/qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5.bender         qsubTest         beaneg                  0 C default 
6.bender         qsubTest         beaneg                  0 C default 
7.bender         qsubTest         beaneg                  0 Q default 
8.bender         qsubTest         beaneg                  0 Q default
bender:~ beaneg$ /exports/pbs/bin/qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5.bender         qsubTest         beaneg                  0 C default 
6.bender         qsubTest         beaneg                  0 C default 
7.bender         qsubTest         beaneg                  0 Q default 
8.bender         qsubTest         beaneg                  0 Q default 


These jobs literally sit queued for minutes when they should be running.
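
To watch how long jobs linger in each state, the S column of the qstat listings above can be tallied with a small script. This is only a rough sketch: it assumes the default six-column qstat layout shown above, and the function name count_job_states is made up for illustration, not part of Torque.

```python
from collections import Counter


def count_job_states(qstat_text):
    """Tally jobs by the single-letter S (state) column of qstat output.

    Assumes the default layout shown in the listings above:
    job id, name, user, time use, state, queue.
    """
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows have exactly six whitespace-separated fields and a
        # one-letter state. This skips the header row, the dashed
        # separator, shell prompt lines, and blank lines.
        if len(fields) == 6 and len(fields[4]) == 1 and fields[4].isalpha():
            counts[fields[4]] += 1
    return counts


sample = """\
bender:~ beaneg$ /exports/pbs/bin/qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5.bender         qsubTest         beaneg                  0 C default
6.bender         qsubTest         beaneg                  0 C default
7.bender         qsubTest         beaneg                  0 Q default
8.bender         qsubTest         beaneg                  0 Q default
"""

states = count_job_states(sample)
print("completed:", states["C"], "queued:", states["Q"])
```

Running this against saved qstat output every few seconds would show whether the Q count only drops once the C count does.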

For my sake, can you tell me which files were modified so I can remove
this feature (for now, so I can make the PHB happy by having our 256
Apples working)?  Or do you need (or would you like) help debugging
this?

On Thu, 2004-09-09 at 11:30, Wightman wrote:
> This is a new feature that is in the snapshots and will be released with
> the next patch.  A status of C means complete.  The job will stick
> around for about 5 minutes and then disappear.  This allows Moab and
> Maui to better track the status of jobs.
> 
> Douglas
> Cluster Resources, INC.
> 
> On Thu, 2004-09-09 at 07:43, Glen Beane wrote:
> > What does a job status of C mean in qstat?  After my jobs finish and
> > copy their .e and .o files into the user home directory, they sit in the
> > queue with a status of C for a few minutes.  I couldn't find
> > documentation anywhere that described this status, and I don't see this
> > behavior on my Linux cluster. The cluster I see this on is running a
> > Torque snapshot from a week or so ago on OS X 10.3.5
> > 
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://supercluster.org/mailman/listinfo/torqueusers


