[Mauiusers] Job is in 'Q' but checkjob shows it is running (!)

Mahmood Naderan nt_mahmood at yahoo.com
Mon Sep 12 10:27:58 MDT 2011


>Do you mean why isn't the job running, even though it seems that it *should* be running?

Exactly...

>If so, I would say post the output of qstat -f for the job, and checkjob -v

mahmood at srv1:~$ qstat -f 49153
Job Id: 49153.srv1
    Job_Name = bwaves
    Job_Owner = mahmood at srv1
    job_state = Q
    queue = Long
    server = srv1
    Checkpoint = u
    ctime = Mon Sep 12 19:55:29 2011
    Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwave
        s/bwaves.e49153
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Mon Sep 12 19:55:29 2011
    Output_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwav
        es/bwaves_128.out
    Priority = 0
    qtime = Mon Sep 12 19:55:29 2011
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = node2
    Resource_List.walltime = 960:00:00
    Variable_List = PBS_O_QUEUE=Long,PBS_O_HOME=/home/mahmood,
        ...
    etime = Mon Sep 12 19:55:29 2011
    submit_args = tor
    fault_tolerant = False
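
For anyone comparing several jobs, the fields that matter in the qstat -f output above (job_state, Resource_List.nodes, the wrapped paths) can be pulled out with a short script. This is only a minimal sketch: parse_qstat_full is a hypothetical helper, and it assumes the layout seen in this paste (attributes indented four spaces, wrapped values indented further). Other TORQUE versions may format the output differently.

```python
import re

# Assumption: attribute lines look like "    key = value" (exactly four
# leading spaces); more deeply indented lines continue the previous value.
ATTR_RE = re.compile(r"^ {4}(\S+) = (.*)$")


def parse_qstat_full(text):
    """Parse pasted `qstat -f` text into a dict, rejoining wrapped values."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue
        if line.startswith("Job Id:"):
            fields["Job Id"] = line.split(":", 1)[1].strip()
            last_key = None
            continue
        match = ATTR_RE.match(line)
        if match:
            last_key = match.group(1)
            fields[last_key] = match.group(2)
        elif last_key is not None:
            # Wrapped value: qstat breaks mid-word, so join without a space.
            fields[last_key] += line.strip()
    return fields


# Excerpt of the output pasted above.
SAMPLE = """Job Id: 49153.srv1
    Job_Name = bwaves
    job_state = Q
    Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwave
        s/bwaves.e49153
    Resource_List.nodes = node2
    Resource_List.walltime = 960:00:00
"""

job = parse_qstat_full(SAMPLE)
print(job["job_state"], job["Resource_List.nodes"], job["Error_Path"])
```

The continuation handling is the only non-obvious part: wrapped values are concatenated without a separator, so the split Error_Path comes back as a single path.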

mahmood at srv1:~$ checkjob -v 49153
checking job 49153 (RM job '49153.srv1')

State: Idle
Creds:  user:mahmood  group:mahmood  class:Long  qos:DEFAULT
WallTime: 00:00:00 of 40:00:00:00
SubmitTime: Mon Sep 12 19:55:29
  (Time Queued  Total: 00:39:24  Eligible: 00:39:24)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
NodeCount: 0


IWD: [NONE]  Executable:  [NONE]
Bypass: 3  StartCount: 0
PartitionMask: [ALL]
Flags:       HOSTLIST RESTARTABLE
HostList:
  [node2:1]
PE:  1.00  StartPriority:  147
job can run in partition DEFAULT (8 procs available.  1 procs required)
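
A side note on the two walltime values: qstat reports 960:00:00 and checkjob reports 40:00:00:00. Assuming TORQUE prints HH:MM:SS and Maui prints DD:HH:MM:SS (which is how the two outputs above read), these are the same limit, since 960 hours is 40 days. A small sketch to check this, where to_seconds is a hypothetical helper and not part of either tool:

```python
def to_seconds(stamp):
    """Convert a [[DD:]HH:]MM:SS string to a number of seconds."""
    parts = [int(p) for p in stamp.split(":")]
    # Pad on the left so the list is always [days, hours, minutes, seconds].
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


torque_limit = to_seconds("960:00:00")    # Resource_List.walltime from qstat -f
maui_limit = to_seconds("40:00:00:00")    # WallTime limit from checkjob -v
print(torque_limit == maui_limit)
```

So the walltime request was passed to Maui unchanged and is not a mismatch between the two views of the job.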


>which you seem to have manually selected in your qsub statement

Yes. As you can see, I requested node2:
Resource_List.nodes = node2

The output of "pbsnodes -l all" shows that this node is free:

mahmood at srv1:~$ pbsnodes -l all
srv1                  job-exclusive
node2                 free
node3                 job-exclusive
node4                 free
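
The same check can be scripted against the pasted listing. This is a rough sketch that assumes each line of "pbsnodes -l all" is a node name followed by its state, as in the output above; parse_pbsnodes_list is a hypothetical helper. It only confirms what TORQUE reports for the node and does not explain why Maui leaves the job idle.

```python
def parse_pbsnodes_list(text):
    """Map node name -> state from pasted `pbsnodes -l all` output."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # First column is the node name, second is the reported state.
            states[parts[0]] = parts[1]
    return states


# The listing pasted above.
LISTING = """srv1                  job-exclusive
node2                 free
node3                 job-exclusive
node4                 free
"""

states = parse_pbsnodes_list(LISTING)
requested = "node2"  # from Resource_List.nodes in the qstat -f output
print(requested, states.get(requested, "unknown"))
print(sorted(name for name, state in states.items() if state == "free"))
```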


Any idea about that?

// Naderan *Mahmood;


----- Original Message -----
From: Steve Crusan <scrusan at ur.rochester.edu>
To: Mahmood Naderan <nt_mahmood at yahoo.com>
Cc: maui <mauiusers at supercluster.org>
Sent: Monday, September 12, 2011 6:17 PM
Subject: Re: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!)



On Sep 12, 2011, at 5:01 AM, Mahmood Naderan wrote:

> 
> 
> Hi,
> I sent this email to the TORQUE mailing list, but it seems to be related to Maui, so I am restating the problem here.
> 
> Can someone explain why qstat shows a job in "Q" but checkjob says everything is normal?


Looking below, the job is queued in TORQUE, and idle in Maui (not running), so everything is normal.

Do you mean why isn't the job running, even though it seems that it *should* be running?

If so, I would say post the output of qstat -f for the job, and checkjob -v. This seems to be more or less a scheduler configuration issue, or possibly an issue with the node (which you seem to have manually selected in your qsub statement).



> 
> mahmood at srv1:416.gamess$ qstat 49003
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 49003.srv1                 gamess           mahmood                0 Q Long
> 
> 
> mahmood at srv1:416.gamess$ checkjob 49003
> checking job 49003
> 
> State: Idle
> Creds:  user:mahmood  group:mahmood  class:Long    qos:DEFAULT
> WallTime: 00:00:00 of 40:00:00:00
> SubmitTime: Sun Sep 11 09:51:26
>   (Time Queued  Total: 00:02:36  Eligible: 00:02:36)
> 
> Total Tasks: 1
> 
> Req[0]  TaskCount: 1  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> 
> 
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 0  StartCount: 0
> PartitionMask: [ALL]
> Flags:       HOSTLIST RESTARTABLE
> HostList:
>   [hawk:1]
> PE:  1.00  StartPriority:  129
> job can run in partition DEFAULT (3 procs available.  1 procs required)
> 
> Thanks
> // Naderan *Mahmood;
> 

----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/




