[Mauiusers] Job is in 'Q' but checkjob shows it is running (!)
Mahmood Naderan
nt_mahmood at yahoo.com
Mon Sep 12 10:27:58 MDT 2011
>Do you mean why isn't the job running, even though it seems that it *should* be running?
Exactly...
>If so, I would say post the output of qstat -f for the job, and checkjob -v
mahmood at srv1:~$ qstat -f 49153
Job Id: 49153.srv1
Job_Name = bwaves
Job_Owner = mahmood at srv1
job_state = Q
queue = Long
server = srv1
Checkpoint = u
ctime = Mon Sep 12 19:55:29 2011
Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwaves/bwaves.e49153
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = a
mtime = Mon Sep 12 19:55:29 2011
Output_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwaves/bwaves_128.out
Priority = 0
qtime = Mon Sep 12 19:55:29 2011
Rerunable = True
Resource_List.nodect = 1
Resource_List.nodes = node2
Resource_List.walltime = 960:00:00
Variable_List = PBS_O_QUEUE=Long,PBS_O_HOME=/home/mahmood,
...
etime = Mon Sep 12 19:55:29 2011
submit_args = tor
fault_tolerant = False
mahmood at srv1:~$ checkjob -v 49153
checking job 49153 (RM job '49153.srv1')
State: Idle
Creds: user:mahmood group:mahmood class:Long qos:DEFAULT
WallTime: 00:00:00 of 40:00:00:00
SubmitTime: Mon Sep 12 19:55:29
(Time Queued Total: 00:39:24 Eligible: 00:39:24)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
NodeCount: 0
IWD: [NONE] Executable: [NONE]
Bypass: 3 StartCount: 0
PartitionMask: [ALL]
Flags: HOSTLIST RESTARTABLE
HostList:
[node2:1]
PE: 1.00 StartPriority: 147
job can run in partition DEFAULT (8 procs available. 1 procs required)
>which you seem to have manually selected in your qsub statement
Yes, as you can see, I requested node2:
Resource_List.nodes = node2
and the output of "pbsnodes -l all" shows that this node is free
mahmood at srv1:~$ pbsnodes -l all
srv1 job-exclusive
node2 free
node3 job-exclusive
node4 free
Any idea about that?
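
The check described above (is the requested node reported as free?) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `parse_pbsnodes_list` and `node_is_free` are hypothetical, and the parser assumes each line of "pbsnodes -l all" output holds a node name followed by its state, as in the listing above.

```python
def parse_pbsnodes_list(text):
    """Map node name -> state from `pbsnodes -l all` style output.

    Assumes one node per line: name first, state second. Any further
    columns are ignored.
    """
    states = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def node_is_free(text, node):
    """Return True if `node` is listed with the state 'free'."""
    return parse_pbsnodes_list(text).get(node) == "free"


# Sample copied from the pbsnodes output in this message.
SAMPLE = """\
srv1 job-exclusive
node2 free
node3 job-exclusive
node4 free
"""

if __name__ == "__main__":
    # node2 is the host requested via Resource_List.nodes.
    print(node_is_free(SAMPLE, "node2"))
```

A free node only rules out one cause. It does not by itself explain why the scheduler leaves the job idle.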
// Naderan *Mahmood;
----- Original Message -----
From: Steve Crusan <scrusan at ur.rochester.edu>
To: Mahmood Naderan <nt_mahmood at yahoo.com>
Cc: maui <mauiusers at supercluster.org>
Sent: Monday, September 12, 2011 6:17 PM
Subject: Re: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Sep 12, 2011, at 5:01 AM, Mahmood Naderan wrote:
>
>
> Hi,
> I sent this email to the TORQUE mailing list, but it seems to be related to Maui, so I am restating the problem here.
>
> Can someone explain why qstat shows a job in "Q" but checkjob says everything is normal?
Looking below, the job is queued in TORQUE, and idle in Maui (not running), so everything is normal.
Do you mean why isn't the job running, even though it seems that it *should* be running?
If so, I would say post the output of qstat -f for the job, and of checkjob -v. This looks more or less like a scheduler configuration issue, or possibly an issue with the node (which you seem to have manually selected in your qsub statement).
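
As a rough illustration of the point that a 'Q' state in TORQUE and an 'Idle' state in Maui agree with each other, the two outputs can be compared with a small Python sketch. The function names and the `EXPECTED_MAUI_STATE` table are hypothetical. Only the Q/Idle pairing comes from this thread; the R/Running pairing is an assumption added for illustration.

```python
import re

# Assumed correspondence between TORQUE job_state letters and Maui states.
# Q/Idle is what this thread shows; R/Running is an illustrative assumption.
EXPECTED_MAUI_STATE = {"Q": "Idle", "R": "Running"}


def torque_state(qstat_text):
    """Return the job_state letter from `qstat -f` output, or None."""
    match = re.search(r"^\s*job_state\s*=\s*(\S+)", qstat_text, re.MULTILINE)
    return match.group(1) if match else None


def maui_state(checkjob_text):
    """Return the State value from `checkjob` output, or None."""
    match = re.search(r"^\s*State:\s*(\S+)", checkjob_text, re.MULTILINE)
    return match.group(1) if match else None


def states_agree(qstat_text, checkjob_text):
    """True if both tools report corresponding states for the job."""
    expected = EXPECTED_MAUI_STATE.get(torque_state(qstat_text))
    return expected is not None and expected == maui_state(checkjob_text)


if __name__ == "__main__":
    # Shortened samples in the shape of the outputs quoted in this thread.
    qstat_sample = "Job Id: 49153.srv1\n    job_state = Q\n    queue = Long\n"
    checkjob_sample = "checking job 49153\n\nState: Idle\nCreds: user:mahmood\n"
    print(states_agree(qstat_sample, checkjob_sample))
```

If the states agree, as they do here, the open question is why the scheduler has not started the job, which is what the full qstat -f and checkjob -v output helps answer.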
>
> mahmood at srv1:416.gamess$ qstat 49003
> Job id Name User Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 49003.srv1 gamess mahmood 0 Q Long
>
>
> mahmood at srv1:416.gamess$ checkjob 49003
> checking job 49003
>
> State: Idle
> Creds: user:mahmood group:mahmood class:Long qos:DEFAULT
> WallTime: 00:00:00 of 40:00:00:00
> SubmitTime: Sun Sep 11 09:51:26
> (Time Queued Total: 00:02:36 Eligible: 00:02:36)
>
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Flags: HOSTLIST RESTARTABLE
> HostList:
> [hawk:1]
> PE: 1.00 StartPriority: 129
> job can run in partition DEFAULT (3 procs available. 1 procs required)
>
> Thanks
> // Naderan *Mahmood;
>
----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org
iQEcBAEBAgAGBQJObg2IAAoJENS19LGOpgqKAnIIAKHvbLmV9Hs31IZ4AGHIOFG9
Wxp+qiXOnIMoKQQjhkkou1zVC4OKHnymcE/LxtiQcAuX+Lu8gd/GAR1tF5FeCF4g
m7go12yb5Dx97sHgl2SjmRY3duDkx6YMfOGgxCuiN+O5SdkUazuW8GPkW+HPPS7/
T3gDbG0jizZ6A5LzhJqgPyVC4LKkwYt5v9NQBs/f82ZOGqPusEWdJ4N5oaUYhyG/
OXSj/xmzMTCYCqfdOUZynq4ACQotRbNmY7wrV+Uc0qWUFtZv/RIwQ/O4P261E/1/
dfrVX3OEdz9FBy4uoNrgMyNxL2eOanNiKSlhHJnoM04zx0SkAYGDOeGPqYv/vi0=
=QcC7
-----END PGP SIGNATURE-----