Bug 165 - qstat -a reports wrong 0 value in TSK column if nodes is a hostname
Status: NEW
Product: TORQUE
Component: clients
Version: 4.1.*
Hardware: All All
Importance: P3 normal
Assigned To: Ken Nielson
Reported: 2011-12-10 10:06 MST by Stephane Rouberol
Modified: 2013-02-19 02:34 MST

Description Stephane Rouberol 2011-12-10 10:06:01 MST
cat ppn2_TSK0.batch
#!/bin/sh
#PBS -S /bin/sh
#PBS -l nodes=horizon11:ppn=2
#PBS -N Npp
#PBS -j oe
sleep 22m

qsub ppn2_TSK0.batch
246.horizon

horizon: ~/torque_tests > qstat -a

horizon.iap.fr: 
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
246.horizon          rouberol batch    Npp                4445     1   0    --  04:00 R   --


TSK value is 0 instead of 2

The problem comes from line 709 of src/cmds/qstat.c in torque-3.0.3:

int nodes = atoi(pat->value);

This returns 0 if pat->value does not begin with a number, like "horizon11" in
the job script example above.
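
A minimal sketch of that behaviour, for illustration only. The helper name leading_node_count is hypothetical and is not TORQUE code; it only wraps the same atoi() call on a plain string:

```c
#include <stdlib.h>

/* Hypothetical helper, not part of TORQUE: mirrors the atoi() call quoted
 * above, applied to the value of the "nodes" resource. */
static int leading_node_count(const char *nodes_value)
{
    /* atoi() converts only an optional sign and leading digits; when the
     * string has no leading digits, no conversion happens and it yields 0. */
    return atoi(nodes_value);
}
```

With this helper, "1:ppn=10" gives 1 while "horizon11:ppn=2" gives 0, which matches the 0 seen in the TSK column.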

The qstat.c code should distinguish between the two forms documented at
http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml,
nodes={<node_count> | <hostname>}, so that TSK is reported accurately when
nodes=<hostname> is used.
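
One possible way to make that distinction is sketched below. This is not a patch against qstat.c: count_tasks is a hypothetical helper, and it assumes that chunks are separated by '+', that a first field made only of digits is a node count, that anything else is a single named host, and that a missing ppn means 1.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of TORQUE: estimates the task count for a
 * "nodes" string such as "2:ppn=4" or "horizon11:ppn=2".
 * Assumption: tasks = sum over '+'-separated chunks of (nodes * ppn). */
static int count_tasks(const char *spec)
{
    int total = 0;
    const char *p = spec;

    while (*p != '\0') {
        size_t len = strcspn(p, "+");      /* length of this chunk */
        size_t first = strcspn(p, ":+");   /* length of its first field */
        int count = 1;                     /* a hostname counts as one node */
        int ppn = 1;                       /* assumed default when ppn is absent */
        int all_digits = (first > 0);
        size_t i;

        if (len == 0) {                    /* empty chunk: skip the '+' */
            p++;
            continue;
        }

        for (i = 0; i < first; i++) {
            if (!isdigit((unsigned char)p[i]))
                all_digits = 0;
        }
        if (all_digits)
            count = atoi(p);               /* first field is a node count */

        /* look for ":ppn=" inside this chunk only */
        for (i = 0; i + 5 <= len; i++) {
            if (strncmp(p + i, ":ppn=", 5) == 0) {
                ppn = atoi(p + i + 5);
                break;
            }
        }
        if (ppn < 1)
            ppn = 1;                       /* ignore a malformed ppn value */

        total += count * ppn;
        p += len;
        if (*p == '+')
            p++;
    }
    return total;
}
```

Under these assumptions, "horizon11:ppn=2" yields 2 and "1:ppn=10" yields 10. A hostname made only of digits would still be misread as a count, so a real fix would need to follow whatever rule the server itself applies.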

Regards,
sr
Comment 1 Chris Samuel 2011-12-12 16:29:12 MST
Slightly different issue in 2.4.x (yes, I know it's old, but it works!)..

[root@bruce-m ~]# qstat -u samuel -a

bruce-m.vlsci.unimelb.edu.au: 
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
979069.bruce-m.v     samuel   batch    STDIN             29325     1 bru    --  01:00 C 00:00
979070.bruce-m.v     samuel   batch    STDIN             10876     1   1    --  01:00 R   --

The first job requested a specific node, but instead of being converted to a
number it was just passed through truncated at 3 characters.  Don't know if
that's better or worse than the behaviour in 3.0.x. :)
Comment 2 Laurent Facq 2013-02-19 02:34:50 MST
Bug 165 - qstat -a reports wrong 0 value in TSK column if nodes is a hostname

This bug is still present in TORQUE 4.1.4:

qsub ...  -l nodes=mynode001:ppn=10   => bug TSK=0
qsub ...  -l nodes=1:ppn=10           => ok  TSK=10 (but with a different meaning, of course)

Thx.