[torqueusers] Qstat reporting false node use

Clevenger, Kevin KClevenger at coh.org
Wed Apr 11 12:38:56 MDT 2007


Hi,

When running multiple NAMD jobs on the cluster (Rocks 4.2.1), we see qstat -n report that each job starts on its own set of nodes, but when you look at the processes with cluster-ps they are in fact all running on the same nodes (c-0-0 through c-0-7), none of which appear in the qstat listing. Does anyone know why this happens and how to straighten it out? Output below.

Thanks

Kevin

###################################################

$ qstat -n

cluster.coh.org: 
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
153.cluster.coh.org  bob      longrun  eq32.submi   5042     8   1    --  1000: R 00:23
   c-0-24+c-0-24+c-0-23+c-0-23+c-0-22+c-0-22+c-0-21+c-0-21+c-0-20+c-0-20+c-0-19
   +c-0-19+c-0-18+c-0-18+c-0-17+c-0-17
154.cluster.coh.org  bob      longrun  eq08.submi  32618     4   1    --  1000: R 00:22
   c-0-16+c-0-16+c-0-15+c-0-15+c-0-14+c-0-14+c-0-13+c-0-13
155.cluster.coh.org  bob      longrun  TAK779-eq0   1383     4   1    --  1000: R 00:18
   c-0-12+c-0-12+c-0-11+c-0-11+c-0-10+c-0-10+c-0-9+c-0-9
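
For reference, the host strings in the qstat -n listing can be tallied per node with a few lines of Python. This is only a rough sketch: the function name slots_per_node is made up for illustration, and the handling of a "/cpu" suffix is an assumption about other qstat output styles, not something that appears in the listing above.

```python
from collections import Counter

def slots_per_node(exec_hosts):
    """Count how many slots each node holds in a qstat -n host string."""
    # Continuation lines start with '+', so dropping newlines and
    # splitting on '+' recovers every host entry.
    entries = exec_hosts.replace("\n", "").split("+")
    # Assumption: some qstat versions print "node/cpu"; keep only the node name.
    names = [e.strip().split("/")[0] for e in entries]
    return Counter(n for n in names if n)

# Host string for job 154, copied from the listing above.
job154 = "c-0-16+c-0-16+c-0-15+c-0-15+c-0-14+c-0-14+c-0-13+c-0-13"

for node, slots in sorted(slots_per_node(job154).items()):
    print(node, slots)
```

For job 154 this gives four nodes with two slots each, which agrees with the NDS column of 4.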

~~~~~~~~~~~~~~~~~~~~

$ cluster-ps vaidsimpl
c-0-0: 
bob      5148 47.2  9.4 215432 195080 ?     R    11:03  12:14 /home/bob/vaidsimpl /home/bob/STAT3 eq32.namd
bob      5168 40.8  5.4 126060 112676 ?     R    11:03  10:34 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      5207 26.4  2.1 56656 44536 ?       R    11:04   6:42 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      5216 31.5  3.6 90228 74756 ?       S    11:04   7:59 /home/bob/vaidsimpl /home/bob/CCR2APO/MD eq08-con.namd
bob      5287 29.6  3.5 88252 72392 ?       R    11:08   6:17 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD eq08-con.namd
bob      5295 27.0  2.2 57572 45428 ?       S    11:08   5:43 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-1: 
bob      4307 40.8  5.5 127340 113164 ?     R    11:03  10:35 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      4313 38.9  5.3 123800 110232 ?     R    11:03  10:06 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      4357 29.8  2.3 61648 49316 ?       S    11:04   7:35 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      4361 29.6  2.3 61464 49196 ?       S    11:04   7:31 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      4427 24.8  2.2 57520 45452 ?       S    11:08   5:16 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
bob      4439 28.5  2.4 61528 50088 ?       R    11:08   6:03 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-2: 
bob      3449 45.0  5.8 135184 120840 ?     S    11:03  11:42 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      3450 45.6  5.8 135752 121192 ?     R    11:03  11:51 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      3495 30.2  2.4 63072 50484 ?       S    11:04   7:41 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      3499 30.0  2.4 62080 49692 ?       S    11:04   7:38 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      3572 26.1  2.3 59872 48448 ?       R    11:08   5:32 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
bob      3576 26.1  2.3 58600 47340 ?       S    11:08   5:33 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-3: 
bob      4699 44.5  5.7 132996 118492 ?     S    11:03  11:35 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      4718 46.3  5.7 131752 117528 ?     S    11:03  12:03 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      4763 29.2  2.2 58268 46084 ?       R    11:04   7:26 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      4767 29.4  2.4 61616 49428 ?       S    11:04   7:30 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      4841 28.3  2.3 60092 47744 ?       R    11:08   6:01 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
bob      4845 25.4  2.1 57012 44872 ?       R    11:08   5:25 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-4: 
bob      4077 45.1  5.8 135304 120804 ?     S    11:03  11:46 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      4081 44.9  5.8 134500 120244 ?     S    11:03  11:44 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      4126 29.2  2.4 62180 49728 ?       S    11:04   7:27 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      4130 30.2  2.3 61008 48688 ?       R    11:04   7:43 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      4194 24.1  2.2 57740 45532 ?       S    11:08   5:09 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
bob      4208 27.2  2.3 61304 48748 ?       R    11:08   5:49 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-5: 
bob      3971 42.6  5.5 128356 114548 ?     R    11:03  11:08 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      3991 42.8  5.6 130088 116216 ?     R    11:03  11:11 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      4036 30.8  2.4 62748 50392 ?       R    11:04   7:53 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      4040 29.9  2.4 62376 49828 ?       R    11:04   7:39 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      4108 26.1  2.2 59388 47088 ?       R    11:08   5:34 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
bob      4118 26.7  2.3 61724 49184 ?       R    11:08   5:41 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-6: 
bob      3881 46.5  5.6 130016 115668 ?     S    11:03  12:11 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      3885 43.2  5.3 124064 110516 ?     S    11:03  11:19 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      3913 29.5  2.3 60320 47860 ?       R    11:04   7:35 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      3933 28.2  2.1 57148 44992 ?       S    11:04   7:15 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      3999 27.2  2.3 61412 48792 ?       S    11:08   5:51 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
bob      4011 26.0  2.1 56988 44844 ?       S    11:08   5:34 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-7: 
bob      3789 46.1  5.8 134716 121036 ?     R    11:03  12:05 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      3792 45.7  5.8 134084 119676 ?     S    11:03  11:58 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
bob      3837 30.1  2.4 62072 49784 ?       R    11:04   7:43 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      3841 24.3  2.1 55472 43388 ?       R    11:04   6:14 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
bob      3903 26.8  2.3 60792 48452 ?       R    11:08   5:45 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
bob      3919 27.9  2.3 61240 48856 ?       R    11:08   5:58 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-8: 
c-0-9: 
c-0-10: 
c-0-11: 
c-0-12: 
bob      1414  0.0  0.0  5848  764 ?        S    11:08   0:00 /home/bob/vaidsim ++remote-shell ssh ++nodelist /share/data/etc/nodelist +p16 /home/bob/vaidsimpl /home/bob/CCR2TAK779/MD/eq08-con.namd
c-0-13: 
c-0-14: 
c-0-15: 
c-0-16: 
bob     32649  0.0  0.0  5848  764 ?        S    11:04   0:00 /home/bob/vaidsim ++remote-shell ssh ++nodelist /share/data/etc/nodelist +p16 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
c-0-17: 
c-0-18: 
c-0-19: 
c-0-20: 
c-0-21: 
c-0-22: 
c-0-23: 
c-0-24: 
bob      5069  0.0  0.0  5848  764 ?        S    11:03   0:00 /home/bob/vaidsim ++remote-shell ssh ++nodelist /share/data/etc/nodelist +p16 /home/bob/vaidsimpl /home/bob/STAT3/eq32.namd
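
To make the mismatch easier to see, here is a minimal Python sketch that finds which hosts are running the worker binary and compares them with the nodes qstat -n lists for a job. The function name worker_hosts is hypothetical, the sample text is a short excerpt of the output above, and the parser assumes cluster-ps prints ps aux-style columns under a "host:" header line.

```python
import re

HOST_RE = re.compile(r"^(\S+):\s*$")  # host header lines such as "c-0-0: "

def worker_hosts(ps_text, worker):
    """Return the set of hosts on which `worker` is the running command."""
    hosts = set()
    current = None
    for line in ps_text.splitlines():
        header = HOST_RE.match(line.strip())
        if header:
            current = header.group(1)
            continue
        fields = line.split()
        # Assumes ps aux-style columns, so the command is the 11th field.
        if current and len(fields) > 10 and fields[10] == worker:
            hosts.add(current)
    return hosts

# Excerpt of the cluster-ps output above (job 154 lines only).
sample = """\
c-0-0:
bob      5207 26.4  2.1 56656 44536 ?       R    11:04   6:42 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
c-0-1:
bob      4357 29.8  2.3 61648 49316 ?       S    11:04   7:35 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
c-0-15:
c-0-16:
bob     32649  0.0  0.0  5848  764 ?        S    11:04   0:00 /home/bob/vaidsim ++remote-shell ssh ++nodelist /share/data/etc/nodelist +p16 /home/bob/vaidsimpl /home/bob/CCR2APO/MD/eq08-con.namd
"""

allocated = {"c-0-13", "c-0-14", "c-0-15", "c-0-16"}  # job 154 per qstat -n
running = worker_hosts(sample, "/home/bob/vaidsimpl")
print("workers outside the allocation:", sorted(running - allocated))
```

On the allocated nodes only the launcher (/home/bob/vaidsim) shows up, while the workers sit elsewhere. One thing visible in the output is that all three launcher command lines pass the same ++nodelist /share/data/etc/nodelist and +p16, so the placement may be coming from that file rather than from the nodes TORQUE assigned. That is a guess from the output, not a confirmed diagnosis.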
