[torqueusers] Node allocation problem with -l nodes=1:ppn=3+1:ppn=2

Mike Coyne Mike.Coyne at PACCAR.com
Fri Feb 19 13:59:07 MST 2010


While working with the GSSAPI version of TORQUE, I believe I ran into an
issue that is not GSSAPI-specific.

I have a problem that may be caused by Maui rewriting the node list
incorrectly. On submission the request appears to be intact, but between
the time the job is queued and the time it is run, the node_spec gets
truncated to the last ":ppn=xxx" entry read.

The nodes file for the job is truncated the same way, starting at the
last ppn entry read, and the exec_nodes seem to be truncated as well.
Does Maui set the rq_runjob->rq_destin?
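
To make the symptom concrete, here is a minimal Python sketch that splits a
node spec into its "+"-separated chunks and counts the processors it asks
for. The helper names (parse_node_spec, total_tasks) are hypothetical and
are not part of TORQUE or Maui; the two strings are the spec logged at
submission and the node expression logged at run time in the pbs_server
output in this message. It assumes ppn defaults to 1 when a chunk omits it.

```python
def parse_node_spec(spec):
    """Split a '+'-separated node spec into (count_or_host, ppn) chunks."""
    chunks = []
    for part in spec.split("+"):
        fields = part.split(":")
        ppn = 1  # assumption: ppn defaults to 1 when a chunk omits it
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        chunks.append((fields[0], ppn))
    return chunks


def total_tasks(chunks):
    """Sum processors over all chunks; a numeric first field is a node count."""
    total = 0
    for first, ppn in chunks:
        # A hostname in the first field counts as exactly one node.
        count = int(first) if first.isdigit() else 1
        total += count * ppn
    return total


requested = parse_node_spec("1:ppn=3+1:ppn=2")
at_run_time = parse_node_spec("dante.pbdenton.paccar.com:ppn=2")

print(requested, total_tasks(requested))
print(at_run_time, total_tasks(at_run_time))
```

The submitted spec has two chunks totalling 5 processors, which matches
the "tasks distributed: 5" line from Maui, while the expression seen at
run time has a single chunk covering only 2.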

 

I am wondering whether I am looking in the right place.

Has anyone seen similar behavior?

 

From Maui I have:

02/19 11:28:01 INFO:     end of list reached.  2 nodes found
02/19 11:28:01 INFO:     tasks distributed: 5 (Round-Robin)
02/19 11:28:01 MAMAllocJReserve(153,RIndex,ErrMsg)
02/19 11:28:01 MRMJobStart(153,Msg,SC)
02/19 11:28:01 MPBSJobStart(153,STYX.PBDENTON.PACCAR.COM:16101,Msg,SC)
02/19 11:28:02 INFO:     job '153' successfully started
02/19 11:28:02 MQueueAddAJob(153)
02/19 11:28:02 MStatUpdateActiveJobUsage(153)
02/19 11:28:02 MPolicyAdjustUsage(NULL,153,NULL,active,NULL,[ALL],1,NULL)
02/19 11:28:02 MPolicyAdjustUsage(NULL,153,NULL,active,NULL,[ALL],1,NULL)
02/19 11:28:02 INFO:     job '153' added to MAQ at slot 1
02/19 11:28:02 INFO:     MAQ: [2 : 153 : 0][1 : 139 : 8627772]
02/19 11:28:02 MResJCreate(153,MNodeList,00:00:00,ActiveJob,Res)
02/19 11:28:02 MResAddNode(153,styxvm1.pbdenton.paccar.com,3,0)
02/19 11:28:02 MResAddNode(153,dante.pbdenton.paccar.com,2,0)
02/19 11:28:02 MResAdjustDRes(153,FALSE)
02/19 11:28:02 MPolicyAdjustUsage(NULL,153,NULL,idle,PU,[ALL],-1,NULL)
02/19 11:28:02 MPolicyAdjustUsage(NULL,153,NULL,idle,NULL,[ALL],-1,NULL)
02/19 11:28:02 MParUpdate(DEFAULT)
02/19 11:28:02 INFO:     P[DEFAULT]:  Total 4:12  Up 4:12  Idle 3:12 Active 1:4
02/19 11:28:02 INFO:     MNode[dantevm1.pbdenton.paccar.com] added to MPar[DEFAULT] (4:4)
02/19 11:28:02 INFO:     MNode[dante.pbdenton.paccar.com] added to MPar[DEFAULT] (0:2)
02/19 11:28:02 INFO:     MNode[styx.pbdenton.paccar.com] added to MPar[DEFAULT] (2:2)
02/19 11:28:02 INFO:     MNode[styxvm1.pbdenton.paccar.com] added to MPar[DEFAULT] (1:4)
02/19 11:28:02 INFO:     P[DEFAULT]:  Total 4:12  Up 4:12  Idle 1:7 Active 3:9
02/19 11:28:02 MJobAddToNL(153,NULL)
02/19 11:28:02 INFO:     node styxvm1.pbdenton.paccar.com added to job 153.  PSlot: [M 4:4]
02/19 11:28:02 INFO:     node dante.pbdenton.paccar.com added to job 153.  PSlot: [bque 2:2]
02/19 11:28:02 INFO:     starting job '153'
02/19 11:28:02 INFO:     1 jobs started on iteration 66
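
For comparison with what pbs_server receives, the following Python sketch
rebuilds the host expression one would expect from the two MResAddNode
lines in the Maui log. It assumes the third MResAddNode argument is the
task count on that node (the values 3 and 2 agree with the request), and
it uses the "host:ppn=N" form that pbs_server logs in set_nodes. The
function name expected_host_expression is hypothetical. The last line
only illustrates the symptom; it says nothing about where the truncation
actually happens.

```python
import re

# The two allocation lines from the Maui log for job 153.
MAUI_LOG = """\
02/19 11:28:02 MResAddNode(153,styxvm1.pbdenton.paccar.com,3,0)
02/19 11:28:02 MResAddNode(153,dante.pbdenton.paccar.com,2,0)
"""

# Assumption: arguments are (job id, host, task count, <unused here>).
PATTERN = re.compile(r"MResAddNode\((\d+),([^,]+),(\d+),\d+\)")


def expected_host_expression(log_text, job_id):
    """Join each allocated host as 'host:ppn=N', separated by '+'."""
    parts = []
    for job, host, tasks in PATTERN.findall(log_text):
        if job == job_id:
            parts.append(f"{host}:ppn={tasks}")
    return "+".join(parts)


expected = expected_host_expression(MAUI_LOG, "153")
print(expected)

# Keeping only the text after the last '+' reproduces the expression
# that pbs_server reports in set_nodes.
last_chunk = expected.rsplit("+", 1)[-1]
print(last_chunk)
```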

 

And from TORQUE (GSSAPI):

 

 

02/19/2010 11:27:46;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 153.styx.pbdenton.paccar.com state from TRANSIT-TRANSICM to QUEUED-QUEUED (1-10)
02/19/2010 11:27:46;0100;PBS_Server;Job;153.styx.pbdenton.paccar.com;enqueuing into feed, state 1 hop 1
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;entered spec=1:ppn=3+1:ppn=2
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;job allocation debug: 2 requested, 12 svr_clnodes, 4 svr_totnodes
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;job allocation debug(2): 2 requested, 4 svr_numnodes
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;job allocation debug(3): returning 2 requested
02/19/2010 11:27:46;0100;PBS_Server;Job;153.styx.pbdenton.paccar.com;dequeuing from feed, state QUEUED
02/19/2010 11:27:46;0100;PBS_Server;Job;153.styx.pbdenton.paccar.com;enqueuing into dque, state 1 hop 1
02/19/2010 11:27:46;0008;PBS_Server;Job;reply_send;Reply sent for request type Commit on socket 14
02/19/2010 11:27:46;0008;PBS_Server;Job;153.styx.pbdenton.paccar.com;Reply sent for request type Commit on socket 14
02/19/2010 11:27:47;0080;PBS_Server;Req;dis_request_read;decoding command Disconnect from mcoyne
02/19/2010 11:27:53;0004;PBS_Server;Svr;svr_connect;attempting connect to host 160.69.126.121 port 16102
02/19/2010 11:27:53;0008;PBS_Server;Job;139.styx.pbdenton.paccar.com;attr resources_used modified
02/19/2010 11:28:01;0080;PBS_Server;Req;dis_request_read;decoding command StatusNode from root
02/19/2010 11:28:01;0100;PBS_Server;Req;;Type StatusNode request received from root at styx.pbdenton.paccar.com, sock=9
02/19/2010 11:28:01;0008;PBS_Server;Job;dispatch_request;dispatching request StatusNode on sd=9
02/19/2010 11:28:01;0040;PBS_Server;Req;req_stat_node;entered
02/19/2010 11:28:01;0008;PBS_Server;Job;reply_send;Reply sent for request type StatusNode on socket 9
02/19/2010 11:28:01;0080;PBS_Server;Req;dis_request_read;decoding command StatusQueue from root
02/19/2010 11:28:01;0100;PBS_Server;Req;;Type StatusQueue request received from root at styx.pbdenton.paccar.com, sock=9
02/19/2010 11:28:01;0008;PBS_Server;Job;dispatch_request;dispatching request StatusQueue on sd=9
02/19/2010 11:28:01;0008;PBS_Server;Job;reply_send;Reply sent for request type StatusQueue on socket 9
02/19/2010 11:28:01;0080;PBS_Server;Req;dis_request_read;decoding command StatusJob from root
02/19/2010 11:28:01;0100;PBS_Server;Req;;Type StatusJob request received from root at styx.pbdenton.paccar.com, sock=9
02/19/2010 11:28:01;0008;PBS_Server;Job;dispatch_request;dispatching request StatusJob on sd=9
02/19/2010 11:28:01;0008;PBS_Server;Job;reply_send;Reply sent for request type StatusJob on socket 9
02/19/2010 11:28:01;0080;PBS_Server;Req;dis_request_read;decoding command RunJob from root
02/19/2010 11:28:01;0100;PBS_Server;Req;;Type RunJob request received from root at styx.pbdenton.paccar.com, sock=9
02/19/2010 11:28:01;0008;PBS_Server;Job;dispatch_request;dispatching request RunJob on sd=9
02/19/2010 11:28:01;0040;PBS_Server;Req;set_nodes;allocating nodes for job 153.styx.pbdenton.paccar.com with node expression 'dante.pbdenton.paccar.com:ppn=2'
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;entered spec=dante.pbdenton.paccar.com:ppn=2
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;job allocation debug: 1 requested, 12 svr_clnodes, 4 svr_totnodes
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;job allocation debug(2): 1 requested, 4 svr_numnodes
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;job allocation debug(3): returning 1 requested
02/19/2010 11:28:01;0040;PBS_Server;Req;add_job_to_node;allocated node dante.pbdenton.paccar.com/0 to job 153.styx.pbdenton.paccar.com (nsnfree=2)
02/19/2010 11:28:01;0040;PBS_Server;Req;add_job_to_node;allocated node dante.pbdenton.paccar.com/1 to job 153.styx.pbdenton.paccar.com (nsnfree=1)
02/19/2010 11:28:01;0040;PBS_Server;Req;set_nodes;job 153.styx.pbdenton.paccar.com allocated 2 nodes (nodelist=dante.pbdenton.paccar.com/1+dante.pbdenton.paccar.com/0)

02/19/2010 11:28:01;0008;PBS_Server;Jo
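
One way to track when the spec changes is to pull every "node_spec;entered
spec=..." record out of the pbs_server log with its timestamp. The Python
sketch below does that for the two records from this log; the regular
expression allows any whitespace, including a line break introduced by
mail wrapping, between "entered" and "spec=". The names PBS_LOG and
SPEC_RE are hypothetical, and the field layout is taken only from the
lines shown in this message.

```python
import re

# The two node_spec records for job 153, one at submission and one at run time.
PBS_LOG = """\
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;entered
spec=1:ppn=3+1:ppn=2
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;entered
spec=dante.pbdenton.paccar.com:ppn=2
"""

# Capture the timestamp and the spec; \s+ also matches a wrapped line break.
SPEC_RE = re.compile(
    r"^(\d\d/\d\d/\d{4} \d\d:\d\d:\d\d);\d{4};PBS_Server;Req;"
    r"node_spec;entered\s+spec=(\S+)",
    re.MULTILINE,
)

entries = SPEC_RE.findall(PBS_LOG)
for timestamp, spec in entries:
    # Count the '+'-separated chunks to spot a spec that lost entries.
    print(timestamp, spec, "chunks:", len(spec.split("+")))
```

Here the record at 11:27:46 has two chunks and the record at 11:28:01,
after the RunJob request, has one.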
