[torqueusers] Node allocation problem with -l nodes=1:ppn=3+1:ppn=2
Mike Coyne
Mike.Coyne at PACCAR.com
Fri Feb 19 13:59:07 MST 2010
While working with the GSSAPI version of Torque, I believe I ran into an issue that is not GSSAPI-specific.
I have a problem that looks like Maui rewriting the node list incorrectly. It appears that between the time the job is queued and the time it is run, the node_spec gets truncated to the last ":ppn=xxx" that was read, so for -l nodes=1:ppn=3+1:ppn=2 only the final host:ppn=2 chunk survives. The job's nodes file is truncated the same way, starting at the last ppn read, and the exec_nodes also seem to be truncated. Does Maui set rq_runjob->rq_destin?
I am wondering whether I am looking in the right place. Has anyone seen similar behavior?
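To make the expected request concrete, here is a minimal Python sketch that splits a nodes= spec on "+" and totals nodes and processors per chunk. It is a simplified parser, not Torque's own node_spec code; the names Chunk, parse_node_spec and summarize are hypothetical, and it assumes each chunk has the form <count|hostname>[:ppn=N][:property...] with ppn defaulting to 1. For 1:ppn=3+1:ppn=2 it yields 2 nodes and 5 processors, while keeping only the last chunk yields 1 node and 2 processors, which matches what the logs below show being allocated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    """One '+'-separated piece of a nodes= spec (hypothetical helper type)."""
    count: int                # number of nodes requested by this chunk
    host: Optional[str]       # explicit hostname, if given instead of a count
    ppn: int = 1              # processors per node, assumed to default to 1
    props: List[str] = field(default_factory=list)


def parse_node_spec(spec: str) -> List[Chunk]:
    """Split a spec such as '1:ppn=3+1:ppn=2' into chunks.

    Simplified sketch: assumes '<count|hostname>[:ppn=N][:property...]'.
    """
    chunks = []
    for part in spec.split("+"):
        first, *rest = part.split(":")
        if first.isdigit():
            chunk = Chunk(count=int(first), host=None)
        else:
            chunk = Chunk(count=1, host=first)   # a hostname means one node
        for item in rest:
            if item.startswith("ppn="):
                chunk.ppn = int(item[len("ppn="):])
            else:
                chunk.props.append(item)         # e.g. a node property
        chunks.append(chunk)
    return chunks


def summarize(chunks: List[Chunk]) -> Tuple[int, int]:
    """Return (nodes, processors) requested by the parsed chunks."""
    nodes = sum(c.count for c in chunks)
    procs = sum(c.count * c.ppn for c in chunks)
    return nodes, procs


requested = parse_node_spec("1:ppn=3+1:ppn=2")
print(summarize(requested))   # (2, 5)

# Keeping only the last chunk, as described above:
truncated = requested[-1:]
print(summarize(truncated))   # (1, 2)
```

The 5 processors line up with Maui's "tasks distributed: 5", and the truncated result lines up with the "1 requested" and two dante slots in the Torque log.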
From Maui I have:
02/19 11:28:01 INFO: end of list reached. 2 nodes found
02/19 11:28:01 INFO: tasks distributed: 5 (Round-Robin)
02/19 11:28:01 MAMAllocJReserve(153,RIndex,ErrMsg)
02/19 11:28:01 MRMJobStart(153,Msg,SC)
02/19 11:28:01 MPBSJobStart(153,STYX.PBDENTON.PACCAR.COM:16101,Msg,SC)
02/19 11:28:02 INFO: job '153' successfully started
02/19 11:28:02 MQueueAddAJob(153)
02/19 11:28:02 MStatUpdateActiveJobUsage(153)
02/19 11:28:02 MPolicyAdjustUsage(NULL,153,NULL,active,NULL,[ALL],1,NULL)
02/19 11:28:02 MPolicyAdjustUsage(NULL,153,NULL,active,NULL,[ALL],1,NULL)
02/19 11:28:02 INFO: job '153' added to MAQ at slot 1
02/19 11:28:02 INFO: MAQ: [2 : 153 : 0][1 : 139 : 8627772]
02/19 11:28:02 MResJCreate(153,MNodeList,00:00:00,ActiveJob,Res)
02/19 11:28:02 MResAddNode(153,styxvm1.pbdenton.paccar.com,3,0)
02/19 11:28:02 MResAddNode(153,dante.pbdenton.paccar.com,2,0)
02/19 11:28:02 MResAdjustDRes(153,FALSE)
02/19 11:28:02 MPolicyAdjustUsage(NULL,153,NULL,idle,PU,[ALL],-1,NULL)
02/19 11:28:02 MPolicyAdjustUsage(NULL,153,NULL,idle,NULL,[ALL],-1,NULL)
02/19 11:28:02 MParUpdate(DEFAULT)
02/19 11:28:02 INFO: P[DEFAULT]: Total 4:12 Up 4:12 Idle 3:12 Active 1:4
02/19 11:28:02 INFO: MNode[dantevm1.pbdenton.paccar.com] added to MPar[DEFAULT] (4:4)
02/19 11:28:02 INFO: MNode[dante.pbdenton.paccar.com] added to MPar[DEFAULT] (0:2)
02/19 11:28:02 INFO: MNode[styx.pbdenton.paccar.com] added to MPar[DEFAULT] (2:2)
02/19 11:28:02 INFO: MNode[styxvm1.pbdenton.paccar.com] added to MPar[DEFAULT] (1:4)
02/19 11:28:02 INFO: P[DEFAULT]: Total 4:12 Up 4:12 Idle 1:7 Active 3:9
02/19 11:28:02 MJobAddToNL(153,NULL)
02/19 11:28:02 INFO: node styxvm1.pbdenton.paccar.com added to job 153. PSlot: [M 4:4]
02/19 11:28:02 INFO: node dante.pbdenton.paccar.com added to job 153. PSlot: [bque 2:2]
02/19 11:28:02 INFO: starting job '153'
02/19 11:28:02 INFO: 1 jobs started on iteration 66
And from Torque (GSSAPI):
02/19/2010 11:27:46;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 153.styx.pbdenton.paccar.com state from TRANSIT-TRANSICM to QUEUED-QUEUED (1-10)
02/19/2010 11:27:46;0100;PBS_Server;Job;153.styx.pbdenton.paccar.com;enqueuing into feed, state 1 hop 1
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;entered spec=1:ppn=3+1:ppn=2
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;job allocation debug: 2 requested, 12 svr_clnodes, 4 svr_totnodes
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;job allocation debug(2): 2 requested, 4 svr_numnodes
02/19/2010 11:27:46;0040;PBS_Server;Req;node_spec;job allocation debug(3): returning 2 requested
02/19/2010 11:27:46;0100;PBS_Server;Job;153.styx.pbdenton.paccar.com;dequeuing from feed, state QUEUED
02/19/2010 11:27:46;0100;PBS_Server;Job;153.styx.pbdenton.paccar.com;enqueuing into dque, state 1 hop 1
02/19/2010 11:27:46;0008;PBS_Server;Job;reply_send;Reply sent for request type Commit on socket 14
02/19/2010 11:27:46;0008;PBS_Server;Job;153.styx.pbdenton.paccar.com;Reply sent for request type Commit on socket 14
02/19/2010 11:27:47;0080;PBS_Server;Req;dis_request_read;decoding command Disconnect from mcoyne
02/19/2010 11:27:53;0004;PBS_Server;Svr;svr_connect;attempting connect to host 160.69.126.121 port 16102
02/19/2010 11:27:53;0008;PBS_Server;Job;139.styx.pbdenton.paccar.com;attr resources_used modified
02/19/2010 11:28:01;0080;PBS_Server;Req;dis_request_read;decoding command StatusNode from root
02/19/2010 11:28:01;0100;PBS_Server;Req;;Type StatusNode request received from root at styx.pbdenton.paccar.com, sock=9
02/19/2010 11:28:01;0008;PBS_Server;Job;dispatch_request;dispatching request StatusNode on sd=9
02/19/2010 11:28:01;0040;PBS_Server;Req;req_stat_node;entered
02/19/2010 11:28:01;0008;PBS_Server;Job;reply_send;Reply sent for request type StatusNode on socket 9
02/19/2010 11:28:01;0080;PBS_Server;Req;dis_request_read;decoding command StatusQueue from root
02/19/2010 11:28:01;0100;PBS_Server;Req;;Type StatusQueue request received from root at styx.pbdenton.paccar.com, sock=9
02/19/2010 11:28:01;0008;PBS_Server;Job;dispatch_request;dispatching request StatusQueue on sd=9
02/19/2010 11:28:01;0008;PBS_Server;Job;reply_send;Reply sent for request type StatusQueue on socket 9
02/19/2010 11:28:01;0080;PBS_Server;Req;dis_request_read;decoding command StatusJob from root
02/19/2010 11:28:01;0100;PBS_Server;Req;;Type StatusJob request received from root at styx.pbdenton.paccar.com, sock=9
02/19/2010 11:28:01;0008;PBS_Server;Job;dispatch_request;dispatching request StatusJob on sd=9
02/19/2010 11:28:01;0008;PBS_Server;Job;reply_send;Reply sent for request type StatusJob on socket 9
02/19/2010 11:28:01;0080;PBS_Server;Req;dis_request_read;decoding command RunJob from root
02/19/2010 11:28:01;0100;PBS_Server;Req;;Type RunJob request received from root at styx.pbdenton.paccar.com, sock=9
02/19/2010 11:28:01;0008;PBS_Server;Job;dispatch_request;dispatching request RunJob on sd=9
02/19/2010 11:28:01;0040;PBS_Server;Req;set_nodes;allocating nodes for job 153.styx.pbdenton.paccar.com with node expression 'dante.pbdenton.paccar.com:ppn=2'
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;entered spec=dante.pbdenton.paccar.com:ppn=2
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;job allocation debug: 1 requested, 12 svr_clnodes, 4 svr_totnodes
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;job allocation debug(2): 1 requested, 4 svr_numnodes
02/19/2010 11:28:01;0040;PBS_Server;Req;node_spec;job allocation debug(3): returning 1 requested
02/19/2010 11:28:01;0040;PBS_Server;Req;add_job_to_node;allocated node dante.pbdenton.paccar.com/0 to job 153.styx.pbdenton.paccar.com (nsnfree=2)
02/19/2010 11:28:01;0040;PBS_Server;Req;add_job_to_node;allocated node dante.pbdenton.paccar.com/1 to job 153.styx.pbdenton.paccar.com (nsnfree=1)
02/19/2010 11:28:01;0040;PBS_Server;Req;set_nodes;job 153.styx.pbdenton.paccar.com allocated 2 nodes (nodelist=dante.pbdenton.paccar.com/1+dante.pbdenton.paccar.com/0)
02/19/2010 11:28:01;0008;PBS_Server;Jo
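A minimal Python sketch can make the mismatch between the two logs explicit: it rebuilds the host list Maui appears to have reserved from the two MResAddNode lines in the Maui log and compares it with the node expression pbs_server received in the set_nodes line of the Torque log. The function name expected_hostlist is hypothetical. It assumes the third MResAddNode argument is the task count placed on that host (3 + 2 agrees with "tasks distributed: 5") and that the expression uses the '+'-joined host:ppn=N form seen in the set_nodes line; whether Maui actually builds its RunJob request this way is not confirmed by these logs.

```python
import re

# The two MResAddNode lines copied from the Maui log.
MAUI_LOG = """\
02/19 11:28:02 MResAddNode(153,styxvm1.pbdenton.paccar.com,3,0)
02/19 11:28:02 MResAddNode(153,dante.pbdenton.paccar.com,2,0)
"""

# Assumption: MResAddNode(job, host, tasks, ...) where 'tasks' is the
# number of tasks Maui placed on that host.
RES_RE = re.compile(r"MResAddNode\((\d+),([^,]+),(\d+),\d+\)")


def expected_hostlist(log: str, job_id: str) -> str:
    """Build a '+'-joined host:ppn=N expression from Maui's reservation lines."""
    parts = []
    for jid, host, tasks in RES_RE.findall(log):
        if jid == job_id:
            parts.append(f"{host}:ppn={tasks}")
    return "+".join(parts)


expected = expected_hostlist(MAUI_LOG, "153")
received = "dante.pbdenton.paccar.com:ppn=2"   # from the set_nodes line

print(expected)
print(received == expected)                  # False
print(expected.split("+")[-1] == received)   # True: only the last chunk arrived
```

If the assumptions hold, the styxvm1 chunk is lost somewhere between Maui's reservation and pbs_server's set_nodes call, which is consistent with the truncation described at the top of this message.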