[torqueusers] torque 4.2.6 server Undefined attribute

Rick McKay rmckay at adaptivecomputing.com
Wed Nov 20 15:01:38 MST 2013


Eva,

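That's a defect: a 4.2.6 server cannot send jobs to MOMs still running 4.2.5.
As soon as you upgrade your MOMs to 4.2.6, they will start accepting jobs
again. The fix is scheduled for 4.2.7. Here is the changeset range pushed
to the 4.2-dev branch: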

345daa2..e3fb235 HEAD -> 4.2-dev
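Until 4.2.7 is out, the workaround is simply to keep every MOM at the
server's version. A minimal sketch of that check, assuming you already have
each node's MOM version in hand (the inventory in the example is
hypothetical, and collecting it from a live cluster is not covered here):
list every MOM older than the server so it can be upgraded. It only handles
purely numeric dotted versions such as 4.2.5.

```python
# Hypothetical sketch: flag MOMs whose TORQUE version is older than the server's.
# The inventory passed in is assumed to be gathered elsewhere; this sketch does
# not query TORQUE itself.


def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version such as '4.2.6' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def outdated_moms(server_version: str, mom_versions: dict) -> list:
    """Return the sorted names of nodes whose MOM is older than the server."""
    server = parse_version(server_version)
    return sorted(
        node
        for node, version in mom_versions.items()
        if parse_version(version) < server
    )


if __name__ == "__main__":
    # Hypothetical data: one node left on 4.2.5, one already upgraded.
    inventory = {"gpu-1-4": "4.2.5", "hypothetical-node": "4.2.6"}
    print(outdated_moms("4.2.6", inventory))  # ['gpu-1-4']
```

Comparing integer tuples rather than raw strings matters once a component
reaches two digits: "4.2.10" sorts before "4.2.9" as text, but (4, 2, 10)
correctly compares greater than (4, 2, 9).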

Rick McKay | Technical Support Engineer
Adaptive Computing



On Wed, Nov 20, 2013 at 2:54 PM, Eva Hocks <hocks at sdsc.edu> wrote:

>
>
> A TORQUE 4.2.6 server cannot start jobs on MOMs running 4.2.5; the send
> fails with "Undefined attribute (15002)" in send_job_work. Is this
> expected behavior?
>
>
>
> 11/20/2013 13:42:07;0040;PBS_Server.17972;Req;set_nodes;allocating nodes for job 206074.mskcc-fe1.local with node expression 'gpu-1-4:ppn=10'
> 11/20/2013 13:42:07;0040;PBS_Server.17972;Req;set_nodes;job 206074.mskcc-fe1.local allocated 1 nodes (nodelist=gpu-1-4/0+gpu-1-4/1+gpu-1-4/2+gpu-1-4/3+gpu-1-4/4+gpu-1-4/5+gpu-1-4/6+gpu-1-4/7+gpu-1-4/8+gpu-1-4/9)
> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;Job Run at request of root@mskcc-fe1.local
> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from QUEUED-QUEUED to RUNNING-PRERUN (4-40)
> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;send of job to gpu-1-4 failed error = 15002
> 11/20/2013 13:42:07;0001;PBS_Server.17972;Svr;PBS_Server;LOG_ERROR::Undefined attribute  (15002) in send_job_work, child failed in previous commit request for job 206074.mskcc-fe1.local
> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;entering finish_sendmom
> 11/20/2013 13:42:07;0002;PBS_Server.17972;Job;206074.mskcc-fe1.local;child reported failure for job after 0 seconds (dest=???), rc=-1
> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;unable to run job, MOM rejected/rc=-1
> 11/20/2013 13:42:07;0040;PBS_Server.17972;Req;free_nodes;freeing nodes for job 206074.mskcc-fe1.local
> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from RUNNING-TRNOUT to QUEUED-QUEUED (1-10)
> 11/20/2013 13:42:07;0040;PBS_Server.17972;Req;free_nodes;freeing nodes for job 206074.mskcc-fe1.local
> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;unable to run job, send to MOM '183508983' failed
> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from QUEUED-QUEUED to QUEUED-QUEUED (1-10)
>
>
>
> Thanks
> Eva
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
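The server log entries quoted above are semicolon-separated: timestamp,
event code, daemon name and PID, object type, object name, then a free-text
message. A rough Python sketch for spotting this failure across a whole log,
assuming that six-field layout holds for every line (inferred from the
entries above, not from TORQUE documentation; the field names are labels
chosen for this example). It pulls out the failed send-to-MOM entries, such
as the error 15002 seen here, and skips lines that do not fit the layout.

```python
# Sketch: split TORQUE server log lines into fields and pick out send failures.
# The six-field layout and the field names are assumptions inferred from the
# sample entries, not an official specification.
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class LogEntry(NamedTuple):
    timestamp: str
    event_code: str
    source: str
    object_type: str
    object_name: str
    message: str


# Matches messages like "send of job to gpu-1-4 failed error = 15002".
SEND_FAILED = re.compile(r"send of job to (\S+) failed error = (\d+)")


def parse_entry(line: str) -> Optional[LogEntry]:
    """Split one log line into six fields, or return None if it does not fit."""
    # maxsplit=5 keeps any semicolons inside the free-text message intact.
    fields = line.rstrip("\n").split(";", 5)
    if len(fields) != 6:
        return None
    return LogEntry(*fields)


def send_failures(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (job id, node, error code) for every failed send-to-MOM entry."""
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        match = SEND_FAILED.search(entry.message)
        if match:
            yield entry.object_name, match.group(1), int(match.group(2))


if __name__ == "__main__":
    sample = [
        "11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;"
        "send of job to gpu-1-4 failed error = 15002",
        "11/20/2013 13:42:07;0040;PBS_Server.17972;Req;free_nodes;"
        "freeing nodes for job 206074.mskcc-fe1.local",
    ]
    print(list(send_failures(sample)))
    # [('206074.mskcc-fe1.local', 'gpu-1-4', 15002)]
```

Grouping the results by node name gives a quick list of MOMs that are
rejecting jobs, which in this thread would be the ones still on 4.2.5.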