[torqueusers] torque 4.2.6 server Undefined attribute

Ken Nielson knielson at adaptivecomputing.com
Wed Nov 20 15:37:40 MST 2013


If you are doing rolling upgrades, keep your pbs_server at the earlier
version and upgrade the pbs_moms to 4.2.6 first. Once all of your MOMs
have been upgraded, move pbs_server to 4.2.6.

This defect has been fixed, and the fix will be included in all upcoming
TORQUE releases.
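
The upgrade order described above (MOMs first, pbs_server last) can be
checked mechanically before touching the server. The following is only a
minimal sketch in Python: the function names and the hostname-to-version
mapping are hypothetical, and how you collect the MOM versions is
site-specific and not shown.

```python
# Sketch only: decide whether pbs_server may be moved to the target release.
# The mom_versions mapping (hostname -> version string) is a hypothetical
# input; gathering it from the cluster is left to the site.

def parse_version(text):
    """Turn a dotted version string such as '4.2.6' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def moms_below_target(mom_versions, target):
    """Return the sorted hostnames whose MOM version is older than target."""
    wanted = parse_version(target)
    return sorted(
        host for host, version in mom_versions.items()
        if parse_version(version) < wanted
    )


def server_upgrade_allowed(mom_versions, target):
    """True only when every MOM already runs the target version or newer."""
    return not moms_below_target(mom_versions, target)


if __name__ == "__main__":
    inventory = {"gpu-1-4": "4.2.5", "gpu-1-5": "4.2.6"}  # hypothetical data
    print("MOMs still to upgrade:", moms_below_target(inventory, "4.2.6"))
    print("Upgrade pbs_server now:", server_upgrade_allowed(inventory, "4.2.6"))
```

Comparing versions as integer tuples rather than as strings avoids
ordering mistakes such as "4.2.10" sorting before "4.2.6".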

Regards


On Wed, Nov 20, 2013 at 3:01 PM, Rick McKay <rmckay at adaptivecomputing.com> wrote:

> Eva,
>
> That's a defect. As soon as you upgrade your MOMs to 4.2.6, they'll start
> jobs. It's marked for correction in 4.2.7. Here's the changeset hash:
>
> 345daa2..e3fb235 HEAD -> 4.2-dev
>
> Rick McKay | Technical Support Engineer
> Adaptive Computing
>
>
>
> On Wed, Nov 20, 2013 at 2:54 PM, Eva Hocks <hocks at sdsc.edu> wrote:
>
>>
>>
>> A TORQUE 4.2.6 server cannot start jobs on MOMs running 4.2.5; each
>> attempt fails with "Undefined attribute (15002)" in send_job_work. Is
>> this expected behavior?
>>
>>
>>
>> 11/20/2013 13:42:07;0040;PBS_Server.17972;Req;set_nodes;allocating nodes for job 206074.mskcc-fe1.local with node expression 'gpu-1-4:ppn=10'
>> 11/20/2013 13:42:07;0040;PBS_Server.17972;Req;set_nodes;job 206074.mskcc-fe1.local allocated 1 nodes (nodelist=gpu-1-4/0+gpu-1-4/1+gpu-1-4/2+gpu-1-4/3+gpu-1-4/4+gpu-1-4/5+gpu-1-4/6+gpu-1-4/7+gpu-1-4/8+gpu-1-4/9)
>> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;Job Run at request of root@mskcc-fe1.local
>> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from QUEUED-QUEUED to RUNNING-PRERUN (4-40)
>> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;send of job to gpu-1-4 failed error = 15002
>> 11/20/2013 13:42:07;0001;PBS_Server.17972;Svr;PBS_Server;LOG_ERROR::Undefined attribute  (15002) in send_job_work, child failed in previous commit request for job 206074.mskcc-fe1.local
>> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;entering finish_sendmom
>> 11/20/2013 13:42:07;0002;PBS_Server.17972;Job;206074.mskcc-fe1.local;child reported failure for job after 0 seconds (dest=???), rc=-1
>> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;unable to run job, MOM rejected/rc=-1
>> 11/20/2013 13:42:07;0040;PBS_Server.17972;Req;free_nodes;freeing nodes for job 206074.mskcc-fe1.local
>> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from RUNNING-TRNOUT to QUEUED-QUEUED (1-10)
>> 11/20/2013 13:42:07;0040;PBS_Server.17972;Req;free_nodes;freeing nodes for job 206074.mskcc-fe1.local
>> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;unable to run job, send to MOM '183508983' failed
>> 11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from QUEUED-QUEUED to QUEUED-QUEUED (1-10)
>>
>>
>>
>> Thanks
>> Eva
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>
>
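
For anyone scanning a server log for the same failure, the entries quoted
above appear to be six semicolon-separated fields, with the numeric error
code carried in the message text. The Python sketch below splits one entry
and extracts such codes. It assumes each entry sits on a single line (an
entry wrapped by a mail client has to be rejoined first); the field names
and the five-digit error pattern are labels and guesses based only on the
lines in this thread, not on TORQUE documentation.

```python
import re
from collections import namedtuple

# Field names are illustrative labels chosen for this sketch.
LogRecord = namedtuple("LogRecord", "timestamp code daemon kind obj message")

# Five-digit codes written as "error = 15002" or "(15002)", as seen above.
ERROR_CODE = re.compile(r"(?:error = |\()(\d{5})\b")


def parse_line(line):
    """Split one log entry into six fields; return None if it does not fit."""
    fields = line.rstrip("\n").split(";", 5)  # the message may contain ';'
    if len(fields) != 6:
        return None
    return LogRecord(*fields)


def error_codes(record):
    """Return every numeric error code mentioned in the record's message."""
    return [int(code) for code in ERROR_CODE.findall(record.message)]


if __name__ == "__main__":
    sample = ("11/20/2013 13:42:07;0008;PBS_Server.17972;Job;"
              "206074.mskcc-fe1.local;send of job to gpu-1-4 failed "
              "error = 15002")
    record = parse_line(sample)
    print(record.obj, error_codes(record))
```

Limiting the split to five separators keeps any semicolons inside the
message intact, so only the leading fields are treated as structure.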


-- 
Ken Nielson
+1 801.717.3700 office +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
www.adaptivecomputing.com