[torqueusers] torque 4.2.6 server Undefined attribute

Martin Siegert siegert at sfu.ca
Wed Nov 20 15:53:07 MST 2013


We ran into exactly the same problem: we needed to upgrade the server
right away because of the security hole (CVE-2013-4495). We then
planned rolling updates on all compute nodes. As soon as we had
updated the server, the MOMs on the compute nodes died with segmentation
faults :-(
Because of the security hole, updating the server last is not really
an option.

Cheers,
Martin

On Wed, Nov 20, 2013 at 03:37:40PM -0700, Ken Nielson wrote:
> 
>    If you are doing rolling upgrades, keep your pbs_server at the earlier
>    version and upgrade the pbs_moms to 4.2.6. After all of your MOMs have
>    been upgraded, move pbs_server to 4.2.6.
>    This has been fixed and will be available in all upcoming TORQUE
>    releases.
>    Regards
> 
>    On Wed, Nov 20, 2013 at 3:01 PM, Rick McKay
>    <[1]rmckay at adaptivecomputing.com> wrote:
> 
>    Eva,
>    That's a defect. As soon as you upgrade your MOMs to 4.2.6, they'll
>    start jobs. It's marked for correction in 4.2.7. Here's the changeset
>    hash:
>    345daa2..e3fb235 HEAD -> 4.2-dev
>    Rick McKay | Technical Support Engineer
>    Adaptive Computing
> 
>    On Wed, Nov 20, 2013 at 2:54 PM, Eva Hocks <[2]hocks at sdsc.edu> wrote:
> 
>      The TORQUE 4.2.6 server cannot start jobs on MOMs running 4.2.5; it
>      fails with "Undefined attribute (15002) in send_job_work". Is this
>      expected behavior?
>      11/20/2013 13:42:07;0040;PBS_Server.17972;Req;set_nodes;allocating nodes for job 206074.mskcc-fe1.local with node expression 'gpu-1-4:ppn=10'
>      11/20/2013 13:42:07;0040;PBS_Server.17972;Req;set_nodes;job 206074.mskcc-fe1.local allocated 1 nodes (nodelist=gpu-1-4/0+gpu-1-4/1+gpu-1-4/2+gpu-1-4/3+gpu-1-4/4+gpu-1-4/5+gpu-1-4/6+gpu-1-4/7+gpu-1-4/8+gpu-1-4/9)
>      11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;Job Run at request of root at mskcc-fe1.local
>      11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from QUEUED-QUEUED to RUNNING-PRERUN (4-40)
>      11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;send of job to gpu-1-4 failed error = 15002
>      11/20/2013 13:42:07;0001;PBS_Server.17972;Svr;PBS_Server;LOG_ERROR::Undefined attribute  (15002) in send_job_work, child failed in previous commit request for job 206074.mskcc-fe1.local
>      11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;entering finish_sendmom
>      11/20/2013 13:42:07;0002;PBS_Server.17972;Job;206074.mskcc-fe1.local;child reported failure for job after 0 seconds (dest=???), rc=-1
>      11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;unable to run job, MOM rejected/rc=-1
>      11/20/2013 13:42:07;0040;PBS_Server.17972;Req;free_nodes;freeing nodes for job 206074.mskcc-fe1.local
>      11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from RUNNING-TRNOUT to QUEUED-QUEUED (1-10)
>      11/20/2013 13:42:07;0040;PBS_Server.17972;Req;free_nodes;freeing nodes for job 206074.mskcc-fe1.local
>      11/20/2013 13:42:07;0008;PBS_Server.17972;Job;206074.mskcc-fe1.local;unable to run job, send to MOM '183508983' failed
>      11/20/2013 13:42:07;0008;PBS_Server.17972;Job;svr_setjobstate;svr_setjobstate: setting job 206074.mskcc-fe1.local state from QUEUED-QUEUED to QUEUED-QUEUED (1-10)
>      Thanks
>      Eva
>      _______________________________________________
>      torqueusers mailing list
>      [3]torqueusers at supercluster.org
>      [4]http://www.supercluster.org/mailman/listinfo/torqueusers
> 
>    --
>    Ken Nielson
>    +1 801.717.3700 office +1 801.717.3738 fax
>    1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
>    [7]www.adaptivecomputing.com
> 
> References
> 
>    1. mailto:rmckay at adaptivecomputing.com
>    2. mailto:hocks at sdsc.edu
>    3. mailto:torqueusers at supercluster.org
>    4. http://www.supercluster.org/mailman/listinfo/torqueusers
>    5. mailto:torqueusers at supercluster.org
>    6. http://www.supercluster.org/mailman/listinfo/torqueusers
>    7. http://www.adaptivecomputing.com/


