[torqueusers] failed on unknown node with status 1 ???

Ken Nielson knielson at clusterresources.com
Fri May 22 10:10:04 MDT 2009



>hi all,
>
>i observe the following error in my logs:
>-->
>2009/05/21 21:42:43 INFO  [3790] PbsClient::wait_for_jobs_to_complete (427): Job 11035147.myHost:myJob:date failed on unknown node with status 1
>2009/05/21 21:42:43 INFO  [3790] PbsClient::wait_for_jobs_to_complete (448): Resubmitting job myJob (failures: 2, limit: 200)
><--
>
>i can't find any clues in any of the logs that relate to this error, and resubmitting the job often goes through fine.
>
>can someone help me understand what "failed on unknown node with status 1" means?
>

balu,

I believe this error is generated because the job substate is set to JOB_SUBSTATE_EXITING. I'm not sure of all the implications, but even though this state suggests the job should be done, it may still be in transition. Once the state changes, you are able to resubmit the job and have it work.
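
If it helps to see whether these failures cluster around particular jobs or times, the failure lines can be pulled out of the client log and tallied. The following is only a minimal sketch in Python: the pattern is inferred from the single log line quoted above, the names FAILED_RE and parse_failure are made up for illustration, and the wording of the message when the node is known is an assumption.

```python
import re

# Pattern inferred from the one failure line quoted in this thread.
# Assumption: other failure lines use the same wording, with the node
# name (or "unknown") appearing between "failed on" and "node".
FAILED_RE = re.compile(
    r"Job (?P<job>\S+) failed on (?P<node>.+?) node with status (?P<status>-?\d+)"
)


def parse_failure(line):
    """Return (job, node, status) for a failure line, or None otherwise."""
    m = FAILED_RE.search(line)
    if m is None:
        return None
    return m.group("job"), m.group("node"), int(m.group("status"))


# The log line from the original question, used as sample input.
sample = (
    "2009/05/21 21:42:43 INFO  [3790] PbsClient::wait_for_jobs_to_complete (427): "
    "Job 11035147.myHost:myJob:date failed on unknown node with status 1"
)
print(parse_failure(sample))
```

Lines that do not match, such as the "Resubmitting job" line, return None and can be skipped when counting.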

Ken Nielson
Cluster Resources
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

