[torqueusers] failed on unknown node with status 1 ???
knielson at clusterresources.com
Fri May 22 10:10:04 MDT 2009
>I observe the following error in my logs:
>2009/05/21 21:42:43 INFO  PbsClient::wait_for_jobs_to_complete (427): Job 11035147.myHost:myJob:date failed on unknown node with status 1
>2009/05/21 21:42:43 INFO  PbsClient::wait_for_jobs_to_complete (448): Resubmitting job myJob (failures: 2, limit: 200)
>I can't find any clues in the logs that relate to this error, and resubmitting the job often goes through fine.
>Can someone explain what "failed on unknown node with status 1" means?
I believe this error is generated because the job substate is set to JOB_SUBSTATE_EXITING. I'm not sure of all the implications, but although that substate suggests the job should be done, the job may still be in transition. Once the state changes, you are able to resubmit the job and have it work.
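
If the failure really is transient, as described above, a bounded resubmit loop like the one your log already shows is a reasonable way to cope with it. The following is only a rough sketch in Python, not the PbsClient code from your log: run_with_resubmit and the submit_and_wait callable are hypothetical names, and the callable is assumed to return an (exit status, node name) pair, with None standing in for an unknown node.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical result shape: (exit_status, node_name), node_name is None
# when the node that ran the job could not be determined.
JobResult = Tuple[int, Optional[str]]


def run_with_resubmit(
    submit_and_wait: Callable[[], JobResult],
    limit: int = 200,
    delay_s: float = 0.0,
) -> int:
    """Run a job until it exits with status 0 or fails `limit` times.

    Returns the number of failures seen before the successful run.
    Raises RuntimeError once the failure limit is reached.
    """
    failures = 0
    while True:
        status, node = submit_and_wait()
        if status == 0:
            return failures
        failures += 1
        print(f"Job failed on {node or 'unknown node'} with status {status}")
        if failures >= limit:
            raise RuntimeError(f"giving up after {failures} failures")
        print(f"Resubmitting job (failures: {failures}, limit: {limit})")
        # Optional pause so a job that is still in transition can settle.
        time.sleep(delay_s)
```

You would pass in whatever function submits the job and waits for it to finish in your setup. A small delay_s gives a job that is still exiting some time to settle before the next attempt.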