[torqueusers] Job stuck at E state forever

Abhishek Gupta abhig at Princeton.EDU
Tue Mar 31 00:08:22 MDT 2009


Hi Halvor,
Some time ago I had the same problem, and some people suggested that it 
is caused by the rcp implementation in PBS. That can apparently be 
changed, but I don't know how to switch it to cp or scp, which would 
probably solve the issue. If you have any idea about it, please let me know.
Thanks,
Abhi.
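
For reference, a rough sketch of how the copy command is usually switched 
on the compute nodes. It assumes a TORQUE pbs_mom that reads the $rcpcmd 
and $usecp directives from mom_priv/config; the scp path, the -rpB flags, 
and the shared /home mapping are placeholders, not values from this 
thread, and the helper name print_mom_copy_config is made up for 
illustration. TORQUE can also be built to use scp at configure time 
(--with-scp, as far as I know). Whether this actually cures the stuck 
E state is unconfirmed here.

```shell
#!/bin/sh
# Hypothetical helper: print candidate lines for pbs_mom's mom_priv/config.
# $1 is the path to scp (assumed to be /usr/bin/scp if omitted).
print_mom_copy_config() {
    scp_path="${1:-/usr/bin/scp}"
    # Use scp instead of rcp for staging output files back to the submit host.
    printf '$rcpcmd %s -rpB\n' "$scp_path"
    # Assumption: /home is shared on all hosts, so a plain local cp is enough.
    printf '%s\n' '$usecp *:/home /home'
}
```

The output would be appended to mom_priv/config on each node (the TORQUE 
home directory varies by install), followed by a restart of pbs_mom.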

Halvor Utby wrote:
> Abhishek Gupta wrote:
>> Hi all,
>> Some jobs, after they finish running, reach the E state and get 
>> stuck there forever. Could someone tell me the reason for this and 
>> how to solve the problem?
>
> Hi,
>
> Do a "pbsnodes -l" and see if the nodes running the "E jobs" are 
> unavailable/down. I would guess they are, and as soon as you have 
> started pbs on these nodes, the jobs will disappear from your queue.
>
> "qdel -p jobnumber" will also purge the job from your queue, but 
> should only be used if the node cannot be made available again.
>
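
The check described above can be sketched as a small filter. This is only 
an illustration: it assumes "pbsnodes -l" prints one "nodename state" pair 
per line, and the function name list_unavailable_nodes and the job id 
12345 are placeholders.

```shell
#!/bin/sh
# Read "nodename state" lines on stdin and print the names of nodes whose
# state includes down, offline, or unknown.
list_unavailable_nodes() {
    awk '$2 ~ /down|offline|unknown/ { print $1 }'
}

# Usage on a host with the TORQUE client tools installed:
#   pbsnodes -l | list_unavailable_nodes
# Last resort, only if the node cannot be brought back (12345 is a placeholder):
#   qdel -p 12345
```

If a job stuck in E ran on one of the listed nodes, bringing pbs back up 
on that node is the first thing to try before purging.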

