[torqueusers] Not Running - PBS Error: Premature end of message

Garrick Staples garrick at usc.edu
Mon Nov 12 13:12:28 MST 2007


On Wed, Nov 07, 2007 at 08:53:43PM -0500, Samir Khanal alleged:
> Hi
> 
> I submitted a job using qsub on PBS.
>
> But the output below says "Not Running - PBS Error: Premature end of message".
>
> I restarted the server, mom and sched, but I still cannot get rid of it.
>
> I tried qsig -n SIGNULL jobid and qsig -s SIGKILL jobid, with no success.
>
> I even KILLED the pbs_mom on the defective nodes, started it again and tried the qsig again, but it is still there.
>
> Strangely, when I run qdel 23848.bwp4 the prompt does not return, as if it were waiting for input. Has anyone come across this problem?
> 
> This job seems to be stuck there forever.
> 
> --------------------------------------------------------------------------------
> Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
> --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
> 23848.bwp4.     skhanal  parallel my_paralle     --   5  --     -- 02:40 R    --
>
>   node13/0+node12/0+node11/0+node10/0+node09/0
>
>    Not Running - PBS Error: Premature end of message
> --------------------------------------------------------------------------------
> 


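When jobs get stuck, restarting the daemons just makes things worse. When qdel
and qsig fail, first clean up any leftover job processes on the node, then run
'momctl -c $jobid -h $node' to make pbs_mom forget the job, and finally
'qdel -p $jobid' to purge it from the server.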
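
The sequence above can be prepared as a dry run. This is a minimal sketch,
assuming the node list is taken from the exec_host line that 'qstat -n' prints
under the job (node13/0+node12/0+...) and that the leftover processes are
killed by hand first; the helper name print_cleanup_cmds is made up for
illustration. It only prints the commands so they can be checked before
anything is run.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the cleanup commands for a stuck
# TORQUE job. print_cleanup_cmds is a hypothetical helper name.
#   $1 = job id, e.g. 23848.bwp4
#   $2 = exec_host list as shown by 'qstat -n', e.g. node13/0+node12/0
print_cleanup_cmds() {
    jobid=$1
    exec_host=$2
    # Split on '+', drop the '/slot' suffix and de-duplicate, so a node
    # holding several slots gets only one momctl line.
    for host in $(printf '%s\n' "$exec_host" | tr '+' '\n' | cut -d/ -f1 | sort -u); do
        # Kill any leftover job processes on $host by hand BEFORE this;
        # 'momctl -c' only makes pbs_mom forget the job, it kills nothing.
        echo "momctl -c $jobid -h $host"
    done
    # Last step: forcibly purge the job record from pbs_server.
    echo "qdel -p $jobid"
}

print_cleanup_cmds 23848.bwp4 "node13/0+node12/0+node11/0+node10/0+node09/0"
```

For the job above it prints five momctl lines (node09 through node13)
followed by 'qdel -p 23848.bwp4'; printing instead of executing keeps a
human in the loop for the manual process cleanup.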
