[torqueusers] Empty output/error log file

Michael Krause grid-admin at mpib-berlin.mpg.de
Fri Mar 25 04:06:20 MDT 2011


Hello Francois,

> So all the jobs go in the queue, the first 8th ones directly runs
> while the others are queued. The first 8th jobs end well and the
> wanted data files/results are generated.
Okay, so TORQUE itself is working fine. The problem lies on your
node(s).

> The problem is random. If I re-run the jobs that previously failed
> (using the .csh script or manually) it will work for some of them and
> others will fail again.
>
> _Nothing_ is done/generated! There is obviously no problem of space;
> we have plenty of room in the scratch directory.
So if you are running 8 jobs and each of these jobs creates about 1M
files, you might run into filesystem limitations, possibly a shortage
of inodes rather than of disk space. Have you checked this? What
operating system, filesystem and partition size are you using?
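
A quick way to check is to look at the inode counters of the scratch
filesystem. Below is a minimal sketch in Python, assuming a POSIX node;
the helper name inode_usage and the path you pass in are placeholders of
mine, not anything taken from your setup.

```python
import os

def inode_usage(path):
    """Report inode counters for the filesystem that holds `path`."""
    st = os.statvfs(path)   # POSIX only
    total = st.f_files      # total inodes on the filesystem
    free = st.f_ffree       # free inodes
    if total == 0:
        # Some filesystems do not report a fixed inode count.
        used_pct = None
    else:
        used_pct = 100.0 * (total - free) / total
    return {"total": total, "free": free, "used_pct": used_pct}

# Example call (replace "." with your scratch directory):
# print(inode_usage("."))
```

Running `df -i` on the node shows the same counters. If the free count
is close to zero, file creation fails even though `df` still shows
plenty of free space.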

As mentioned already, please also post the pbs_mom log files from
your node.
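
If the logs are large, something like the following sketch could pull
out only the lines for one failed job. It assumes the pbs_mom logs are
date-named files (YYYYMMDD) in a single directory, commonly
/var/spool/torque/mom_logs, though your installation may use a different
location. The function name and the job id in the example are
hypothetical.

```python
import os

def grep_latest_log(log_dir, needle):
    """Return (filename, matching lines) for the newest log in log_dir.

    Assumes date-named files (YYYYMMDD), so sorting by name is
    chronological.
    """
    files = sorted(
        name for name in os.listdir(log_dir)
        if os.path.isfile(os.path.join(log_dir, name))
    )
    if not files:
        return None, []
    latest = files[-1]
    with open(os.path.join(log_dir, latest), errors="replace") as fh:
        hits = [line.rstrip("\n") for line in fh if needle in line]
    return latest, hits

# Example call (directory and job id are assumptions):
# name, lines = grep_latest_log("/var/spool/torque/mom_logs", "1234.server")
```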


> My feeling is that when a core is freed because a PBS script ends
> well, its hard drive might remains busy because of other jobs running.
> Thus the hard drive might not be available for the next coming jobs....
Hard drive access is buffered and asynchronous. If your disk(s) are
busy, operations should simply take longer to complete. They should
definitely not fail because something is "busy".

> I hope I was more precise in this new email.
Yes, now I have a better understanding of your problem.

-- 
Michael - MPIB Berlin