[torqueusers] Empty output/error log file
Michael Krause
grid-admin at mpib-berlin.mpg.de
Fri Mar 25 04:06:20 MDT 2011
Hello Francois,
> So all the jobs go into the queue; the first 8 run directly
> while the others are queued. The first 8 jobs finish successfully and the
> expected data files/results are generated.
Okay, so Torque itself is working fine. Your problem lies on the
node(s) themselves.
> The problem is random. If I re-run the jobs that previously failed
> (using the .csh script or manually) it will work for some of them and
> others will fail again.
>
> _Nothing_ is done/generated! Space is obviously not the problem;
> we have plenty of room in the scratch directory.
If you are running 8 jobs and each job creates about 1M files, you
might run into filesystem limits, for example running out of inodes.
Have you checked this? Which operating system, filesystem, and partition
size are you using?
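A quick way to check this on the node is `df -i` on the scratch partition. The same numbers are available from Python's `os.statvfs`. The sketch below assumes the figures from above (8 jobs, about 1M files each) and a placeholder scratch path, and simply compares the inodes the jobs would need with the inodes still free. Keep in mind that when a filesystem runs out of inodes, creating a file fails with "No space left on device" even though `df` still shows free blocks.

```python
import os

def inode_headroom(path: str, jobs: int, files_per_job: int) -> dict:
    """Compare the inodes a batch of jobs would need with those still free on `path`."""
    st = os.statvfs(path)
    needed = jobs * files_per_job
    free = st.f_favail  # inodes available to unprivileged users
    if st.f_files == 0:
        # Some filesystems (e.g. btrfs) allocate inodes dynamically and
        # report 0 here, so the check is inconclusive.
        fits = None
    else:
        fits = needed <= free
    return {
        "total_inodes": st.f_files,
        "free_inodes": free,
        "needed_inodes": needed,
        "fits": fits,
    }

if __name__ == "__main__":
    # Placeholder path: replace with the real scratch directory on the node.
    report = inode_headroom("/tmp", jobs=8, files_per_job=1_000_000)
    for key, value in report.items():
        print(f"{key}: {value}")
```

If `fits` comes out `False`, the random failures would be consistent with inode exhaustion rather than with anything Torque is doing.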
As mentioned already, please also post the log files from the pbs_mom
log directory on your node.
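When collecting those logs, the most relevant file is usually the newest one in pbs_mom's log directory. The sketch below assumes the common default location `/var/spool/torque/mom_logs` (your PBS_HOME may differ). It uses a hypothetical helper, `latest_log_lines`, that returns the tail of the newest file and can optionally keep only the lines mentioning a given job ID, such as one of the jobs that produced empty output/error files.

```python
from pathlib import Path
from typing import List, Optional

# Assumption: Torque's common default PBS_HOME (/var/spool/torque); adjust if
# your installation uses a different PBS_HOME.
MOM_LOGS = Path("/var/spool/torque/mom_logs")

def latest_log_lines(log_dir: Path, n: int = 50, job_id: Optional[str] = None) -> List[str]:
    """Return the last n lines of the newest file in log_dir, optionally only
    lines mentioning job_id. Returns [] if the directory is missing or empty."""
    if not log_dir.is_dir():
        return []
    files = [p for p in log_dir.iterdir() if p.is_file()]
    if not files:
        return []
    # pbs_mom typically writes one log file per day; pick the most recently modified.
    newest = max(files, key=lambda p: p.stat().st_mtime)
    with newest.open(errors="replace") as fh:
        lines = fh.read().splitlines()
    if job_id is not None:
        lines = [line for line in lines if job_id in line]
    return lines[-n:] if n > 0 else []

if __name__ == "__main__":
    lines = latest_log_lines(MOM_LOGS)
    if not lines:
        print(f"No log lines found under {MOM_LOGS}")
    for line in lines:
        print(line)
```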
> My feeling is that when a core is freed because a PBS script ends
> successfully, its hard drive might remain busy because of other jobs running.
> Thus the hard drive might not be available for the next jobs....
Hard drive access is buffered and asynchronous. If your disk(s) are busy,
operations should only take longer to complete. They should
definitely not fail because something is "busy".
> I hope I was more precise in this new email.
Yes, now I have a better understanding of your problem.
--
Michael - MPIB Berlin