[torqueusers] how to browse the stdout and stderr files of a running batch job

Christoph (Stucki) von Stuckrad stucki at mi.fu-berlin.de
Tue Nov 6 05:54:01 MST 2012


On Fri, 02 Nov 2012, Nick Ihli wrote:
> You can check out $spool_as_final_name and $nospool_dir_list in the mom config file.
> You can read more about that in Appendix C in the torque docs.

I am stepping in here because I set this up myself, and normally it works
as it is supposed to, but ...

We installed an NFS server, and all cluster-nodes mount the same subtree
from it (via fstab during startup). Therein is a directory /data/scratch/tmp
where 'mom' should create the job-subdirectory and files. Normally this
works and the users can 'tail' those files on the shared tree.
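
For users who would rather script this than call 'tail' by hand, here is a
minimal sketch in Python. It assumes the job's output file is already
visible on the shared tree (as with $spool_as_final_name above). The
function name tail_lines and the example path are made up for
illustration; they are not part of torque. On NFS the newest lines may
show up with some delay because of client-side caching.

```python
import os


def tail_lines(path, count=10, block_size=4096):
    """Return the last `count` lines of a file without reading all of it."""
    if count <= 0:
        return []
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        data = b""
        # Read backwards in blocks until more than `count` newlines are
        # collected, so the oldest line we return is a complete one.
        while position > 0 and data.count(b"\n") <= count:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            data = handle.read(step) + data
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines[-count:]


if __name__ == "__main__":
    # Hypothetical output file of a running job on the shared tree.
    for line in tail_lines("/data/scratch/myjob.out", count=20):
        print(line)
```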

BUT every so often, maybe only once after starting mom (not sure yet),
each node kills at least one job with 'permission denied' while trying
to create the job's directory. Seemingly this happens either
randomly or once after the start of mom (or after the start of the node).
(There is NO automount involved; that would crash nearly all jobs)
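
To narrow down whether the failure is random or tied to node/mom startup,
one could run a small probe on each node that repeatedly creates and
removes a directory under the scratch path and records every error. The
following is only a sketch under assumptions: probe_mkdir is a
hypothetical helper, not a torque tool, and it runs with the permissions
of whoever starts it, which may differ from what mom does for a job.

```python
import errno
import os
import time
import uuid


def probe_mkdir(base_dir, attempts=5, delay=0.0):
    """Create and remove a uniquely named subdirectory `attempts` times.

    Returns a list of (attempt, errno_name, timestamp) for every failure,
    so an empty list means all attempts succeeded.
    """
    failures = []
    for attempt in range(attempts):
        target = os.path.join(base_dir, "probe-" + uuid.uuid4().hex)
        try:
            os.mkdir(target, 0o700)
            os.rmdir(target)
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            failures.append((attempt, name, time.time()))
        if delay:
            time.sleep(delay)
    return failures


if __name__ == "__main__":
    # Path taken from the $tmpdir setting in this mail; the attempt count
    # and delay are arbitrary choices.
    for attempt, name, stamp in probe_mkdir("/data/scratch/tmp", 60, 1.0):
        print("attempt %d failed with %s at %s" % (attempt, name, time.ctime(stamp)))
```

Started from a boot script right after the NFS mount, an EACCES entry only
in the first few attempts would point towards the "once after start" case.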

Does anybody know of a special setup needed to make 'mom' NFS-aware, or
is there a known way to avoid this?
Is there a minimum version (higher than ours) needed to run mom 'on NFS'?

Our 'momctl -q version' gives:
localhost:      version = 'version=2.4.16

RELEVANT CONFIG is:
#################################################
# use shared filesystem directly for spooling
$nospool_dir_list /home,/data/scratch
# AND spool directly to the user-defined name
$spool_as_final_name true

# use local cp for /home and scratch path
$usecp *:/home  /home
$usecp *:/data/scratch  /data/scratch

# per-job tmpdir (job-id appended)
$tmpdir         /data/scratch/tmp
#################################################
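
To compare nodes, the paths named in this config can be extracted and
checked on each node. The sketch below is an illustration only: it
assumes the simple "$name value" line format shown above, the helper names
are made up, and it only tests whether each path is a directory the
calling user can write to, not how mom itself treats these settings.

```python
import os

SAMPLE_CONFIG = """\
# use shared filesystem directly for spooling
$nospool_dir_list /home,/data/scratch
$spool_as_final_name true
$usecp *:/home  /home
$usecp *:/data/scratch  /data/scratch
$tmpdir         /data/scratch/tmp
"""


def parse_mom_config(text):
    """Collect '$name value' lines into {name: [value, ...]}."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("$"):
            continue  # skip comments and blank lines
        parts = line.split(None, 1)
        name = parts[0][1:]
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(name, []).append(value)
    return directives


def configured_paths(directives):
    """Return the sorted local paths named by the directives used here."""
    paths = set()
    for value in directives.get("nospool_dir_list", []):
        paths.update(p.strip() for p in value.split(",") if p.strip())
    for value in directives.get("tmpdir", []):
        if value:
            paths.add(value)
    for value in directives.get("usecp", []):
        fields = value.split()
        if len(fields) == 2:
            paths.add(fields[1])  # keep the local side of the mapping
    return sorted(paths)


def check_paths(paths):
    """Map each path to True if it is a directory writable by this user."""
    return {p: os.path.isdir(p) and os.access(p, os.W_OK | os.X_OK)
            for p in paths}


if __name__ == "__main__":
    found = configured_paths(parse_mom_config(SAMPLE_CONFIG))
    for path, ok in sorted(check_paths(found).items()):
        print("%-20s %s" % (path, "ok" if ok else "NOT writable"))
```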

Yours   Stucki (cluster admin beginner)

-- 
Christoph von Stuckrad      * * |nickname |Mail <stucki at mi.fu-berlin.de> \
Freie Universitaet Berlin   |/_*|'stucki' |Tel(Mo.,Mi.):+49 30 838-75 459|
Mathematik & Informatik EDV |\ *|if online|  (Di,Do,Fr):+49 30 77 39 6600|
Takustr. 9 / 14195 Berlin   * * |on IRCnet|Fax(home):   +49 30 77 39 6601/

