[torqueusers] how to browse the stdout and stderr files of a running batch job

David Beer dbeer at adaptivecomputing.com
Tue Nov 6 08:45:51 MST 2012


This can definitely be done with those versions, although I wonder about
having $nospool_dir_list set and $spool_as_final_name set at the same time.
There is no reason to set both of them.

$nospool_dir_list says that if the output file is in any of the specified
directories or their subdirectories (I think it includes subdirs), then
the output is spooled to the user's home directory instead.

$spool_as_final_name takes the path to the output file and writes directly
to that path. I believe this overrides $nospool_dir_list when both are set,
but I'm not positive. At any rate, you don't want both of them set; you
want just one of them.
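
As a rough sketch only (untested, and reusing the paths from the config
quoted below), a mom config that keeps just one of the two directives could
look like this. Which one to keep is an assumption here; the other is simply
commented out:

```
# Option A: write output directly to its final path
$spool_as_final_name true

# Option B: use this instead of Option A, never together with it
# $nospool_dir_list /home,/data/scratch
```

pbs_mom would presumably need a restart to pick up the change.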

David

On Tue, Nov 6, 2012 at 5:54 AM, Christoph (Stucki) von Stuckrad <
stucki at mi.fu-berlin.de> wrote:

> On Fri, 02 Nov 2012, Nick Ihli wrote:
> > You can check out $spool_as_final_name and $nospool_dir_list in the mom
> > config file.
> > You can read more about that in Appendix C in the torque docs.
>
> I step in here because I did this, and normally it works as it is
> supposed to, but ...
>
> We installed an NFS server, and all cluster-nodes mount the same subtree
> from it (via fstab during startup). Therein is a directory
> /data/scratch/tmp
> where 'mom' should create the job-subdirectory and files. Normally this
> works and the users can 'tail' those files on the shared tree.
>
> BUT every so often, maybe only once after starting mom (not sure yet),
> each node kills at least one job with 'permission denied' while trying
> to create the job's directory.  Seemingly this happens either
> randomly or once after start of mom (or after start of node).
> (There is NO automount involved; that would crash nearly all jobs)
>
> Does somebody know of special setups needed to make 'mom' nfs-aware or
> is there a known way to avoid this?
> Is there a minimum version needed (higher than ours) to run mom 'on nfs'?
>
> Our 'momctl -q version' gives:
> localhost:      version = 'version=2.4.16
>
> RELEVANT CONFIG is:
> #################################################
> # use shared filesystem directly for spooling
> $nospool_dir_list /home,/data/scratch
> # AND spool directly to the user-defined name
> $spool_as_final_name true
>
> # use local cp for /home and scratch path
> $usecp *:/home  /home
> $usecp *:/data/scratch  /data/scratch
>
> # per-job tmpdir (job-id appended)
> $tmpdir         /data/scratch/tmp
> #################################################
>
> Yours   Stucki (cluster admin beginner)
>
> --
> Christoph von Stuckrad      * * |nickname |Mail <stucki at mi.fu-berlin.de> \
> Freie Universitaet Berlin   |/_*|'stucki' |Tel(Mo.,Mi.):+49 30 838-75 459|
> Mathematik & Informatik EDV |\ *|if online|  (Di,Do,Fr):+49 30 77 39 6600|
> Takustr. 9 / 14195 Berlin   * * |on IRCnet|Fax(home):   +49 30 77 39 6601/
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>



-- 
David Beer | Senior Software Engineer
Adaptive Computing