[torqueusers] [newbie] {problems with, alternatives to} 'tail log'

Gustavo Correa gus at ldeo.columbia.edu
Sun Nov 17 11:16:16 MST 2013


Oops.  Correction:

$spool_as_final_name true


On Nov 17, 2013, at 1:11 PM, Gustavo Correa wrote:

> Hi Tom
> 
> If you want stdout/stderr written directly to the work directory, you can add:
> 
> $spool_as_final_name
> 
> to the $TORQUE/mom_priv/config files on your compute nodes.
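> Concretely, the change amounts to appending one line to mom_priv/config on each
> compute node and restarting the MOM. A minimal sketch (TORQUE_HOME is an
> assumption — it is often /var/spool/torque — and the mktemp fallback only
> exists so the sketch runs outside a real install):

```shell
# Sketch: enable writing stdout/stderr directly to their final destination.
# TORQUE_HOME is an assumption; mktemp stands in for /var/spool/torque here.
TORQUE_HOME="${TORQUE_HOME:-$(mktemp -d)}"
mkdir -p "$TORQUE_HOME/mom_priv"
CONFIG="$TORQUE_HOME/mom_priv/config"

# Append the parameter only if it is not already present.
grep -q '^\$spool_as_final_name' "$CONFIG" 2>/dev/null || \
    echo '$spool_as_final_name true' >> "$CONFIG"

cat "$CONFIG"
```

> On a real node you would then restart pbs_mom (e.g. 'service pbs_mom restart')
> for the change to take effect, and repeat on every compute node.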
> 
> See the Torque Admin Guide:
> 
> http://docs.adaptivecomputing.com/torque/help.htm#topics/12-appendices/parameters.htm%3FTocPath%3DAppendices|Appendix%20C%3A%20Node%20manager%20%28MOM%29%20configuration|_____1
> 
> Beware that this may add some load to your networked file system IO.
> See this recent discussion thread in the list archives:
> 
> http://www.supercluster.org/pipermail/torqueusers/2013-October/016352.html
> 
> If you want only the output of a particular executable or script inside the submitted job,
> you can redirect stdout of that part with something like "> $PBS_O_WORKDIR/program.log".
> This can be a bit tricky with MPI programs, though.
> Again, this may tax NFS to some extent.
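> For example, a job-script fragment that captures just one step's output in the
> work directory could look like this (run_step is a hypothetical stand-in for
> your program; the mktemp fallback only exists so the sketch runs outside a job):

```shell
# PBS_O_WORKDIR is set by Torque inside a job; fall back to a temp dir
# so this sketch also runs on its own.
PBS_O_WORKDIR="${PBS_O_WORKDIR:-$(mktemp -d)}"

run_step() {    # hypothetical stand-in for the real program
    echo "step started"
    echo "step finished"
}

# Redirect stdout+stderr of just this step into the work directory;
# everything else in the job still spools normally.
run_step > "$PBS_O_WORKDIR/program.log" 2>&1

cat "$PBS_O_WORKDIR/program.log"
```

> For MPI programs you would typically redirect the mpirun/mpiexec invocation
> as a whole rather than the individual ranks.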
> 
> Besides the above, if you are testing/debugging a program,
> you can submit an interactive job instead of a batch job.
> That will put you on the compute node,
> with direct access to the program's stdout/stderr:
> 
> qsub -I ...
> 
> See 'man qsub' for more details about the -I qsub switch
> (that is a capital letter 'i', not a lowercase 'L').
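> A fuller command line might look like the following (the queue name and
> resource requests are assumptions for your site; the sketch only prints
> the command instead of running it):

```shell
# Illustrative interactive-job request: one node, four cores, thirty minutes.
CMD='qsub -I -q debug -l nodes=1:ppn=4,walltime=00:30:00'
echo "would run: $CMD"
```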
> 
> For a well-tested, debugged program I guess there is little need to check the log/stdout/stderr 
> files while the job is running.
> Hence, I prefer to keep the stdout/stderr files in the 
> local compute-node spool directory until the job ends.  
> 
> My two cents,
> Gus Correa
> 
> 
> On Nov 11, 2013, at 11:19 PM, Tom Roche wrote:
> 
>> 
>> How to make Torque flush stderr/stdout to file before end-of-job? Can I do this as a user, or would I need privileges? Or is this just not doable, and I should try Something Completely Different? What I mean, why I ask:
>> 
>> I'm a new PBS/Torque user, relatively new to scientific computing, but a linux/unix user for many years, and not GUI-dependent. I'm accustomed to monitoring progress by
>> 
>> 1. redirecting (or `tee`ing) stderr/stdout to one or more logfiles
>> 2. `tail`ing the resulting log(s)
>> 
>> so am annoyed that my Torque jobs only write their logs after the job stops (whether abnormally or normally). I can get some progress feedback from examining other output, but not with the detail (much less the `grep`ability) I can get from the logfile. Hence I'd greatly prefer to be able to flush Torque more often ... but I'm not finding much information about this problem, beyond one SO post
>> 
>> http://stackoverflow.com/questions/10527061/pbs-refresh-stdout
>> 
>> which suggests that I'd need admin on Torque. Is that correct? (I'm "just a user," but I could probably get an admin to make required change(s) if I could present them with clear directions (and $20 :-) and performance was not noticeably degraded.) If not,
>> 
>> 1. Are there alternatives I can pursue as "just a user"? E.g., I notice in the above
>> 
>> http://stackoverflow.com/questions/10527061/pbs-refresh-stdout
>>> I ended up capturing stdout outside the queue
>> 
>> but am not sure what is meant, much less how that would be done.
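>> (My best guess: make the inner script mirror its own output to a file on the
>> shared filesystem as it runs, so `tail -f` works while Torque still spools its
>> own copy. A bash-only sketch follows; live.log is a made-up name, and the
>> program's own buffering may still delay lines:)

```shell
# Sketch: from inside the job script, duplicate all subsequent stdout/stderr
# to a live log on the shared filesystem (bash process substitution).
# PBS_O_WORKDIR is set by Torque; the mktemp fallback lets the sketch run
# outside a job.
PBS_O_WORKDIR="${PBS_O_WORKDIR:-$(mktemp -d)}"
LIVE_LOG="$PBS_O_WORKDIR/live.log"

exec > >(tee -a "$LIVE_LOG") 2>&1

echo "job step 1"
echo "job step 2"
# From a login node you could now tail -f the live.log in the work directory.

sleep 1   # give the background tee a moment to flush before exiting
```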
>> 
>> 2. Is this just not feasible? Should I be focusing my effort on, e.g., better parsing my output to determine progress?
>> 
>> Apologies if the above is a FAQ, but I didn't see it @
>> 
>> http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/11-troubleshooting/faq.htm
>> 
>> and am having trouble googling this topic (given the real-world overloading of the terms 'PBS' and 'Torque'). Pointers to relevant docs and other information sources are especially appreciated.
>> 
>> FWIW, my job has two parts: a Torque-aware, `qsub`ing, "outer" bash script (written by me) that wraps a previously-written (not by me), unmanaged/serial-native, "inner" csh script. The outer script sets and tests a bazillion environment variables, before delivering payload like
>> 
>> QSUB_ARG="-V -q ${queue_name} -N ${job_name} -l nodes=${N}:ppn=${PPN},walltime=${WT} -m ${mail_opts} -j ${join_opts} -o ${path_to_logfile_capturing_inner_script_stdout}"
>> for CMD in \
>> "ls -alh ${path_to_outer_script_logfile}" \
>> "ls -alh ${path_to_logfile_capturing_inner_script_stdout}" \
>> "find ${path_to_output_directory}/ -type f | wc -l" \
>> "du -hs ${path_to_output_directory}/" \
>> "ls -alt ${path_to_output_directory}/" \
>> "qsub ${QSUB_ARG} ${path_to_inner_script}" \
>> ; do
>> echo "$ ${CMD}" 2>&1 | tee -a "${path_to_outer_script_logfile}"
>> eval "${CMD}" 2>&1 | tee -a "${path_to_outer_script_logfile}"
>> done
>> 
>> after which I want to be able to do `tail ${path_to_logfile_capturing_inner_script_stdout}` , but can't.
>> 
>> your assistance is appreciated, Tom Roche <Tom_Roche at pobox.com>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
> 


