[torqueusers] why not have error file and o file when job is finished

Gus Correa gus at ldeo.columbia.edu
Tue Dec 2 11:01:56 MST 2008


Hi Zhyang and list

On different occasions, with different versions of PBS,
different PBS scripts, different computers and clusters,
and different storage setups (NFS, local disk, etc.),
I have found the *.o and *.e files in:

1) The work directory $PBS_O_WORKDIR

2) The user's home directory,
when the script doesn't cd to $PBS_O_WORKDIR.
The home dir may be on the master node or on the compute node,
if each node has its own home directories.

3) On the "Mother Superior" node in:

$PBS_HOME/spool

or in:

$PBS_HOME/undelivered

Finding the files here indicates a problem, and they are still named
*.ER and *.OU. It is not necessarily a job failure; it may be a glitch
in NFS, or something else.

$PBS_HOME is the Torque/PBS home directory, set when Torque/PBS is
installed (commonly /var/spool/torque on Torque systems).
The "Mother Superior" is the first node on the $PBS_NODEFILE list of 
each job.
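
The three places above can be checked in one pass. What follows is only
a rough sketch in Python, not anything shipped with Torque: the function
name find_job_output and its arguments are made up for illustration, and
the file name patterns (*.o<number> and *.e<number> for delivered files,
<number>.<something>OU and <number>.<something>ER for files still in
spool or undelivered) are an assumption based on the names described
above, so adjust them to what your site actually produces.

```python
import glob
import os


def find_job_output(seq, workdir, home, pbs_home):
    """Return existing output files for job number `seq`, grouped by location.

    Hypothetical helper: the name patterns below are assumptions.
    """
    # Delivered files end in .o<seq> / .e<seq>.
    delivered = [f"*.o{seq}", f"*.e{seq}"]
    # Files that were never delivered keep their .OU / .ER names.
    stuck = [f"{seq}.*OU", f"{seq}.*ER"]
    candidates = {
        "workdir": (workdir, delivered),
        "home": (home, delivered),
        "spool": (os.path.join(pbs_home, "spool"), stuck),
        "undelivered": (os.path.join(pbs_home, "undelivered"), stuck),
    }
    found = {}
    for label, (directory, patterns) in candidates.items():
        hits = []
        for pattern in patterns:
            # glob returns an empty list if the directory does not exist.
            hits.extend(glob.glob(os.path.join(directory, pattern)))
        if hits:
            found[label] = sorted(hits)
    return found
```

For example, find_job_output("123", "/path/to/workdir",
os.path.expanduser("~"), "/var/spool/torque") returns a dictionary that
only contains the locations where something was found. The spool and
undelivered directories have to be checked on the Mother Superior node
of the job, not on the frontend.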

I hope this helps.
Gus Correa

---------------------------------------------------------------------
Gustavo Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Garrick wrote:

> Look in syslog on the node where the job executed, and in the email 
> that may have been sent to the user.
>
> HPCC/Linux Systems Admin
>
> On Dec 1, 2008, at 5:56 PM, zhyang at lzu.edu.cn wrote:
>
>> I looked at the frontend syslog and the pbs_server log, and there 
>> seems to be no information about this error. Other accounts are all 
>> right; only one account runs into this problem.
>>
>>
>>> "Garrick Staples" <garrick at usc.edu>
>>> 2008-12-02 09:56:32
>>> torqueusers at supercluster.org
>>>
>>> Re: [torqueusers] why not have error file and o file when job is 
>>> finished
>>> On Tue, Dec 02, 2008 at 09:24:07AM +0800, zhyang at lzu.edu.cn alleged:
>>>
>>>> Hi
>>>>
>>>> I recently found that when my job finishes, I do not get any output 
>>>> files, such as job.e* or job.o*. I know that for a job that finishes 
>>>> normally, Torque gives two files, an e file and an o file. Can 
>>>> anyone give me some suggestions? Thanks!
>>>
>>> Look in the syslog of the node where your job ran.
>>>
>>> -- 
>>> Garrick Staples, GNU/Linux HPCC SysAdmin
>>> University of Southern California
>>>
>>
>> -- 
>> Lanzhou University
>> Email:zhyang at lzu.edu.cn
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>



