[torqueusers] success or failure

michael young mhyoung at valdosta.edu
Tue Mar 13 17:11:47 MDT 2007


Here is the output on the master node from 'tracejob -v 321'
############START####################
/var/spool/PBS//sched_logs/20070313: No such file or directory

Job: 321.cluster.chemistry.valdosta.edu

03/13/2007 17:41:30  S    enqueuing into default, state 1 hop 1
03/13/2007 17:41:30  S    Job Queued at request of spartan at cluster.chemistry.valdosta.edu, owner =
                          spartan at cluster.chemistry.valdosta.edu, job name = STDIN, queue = default
03/13/2007 17:41:30  A    queue=default
03/13/2007 17:41:32  S    Job Modified at request of root at cluster.chemistry.valdosta.edu
03/13/2007 17:41:32  S    Job Run at request of root at cluster.chemistry.valdosta.edu
03/13/2007 17:41:32  A    user=spartan group=spartan jobname=STDIN queue=default ctime=1173825690 qtime=1173825690
                          etime=1173825690 start=1173825692 exec_host=he12/0 Resource_List.neednodes=he12
03/13/2007 17:42:02  S    Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=3196kb resources_used.vmem=135104kb
                          resources_used.walltime=00:00:30
03/13/2007 17:42:02  A    user=spartan group=spartan jobname=STDIN queue=default ctime=1173825690 qtime=1173825690
                          etime=1173825690 start=1173825692 exec_host=he12/0 Resource_List.neednodes=he12 session=14160
                          end=1173825722 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=3196kb
                          resources_used.vmem=135104kb resources_used.walltime=00:00:30
03/13/2007 17:42:25  S    Post job file processing error
03/13/2007 17:42:25  S    dequeuing from default, state EXITIN
################END#####################

Here is the output for the same command on the slave node assigned the job:
###############START#############

/var/spool/PBS//server_priv/accounting/20070313: No such file or directory
/var/spool/PBS//server_logs/20070313: No such file or directory
/var/spool/PBS//sched_logs/20070313: No such file or directory

Job: 321.cluster.chemistry.valdosta.edu

03/13/2007 17:18:12  M    scan_for_terminated: job 321.cluster.chemistry.valdosta.edu task 1 terminated, sid 14160
03/13/2007 17:18:12  M    Terminated
#################END###################

The first shows a post job file processing error.
The second just says terminated.
I looked in root's and spartan's home directories and saw no files
created by Torque.
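
For anyone checking the same thing: going by John's example
(STDIN.o527191 and STDIN.e527191 for job 527191), the file names to look
for can be worked out from the job ID. This is only a sketch. The
function name expected_output_files is made up for illustration, and it
assumes the default naming, with no -o, -e, or -N options passed to qsub.

```shell
#!/bin/sh
# Hypothetical helper: print the default stdout/stderr file names for a job.
# Assumes default naming <jobname>.o<seq> and <jobname>.e<seq>, where <seq>
# is the numeric part of the job ID before the first dot.
expected_output_files() {
    jobid=$1              # e.g. 321.cluster.chemistry.valdosta.edu
    jobname=${2:-STDIN}   # jobs piped into qsub show up with the name STDIN
    seq=${jobid%%.*}      # strip everything from the first dot onward
    printf '%s.o%s\n%s.e%s\n' "$jobname" "$seq" "$jobname" "$seq"
}

# Names to look for in the directory where qsub was run
expected_output_files 321.cluster.chemistry.valdosta.edu
```

If neither name turns up in the directory the job was submitted from, the
files were probably never copied back.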

Any ideas as to what could be happening, or where else I could look for clues?
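
A rough sketch of how the two clues in the master-node output above (the
Exit_status value and the post job error line) could be pulled out of
tracejob output automatically. The helper name summarize_tracejob is
hypothetical; it only matches text in whatever is piped into it and does
not query Torque itself. The sample lines fed to it are copied from the
output above.

```shell
#!/bin/sh
# Hypothetical helper: summarize tracejob-style text read from stdin.
# Prints the first Exit_status value seen and whether a
# "Post job file processing error" line is present.
summarize_tracejob() {
    awk '
        # Remember the first Exit_status=<n> token found on any line
        status == "" && match($0, /Exit_status=-?[0-9]+/) {
            status = substr($0, RSTART + 12, RLENGTH - 12)
        }
        /Post job file processing error/ { posterr = 1 }
        END {
            if (status == "") status = "unknown"
            print "exit_status=" status
            print "post_job_error=" (posterr ? "yes" : "no")
        }
    '
}

# Demo on two lines taken from the master-node output
summarize_tracejob <<'EOF'
03/13/2007 17:42:02  S    Exit_status=0 resources_used.cput=00:00:00
03/13/2007 17:42:25  S    Post job file processing error
EOF
```

On a live system this would presumably be run as
'tracejob -v 321 | summarize_tracejob'. An exit status of 0 together with
a post job error would suggest the job itself ran fine and the problem
lies in delivering the output files afterwards.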

Michael



John Hanks wrote:

>If no output files show up, the most common cause for me has been that
>the node was unable to copy the files back to the home directory. Check
>the torque logs on the master and on the node that ran the job. Somewhere
>they should tell you what happened when it tried to copy the spooled
>output back to the user's home directory.
>
>jbh
>
>
>On 3/13/07 3:21 PM, "michael young" <mhyoung at valdosta.edu> wrote:
>
>>Thanks for the reply John.
>>I did 'echo "sleep 10; echo success" | qsub', as you instructed,
>>then 'qstat', which gave this:
>>Job id              Name             User             Time Use S Queue
>>------------------- ---------------- ---------------- -------- - -----
>>320.cluster         STDIN            spartan                 0 R default
>>
>>then I did 'ls STDIN*', and it said "ls: STDIN*: No such file or directory"
>>
>>Is it supposed to create a file in a certain directory?  Maybe I'm
>>in the wrong directory.
>>
>>
>>thanks,
>>Michael
>>
>>
>>John Hanks wrote:
>>
>>>Here's what I did to test this:
>>>
>>>
>>>griznog at uinta ~ $ echo "sleep 10; echo success" | qsub
>>>527191.uinta.hpc.usu.edu
>>>griznog at uinta ~ $ qstat 527191
>>>Job id              Name             User             Time Use S Queue
>>>------------------- ---------------- ---------------- -------- - -----
>>>527191.uinta        STDIN            griznog                 0 R serial
>>>griznog at uinta ~ $ ls STDIN*
>>>STDIN.e527191  STDIN.o527191
>>>griznog at uinta ~ $ cat STDIN.o527191
>>>success
>>>griznog at uinta ~ $
>>>
>>>Adding 'echo success' to the submitted command forces an output file to be
>>>created. Hope that helps.
>>>
>>>jbh
>>>
>>>On 3/13/07 2:18 PM, "michael young" <mhyoung at valdosta.edu> wrote:
>>>
>>>>Hi,
>>>>
>>>>I do 'echo "sleep 30" | qsub'.
>>>>How do I know if it succeeded or failed?
>>>>
>>>>thank you,
>>>>Michael
>>>>_______________________________________________
>>>>torqueusers mailing list
>>>>torqueusers at supercluster.org
>>>>http://www.supercluster.org/mailman/listinfo/torqueusers

