[torqueusers] success or failure
michael young
mhyoung at valdosta.edu
Tue Mar 13 17:11:47 MDT 2007
Here is the output on the master node from 'tracejob -v 321'
############START####################
/var/spool/PBS//sched_logs/20070313: No such file or directory
Job: 321.cluster.chemistry.valdosta.edu
03/13/2007 17:41:30 S enqueuing into default, state 1 hop 1
03/13/2007 17:41:30 S Job Queued at request of
spartan at cluster.chemistry.valdosta.edu, owner =
spartan at cluster.chemistry.valdosta.edu, job
name = STDIN, queue = default
03/13/2007 17:41:30 A queue=default
03/13/2007 17:41:32 S Job Modified at request of
root at cluster.chemistry.valdosta.edu
03/13/2007 17:41:32 S Job Run at request of
root at cluster.chemistry.valdosta.edu
03/13/2007 17:41:32 A user=spartan group=spartan jobname=STDIN
queue=default ctime=1173825690 qtime=1173825690
etime=1173825690 start=1173825692
exec_host=he12/0 Resource_List.neednodes=he12
03/13/2007 17:42:02 S Exit_status=0 resources_used.cput=00:00:00
resources_used.mem=3196kb resources_used.vmem=135104kb
resources_used.walltime=00:00:30
03/13/2007 17:42:02 A user=spartan group=spartan jobname=STDIN
queue=default ctime=1173825690 qtime=1173825690
etime=1173825690 start=1173825692
exec_host=he12/0 Resource_List.neednodes=he12 session=14160
end=1173825722 Exit_status=0
resources_used.cput=00:00:00 resources_used.mem=3196kb
resources_used.vmem=135104kb
resources_used.walltime=00:00:30
03/13/2007 17:42:25 S Post job file processing error
03/13/2007 17:42:25 S dequeuing from default, state EXITIN
################END#####################
Here is the output of the same command on the slave node assigned the job:
###############START#############
/var/spool/PBS//server_priv/accounting/20070313: No such file or directory
/var/spool/PBS//server_logs/20070313: No such file or directory
/var/spool/PBS//sched_logs/20070313: No such file or directory
Job: 321.cluster.chemistry.valdosta.edu
03/13/2007 17:18:12 M scan_for_terminated: job
321.cluster.chemistry.valdosta.edu task 1 terminated, sid 14160
03/13/2007 17:18:12 M Terminated
#################END###################
The first shows a post-job file processing error.
The second just says terminated.
I looked in root's and spartan's home directories and saw no files created
by torque.
Any ideas as to what could be happening, or where else I could look for clues?
Michael
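
For reading traces like the two above, a small filter can pull out the two facts that matter here: the job's exit status and whether the server logged a post-job file processing error. This is only a sketch; summarize_tracejob is a hypothetical helper name, not part of torque, and it assumes the trace text has the same shape as the output pasted above.

```shell
# Hypothetical helper (not a torque command): summarize tracejob text on stdin.
# Assumes the log lines look like the ones quoted in this message.
summarize_tracejob() {
    awk '
        # Remember the first Exit_status=N value seen in the trace.
        match($0, /Exit_status=-?[0-9]+/) && status == "" {
            # "Exit_status=" is 12 characters long; keep only the number.
            status = substr($0, RSTART + 12, RLENGTH - 12)
        }
        # The server logs this line when it cannot deliver the output files.
        /Post job file processing error/ { post_error = 1 }
        END {
            if (status == "") print "exit status: unknown"
            else print "exit status: " status
            if (post_error) print "output delivery: failed (post job file processing error)"
            else print "output delivery: no error logged"
        }
    '
}
```

It could be used as 'tracejob -v 321 | summarize_tracejob'. For job 321 this would report an exit status of 0 together with a failed output delivery, which suggests the job itself ran fine and only the copy of the output files back to the home directory went wrong.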
John Hanks wrote:
>If no output files show up, the most common cause of that for me has been
>that the node was unable to copy the files back to the home directory. Check
>the torque logs on the master and on the node that ran the job. Somewhere it
>should tell you what happened when it tried to copy the spooled output back
>to the user's home directory.
>
>jbh
>
>
>On 3/13/07 3:21 PM, "michael young" <mhyoung at valdosta.edu> wrote:
>
>
>
>>Thanks for the reply John.
>>I did 'echo "sleep 10; echo success" | qsub', as you instructed
>>then, 'qstat', which gave this.
>>Job id              Name             User             Time Use S Queue
>>------------------- ---------------- ---------------- -------- - -----
>>320.cluster         STDIN            spartan                 0 R default
>>
>>then I did 'ls STDIN*', and it said "ls: STDIN*: No such file or directory"
>>
>>Is it supposed to create a file in a certain directory? Maybe I'm
>>in the wrong directory.
>>
>>
>>thanks,
>>Michael
>>
>>
>>John Hanks wrote:
>>
>>
>>
>>>Here's what I did to test this:
>>>
>>>
>>>griznog at uinta ~ $ echo "sleep 10; echo success" | qsub
>>>527191.uinta.hpc.usu.edu
>>>griznog at uinta ~ $ qstat 527191
>>>Job id              Name             User             Time Use S Queue
>>>------------------- ---------------- ---------------- -------- - -----
>>>527191.uinta        STDIN            griznog                 0 R serial
>>>griznog at uinta ~ $ ls STDIN*
>>>STDIN.e527191 STDIN.o527191
>>>griznog at uinta ~ $ cat STDIN.o527191
>>>success
>>>griznog at uinta ~ $
>>>
>>>Adding 'echo success' to the submitted command forces an output file to be
>>>created. Hope that helps.
>>>
>>>jbh
>>>
>>>On 3/13/07 2:18 PM, "michael young" <mhyoung at valdosta.edu> wrote:
>>>
>>>
>>>
>>>
>>>
>>>>Hi,
>>>>
>>>>I do 'echo "sleep 30" | qsub'.
>>>>How do I know if it succeeded or failed?
>>>>
>>>>thank you,
>>>>Michael