[torqueusers] Q:Torque Job Submission

Prakash Velayutham prakash.velayutham at cchmc.org
Tue Sep 2 22:04:02 MDT 2008


* Are the host keys already in the known_hosts file?
* Are you able to ssh between the 2 nodes without giving a password?
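
Both questions can be checked with a short script. This is a minimal sketch, not a TORQUE tool: the function names are made up for illustration, and it assumes OpenSSH is installed and that it is run as the job owner (user1 in the output below) on the execution host (M_02), targeting the submit host (M_01), since that is the direction in which the output files are copied back.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of TORQUE or OpenSSH.
# Run as the job owner on the execution host.

# Return 0 if a host key for $1 is recorded in the known_hosts file $2
# (default: ~/.ssh/known_hosts). ssh-keygen -F also matches hashed entries.
host_key_known() {
    ssh-keygen -F "$1" -f "${2:-$HOME/.ssh/known_hosts}" >/dev/null 2>&1
}

# Return 0 if "ssh $1 true" succeeds without any prompt. BatchMode=yes makes
# ssh fail instead of asking for a password or a host key confirmation.
passwordless_ssh_ok() {
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Print a short two-line report for one target host.
check_host() {
    if host_key_known "$1"; then
        echo "$1: host key found in known_hosts"
    else
        echo "$1: host key NOT in known_hosts"
    fi
    if passwordless_ssh_ok "$1"; then
        echo "$1: passwordless ssh works"
    else
        echo "$1: passwordless ssh FAILED"
    fi
}

# Example (hypothetical invocation, on M_02 as user1):
#   check_host M_01
```

If the host key turns out to be missing, logging in once by hand with ssh and accepting the key, or appending the output of ssh-keyscan for that host to known_hosts, are the usual ways to record it. Verify the key fingerprint before trusting it.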

Prakash

On Sep 2, 2008, at 10:43 PM, Ye YC Cui wrote:

> Hi all,
> If we run the command qstat -f,
> we get the following output:
> Job Id: 216.M_02
> Job_Name = STDIN
> Job_Owner = user1 at M_01
> resources_used.cput = 00:00:00
> resources_used.mem = 0kb
> resources_used.vmem = 0kb
> resources_used.walltime = 00:00:00
> job_state = C
> queue = batch
> server = M_02
> Checkpoint = u
> ctime = Tue Sep 2 05:20:50 2008
> Error_Path = M_01:/home/user1/STDIN.e216
> exec_host = M_02/0
> Hold_Types = n
> Join_Path = n
> Keep_Files = n
> Mail_Points = a
> mtime = Tue Sep 2 05:20:59 2008
> Output_Path = M_01:/home/user1/STDIN.o216
> Priority = 0
> qtime = Tue Sep 2 05:20:50 2008
> Rerunable = True
> Resource_List.nodect = 1
> Resource_List.nodes = M_02
> Resource_List.walltime = 01:00:00
> session_id = 4848
> Variable_List = PBS_O_HOME=/home/user1,PBS_O_LANG=en_US.UTF-8,
> PBS_O_LOGNAME=user1,
> PBS_O_PATH=/usr/mpi/gcc/mvapich2-1.0.3/bin:/usr/kerberos/bin:/opt/csm
> /bin:/usr/local/bin:/bin:/usr/bin:/var/spool/torque:/home/user1/bin,
> PBS_O_MAIL=/var/spool/mail/user1,PBS_O_SHELL=/bin/bash,
> PBS_SERVER=M_01,PBS_O_HOST=M_01,PBS_O_WORKDIR=/home/user1,
> PBS_O_QUEUE=batch
> sched_hint = Post job file processing error; job 216.M_02 on host gaia-08/0
>
> Unable to copy file /var/spool/torque/spool/216.M_02.OU to admin1 at M_01:/home/user1/STDIN.o216
> >>> error from copy
> Host key verification failed.
> lost connection
> >>> end error output
> Output retained on that host in: /var/spool/torque/undelivered/216.M_02.OU
>
> Unable to copy file /var/spool/torque/spool/216.M_02.ER to user1 at M_01:/home/user1/STDIN.e216
> >>> error from copy
> Host key verification failed.
> lost connection
> >>> end error output
> Output retained on that host in: /var/spool/torque/undelivered/216.M_02.ER
> comment = Job started on Tue Sep 02 at 05:20
> etime = Tue Sep 2 05:20:50 2008
> exit_status = 0
> submit_args = -l nodes=M_02
> start_time = Tue Sep 2 05:20:50 2008
> start_count = 1
>
> In my opinion:
>
> Copying files from node1 to node2 does not work.
> The submitting user, user1, does not have permission to copy between
> node1 and node2. (I ran ./configure --with-scp.)
>
>
> Ye YC Cui wrote on 09/02/2008 10:54:38 PM:
>
>
> Hi all,
> As we know, the PBS batch file may be specified as a filename on the
> qsub command line or may be entered via STDIN.
> When it is entered via STDIN, the output files are named, for example:
> STDIN.e100
> STDIN.o100
> But when we submit a job from node1 and have node2 execute it,
> we cannot find the STDIN.* files.
> Could you tell me whether this is expected behavior?
>
> Simon Cui ( 崔野)
> IBM China Software Development LAB, Beijing
> Tel: 86-10-82782244 ext 54955 E-mail: cuiye at cn.ibm.com
> Address: 2/F, DeShi Building, No.9, East Road, ShangDi, Beijing  
> 100085, P.R.China
> MSN: cuiye_forevery at hotmail.com
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
