[torqueusers] searching for the directory of running job

Mahmood Naderan nt_mahmood at yahoo.com
Fri Feb 1 00:06:30 MST 2013


>lsof -p <job pid> is very informative in this regard.
Yes thank you. I found it.

>So we warn our users NEVER to change a binary, while a job
>still runs it, and instead to install different variants
>in different places in that case.
Are you sure about that? I have done that many times... The binary information is:

m5.fast: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0x1f70368e109b221707860f84d7682e5bec2e3778, stripped


If a dynamic library (.so file) is modified, then it is understandable that the behavior of the executable may change. However, if the .so files are unmodified and only the executable has been recompiled, I wouldn't expect different behavior.


To discuss the reason, let's forget Torque... On a normal Linux system, when a program is executed, the kernel loads the image of the executable into memory. At that point, if you delete the executable, you won't see an error message saying that the binary has been lost... Execution will continue.
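
A minimal sketch of that experiment, for anyone who wants to try it outside Torque. It assumes a Linux node with a `sleep` command on the PATH and a temporary directory that allows execution; the function name `run_then_delete` is made up for illustration. It only covers deleting the file, not overwriting it in place.

```python
import os
import shutil
import subprocess
import tempfile


def run_then_delete(seconds="2"):
    """Start a private copy of `sleep`, delete the copy, and report what happens."""
    sleep_path = shutil.which("sleep")  # assumption: `sleep` is on the PATH
    workdir = tempfile.mkdtemp()
    copy_path = os.path.join(workdir, "sleep")
    shutil.copy2(sleep_path, copy_path)  # copy2 keeps the executable bit

    proc = subprocess.Popen([copy_path, seconds])
    os.unlink(copy_path)  # remove the binary while the process is running
    still_on_disk = os.path.exists(copy_path)

    # Linux-specific: /proc/<pid>/exe points at the file the process was started from.
    exe_link = os.readlink(f"/proc/{proc.pid}/exe")

    exit_code = proc.wait()  # the process runs to completion anyway
    os.rmdir(workdir)
    return still_on_disk, exe_link, exit_code


if __name__ == "__main__":
    print(run_then_delete())
```

The directory entry is gone right after `os.unlink`, yet the process keeps running and exits normally. On Linux the `/proc/<pid>/exe` link of such a process is usually shown with a trailing " (deleted)".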


Regards,
Mahmood



________________________________
 From: Michael Jennings <mej at lbl.gov>
To: torqueusers at supercluster.org 
Sent: Thursday, January 31, 2013 9:09 PM
Subject: Re: [torqueusers] searching for the directory of running job
 
On Wednesday, 30 January 2013, at 23:39:07 (-0800),
Mahmood Naderan wrote:

> Below is the output of "qstat -f". Please note that I am not looking for PBS_O_WORKDIR; that is the working directory from which I ran qsub. What I want to find is the temporary directory on the compute node where the executable is running. Assume,
> 1- I compile the program
> 2- qsub the program
> 3- while '2' is running, I modify the code 
> 4- compile the code
> 5- qsub the program again
> 
> Now two instances of my program are running, but they are independent. So Torque must have copied the executables somewhere on the compute nodes to provide this independence. I want to find that location.

TORQUE copies job scripts but does NOT copy executables.  The Linux
kernel on the compute node(s) keeps the text and data segments of the
executable in memory even after it is overwritten.

lsof -p <job pid> is very informative in this regard.
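
For those who prefer a script, part of what lsof reports can be read straight from /proc on the compute node. The following is only a rough sketch, assuming Linux and that you own the process (or are root); `describe_process` is a hypothetical helper, not part of TORQUE or lsof.

```python
import os


def describe_process(pid):
    """Return the cwd, executable path, and open files of a process via /proc (Linux only)."""
    base = f"/proc/{pid}"
    info = {
        "cwd": os.readlink(f"{base}/cwd"),  # current working directory of the process
        "exe": os.readlink(f"{base}/exe"),  # file the process was started from
        "open_files": [],
    }
    fd_dir = f"{base}/fd"
    for fd in sorted(os.listdir(fd_dir), key=int):
        try:
            info["open_files"].append(os.readlink(f"{fd_dir}/{fd}"))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return info


if __name__ == "__main__":
    # Inspects this Python process itself; for a job, pass the PID seen on the compute node.
    for key, value in describe_process(os.getpid()).items():
        print(key, value)
```

The "exe" entry shows where the running binary was started from, and "cwd" shows the directory the job is currently working in.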

Michael

-- 
Michael Jennings <mej at lbl.gov>
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209E        W: 510-495-2687
MS 050B-3209          F: 510-486-8615