[torqueusers] TM improvements

Garrick Staples garrick at usc.edu
Tue Dec 6 13:42:18 MST 2005


On Tue, Dec 06, 2005 at 02:43:24PM -0500, Jeff Squyres alleged:
> On Dec 6, 2005, at 1:02 PM, Garrick Staples wrote:
> 
> >>Gotcha.  What exactly is 254?  Is it an Exxxx errno code that I can
> >>compare to?  If not, is there a documented list of the codes that I
> >>can compare against?
> >
> >254<<1 = 127 = bourne for "command not found"
> 
> Hmm -- I don't understand your math there: 254 << 1 == 508.

My bad.  I meant 254>>1.
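Here is a quick check of the shift arithmetic in this exchange. The two helpers (`shl1`, `shr1`) are hypothetical and exist only for illustration. A left shift doubles the value and a right shift halves it, so 254 >> 1 is 127. Equivalently, 254 is 127 << 1. This only checks the arithmetic. It says nothing about where, or whether, TORQUE applies such a shift.

```c
/* Hypothetical helpers for checking the arithmetic above.
 * A left shift by one doubles the value; a right shift by one halves it,
 * dropping the low bit. Only the right shift maps 254 to 127. */
unsigned int shl1(unsigned int v) { return v << 1; }  /* shl1(254) == 508, shl1(127) == 254 */
unsigned int shr1(unsigned int v) { return v >> 1; }  /* shr1(254) == 127 */
```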

 
> I'm also curious as to why you specified "bourne" -- are all error  
> statuses reported per bourne shell semantics?  What if the user's  
> default shell is something other than bourne?

The user's shell isn't involved with spawning tasks.  The specified
command is directly exec()'d.

 
> According to the Bash man page, I see the following:
> 
>     A full search of the directories in PATH is
>     performed only if the command is not found in the hash table.  If the
>     search is unsuccessful, the shell prints an error message and returns
>     an exit status of 127.
> 
> So I can see where you get 127, but I don't understand the  
> transformation from 254.  Is that something that Torque does?

I'm not sure.  This behaviour pre-dates TORQUE.  I recall reading
something, though I can't find it right now, about exit values being
bitshifted when they don't come directly from the process.  I think it
might have been in the ERS.  I think it is following POSIX semantics for
something.
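For comparison, POSIX does not define the raw bit layout of a wait status. Portable code decodes it with the `WIFEXITED`/`WEXITSTATUS` macros instead of shifting by hand. The sketch below uses a hypothetical helper, `run_direct()`, that exec()s a command directly (no shell), the same way tasks are launched. It assumes the child follows the Bourne-shell convention of exiting with 127 when exec() fails. That choice is an illustration only, not a description of TORQUE's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec() `path` directly (no shell involved),
 * and return the child's exit status decoded with the POSIX macros.
 * Returns -1 if fork()/waitpid() fails or the child did not exit normally. */
int run_direct(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char *const argv[] = { (char *)path, NULL };
        execvp(path, argv);
        /* Only reached if exec() failed. Assumption: mimic the Bourne
         * shell's "command not found" status rather than a TORQUE code. */
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    /* The raw status layout is implementation-defined, so decode it
     * with the POSIX macros instead of masking or shifting by hand. */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

For example, `run_direct("/nonexistent/no-such-command")` should yield 127 under this convention, while `run_direct("true")` should yield 0.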

 
> >>For example -- use a close-on-exec pipe.  The parent can block on a
> >>pipe after the fork() -- if it closes, the exec() succeeded.  If the
> >>child's exec() fails, it can send a message back up the pipe saying
> >>"help, I failed!"  This is not 100% foolproof, because at some point
> >>during exec(), the pipe will close but exec() could still fail, but
> >>it usually covers many common cases of failure (e.g., file not found,
> >>access denied, etc.).
> >
> >>        /* Set the writing end to be close-on-exec */
> >>        fcntl(fd[1], F_SETFD, FD_CLOEXEC);
> >
> >That's a terrific method!  I've never seen that before (that's why you
> >are a programmer and I'm a sysadmin)!  Is this portable?
> 
> Heh.  I'm a firm believer that all programmers should be a sysadmin
> for a year (and vice versa).  The world would be a better place.
> 
> Yes, close-on-exec is portable.  We use it in LAM/MPI; it's discussed  
> in Stevens (I haven't read the new version yet, though).
> 
> How far exec() goes before closing fd's is something that would need  
> to be tested on different OS's.
> 
> >I can definitely roll that technique into task and job launching.
> 
> Does that mean the return from the poll for tm_spawn's event will  
> show the error?

Yes.  tm_spawn() can return an error of 1, 127, or 254, or TM_ENOTFOUND
or something.
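The close-on-exec pipe technique quoted above can be sketched roughly as follows. `spawn_checked()` is a hypothetical helper, not part of TORQUE or the TM API. When exec() fails, it returns the child's errno rather than an exit code, so a launcher could tell "not found" (ENOENT) apart from a program that really ran and exited with 127. As Jeff notes, the pipe may close before exec() has fully succeeded on some systems. Treat EOF as "probably succeeded".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper sketching the close-on-exec pipe trick.
 * Returns 0 if the pipe closed with no data (exec() succeeded),
 * the child's errno if exec() failed (the child is reaped here),
 * or -1 on setup/read errors. *child is set whenever a child is
 * still running and must be reaped by the caller. */
int spawn_checked(const char *path, pid_t *child)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    /* Set the writing end to be close-on-exec. */
    if (fcntl(fd[1], F_SETFD, FD_CLOEXEC) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        close(fd[0]);
        char *const argv[] = { (char *)path, NULL };
        execvp(path, argv);
        /* exec() failed: send errno back up the pipe ("help, I failed!"). */
        int err = errno;
        ssize_t w = write(fd[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(fd[1]);  /* parent keeps only the read end */

    int err = 0;
    ssize_t n;
    do {
        n = read(fd[0], &err, sizeof err);
    } while (n < 0 && errno == EINTR);
    close(fd[0]);

    if (n == (ssize_t)sizeof err) {
        waitpid(pid, NULL, 0);  /* reap the child that failed to exec */
        return err;             /* e.g. ENOENT for "command not found" */
    }
    *child = pid;               /* EOF (exec succeeded) or read error */
    return n == 0 ? 0 : -1;
}
```

Reporting errno over the pipe keeps the failure reason separate from the exit status, so a real program that happens to exit with 127 is not confused with a failed exec().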

 
-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California