[torqueusers] TM improvements
Garrick Staples
garrick at usc.edu
Tue Dec 6 13:42:18 MST 2005
On Tue, Dec 06, 2005 at 02:43:24PM -0500, Jeff Squyres alleged:
> On Dec 6, 2005, at 1:02 PM, Garrick Staples wrote:
>
> >>Gotcha. What exactly is 254? Is it an Exxxx errno code that I can
> >>compare to? If not, is there a documented list of the codes that I
> >>can compare against?
> >
> >254<<1 = 127 = bourne for "command not found"
>
> Hmm -- I don't understand your math there: 254 << 1 == 508.
My bad. I meant 254>>1, which is 127.
> I'm also curious as to why you specified "bourne" -- are all error
> statuses reported per bourne shell semantics? What if the user's
> default shell is something other than bourne?
The user's shell isn't involved with spawning tasks. The specified
command is directly exec()'d.
> According to the Bash man page, I see the following:
>
>    A full search of the directories in PATH is performed only if
>    the command is not found in the hash table. If the search is
>    unsuccessful, the shell prints an error message and returns an
>    exit status of 127.
>
> So I can see where you get 127, but I don't understand the
> transformation from 254. Is that something that Torque does?
I'm not sure. This behaviour pre-dates TORQUE. I recall reading
something, though I can't find it right now, about exit values being
bit-shifted if they don't come directly from the process. I think it
might have been in the ERS. I think it follows POSIX semantics for
something.
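
To make the arithmetic concrete, here is a minimal sketch in C. The first helper only restates the shift discussed above (254 >> 1 is 127, while 254 << 1 is 508). The second shows one place where POSIX does hand back an encoded value rather than the raw exit code: the status word from waitpid(), which has to be decoded with WEXITSTATUS(). Both function names are hypothetical, and nothing here establishes that this encoding is where the 254 in TORQUE comes from; it only illustrates the kind of shifting being recalled.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: shift a reported value right by one bit. */
int shift_right_once(int reported)
{
    return reported >> 1;   /* 254 >> 1 == 127, whereas 254 << 1 == 508 */
}

/*
 * Hypothetical helper: run a child that exits with `code` and return the
 * exit code as decoded from the wait status, or -1 on error.
 */
int exit_code_via_wait(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);        /* child: exit immediately with the given code */

    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* The raw status is an encoded word; only the macros are portable. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exit_code_via_wait(127) should give back 127 even though the raw status word the parent receives is not itself 127.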
> >>For example -- use a close-on-exec pipe. The parent can block on a
> >>pipe after the fork() -- if it closes, the exec() succeeded. If the
> >>child's exec() fails, it can send a message back up the pipe saying
> >>"help, I failed!" This is not 100% foolproof, because at some point
> >>during exec(), the pipe will close but exec() could still fail, but
> >>it usually covers many common cases of failure (e.g., file not found,
> >>access denied, etc.).
> >
> >> /* Set the writing end to be close-on-exec */
> >> fcntl(fd[1], F_SETFD, FD_CLOEXEC);
> >
> >That's a terrific method! I've never seen that before (that's why you
> >are a programmer and I'm a sysadmin)! Is this portable?
>
> Heh. I'm of a firm belief that all programmers should be a sysadmin
> for a year (and vice versa). The world would be a better place.
>
> Yes, close-on-exec is portable. We use it in LAM/MPI; it's discussed
> in Stevens (I haven't read the new version yet, though).
>
> How far exec() goes before closing fd's is something that would need
> to be tested on different OS's.
>
> >I can definitely roll that technique into task and job launching.
>
> Does that mean the return from the poll for tm_spawn's event will
> show the error?
Yes. tm_spawn() can return an error of 1, 127, or 254, or TM_ENOTFOUND
or something.
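
For reference, the close-on-exec pipe technique quoted earlier in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: spawn_checked() is a hypothetical helper, not part of TORQUE or the TM API, and it uses execvp() and the child's errno as the "help, I failed!" message. As noted in the quoted text, it cannot catch failures that happen after the kernel has already closed the descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, exec argv[0], and report whether exec() worked.
 * Returns 0 and stores the child's pid in *pid_out if exec() succeeded,
 * the child's errno (e.g. ENOENT, EACCES) if exec() failed,
 * or -1 if the parent itself hit an error.
 */
int spawn_checked(char *const argv[], pid_t *pid_out)
{
    int fd[2];
    int child_errno = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(fd) < 0)
        return -1;

    /* Set the writing end to be close-on-exec */
    if (fcntl(fd[1], F_SETFD, FD_CLOEXEC) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: the read end is not needed here. */
        close(fd[0]);
        execvp(argv[0], argv);
        /* Only reached if exec failed: send errno back up the pipe. */
        child_errno = errno;
        if (write(fd[1], &child_errno, sizeof(child_errno)) < 0) {
            /* Nothing more can be done from here. */
        }
        _exit(127);
    }

    /* Parent: close our copy of the write end so read() can see EOF. */
    close(fd[1]);
    do {
        n = read(fd[0], &child_errno, sizeof(child_errno));
    } while (n < 0 && errno == EINTR);
    close(fd[0]);

    if (n == 0) {
        /* Pipe closed with no data: exec() closed it, so it succeeded. */
        *pid_out = pid;
        return 0;
    }
    if (n == (ssize_t)sizeof(child_errno)) {
        /* exec() failed in the child; reap it and report its errno. */
        waitpid(pid, NULL, 0);
        return child_errno;
    }
    return -1;  /* read error or short read; a real caller would clean up */
}
```

With this sketch, a caller that passes a nonexistent command would get ENOENT back right away instead of having to interpret the child's exit status later.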
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California