[torqueusers] TM improvements

Garrick Staples garrick at usc.edu
Tue Dec 6 14:29:06 MST 2005


On Tue, Dec 06, 2005 at 04:02:37PM -0500, Jeff Squyres alleged:
> On Dec 6, 2005, at 3:42 PM, Garrick Staples wrote:
> 
> >>>254<<1 = 127 = bourne for "command not found"
> >>
> >>Hmm -- I don't understand your math there: 254 << 1 == 508.
> >
> >My bad.  I mean 254>>1.
> 
> Ah, ok.  So the LSB is reserved for something...?
> 
> >>I'm also curious as to why you specified "bourne" -- are all error
> >>statuses reported per bourne shell semantics?  What if the user's
> >>default shell is something other than bourne?
> >
> >The user's shell isn't involved with spawning tasks.  The specified
> >command is directly exec()'d.
> 
> Hmm.  This seems inconsistent:
> 
> Using tcsh:
> 
> -----
> [15:49] vogon:~ % asdfasdfasdf
> asdfasdfasdf: Command not found.
> [15:49] vogon:~ % echo $status
> 1
> -----

Your shell doesn't matter.  It is not used in spawning tasks.  No matter
what, if the final exec() fails, then the exit value of the task is set
to 254.  Look at src/resmom/start_exec.c:start_process().

...
  execvp(argv[0],argv);

  /* only reached if execvp() fails */

  sprintf(log_buffer,"PBS: %.256s: %s\n",
    argv[0],
    strerror(errno));

  write(2,log_buffer,strlen(log_buffer));

  fsync(2);

  exit(254);

  /*NOTREACHED*/

  return(-1);
  }  /* END start_process() */
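
The failure path in that excerpt can be reproduced outside TORQUE. Below is a minimal sketch of the same pattern, not TORQUE code. The helper name spawn_and_wait() is hypothetical, and the sketch skips all the setup start_process() does before the exec. The child calls execvp() directly, with no shell in between. If that call returns, the child writes a "PBS:"-style message to stderr and exits with 254. The parent reads the code back with waitpid() and WEXITSTATUS(). POSIX makes the low-order 8 bits of the exit() argument available to the waiting parent, so 254 should arrive intact on any conforming platform.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not TORQUE code): fork, exec argv directly with no
 * shell in between, and return the child's exit code, or -1 if fork() or
 * waitpid() fails or the child did not exit normally. */
static int spawn_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);

        /* Only reached if execvp() fails; mirrors start_process(). */
        int err = errno;
        char buf[512];
        snprintf(buf, sizeof buf, "PBS: %.256s: %s\n", argv[0], strerror(err));
        ssize_t n = write(STDERR_FILENO, buf, strlen(buf));
        (void)n; /* nothing useful to do if stderr is unavailable */
        _exit(254);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    /* WEXITSTATUS() is the low 8 bits of the value the child passed to exit(). */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing { "asdfasdfasdf", NULL } should print something like "PBS: asdfasdfasdf: No such file or directory" and return 254, whether the user's login shell is sh or tcsh. The child uses _exit() rather than exit() so that it does not flush stdio buffers inherited from the parent. The reported code is the same either way.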

 

> >>>I can definitely roll that technique into task and job launching.
> >>
> >>Does that mean the return from the poll for tm_spawn's event will
> >>show the error?
> >
> >Yes.  tm_spawn() can return an error of 1, 127, or 254, or
> >TM_ENOTFOUND or something.
> 
> TM_ENOTFOUND would be awesome.  :-)  Is that consistent with the  
> tm_error values returned in other cases?  (i.e., that they're TM_*  
> constants)

Not really.  TM_ENOTFOUND seems to generally mean that a task wasn't
found when calling the other tm_* functions.

 
> My only concern is that if *you're* setting TM_ENOTFOUND based on the  
> value 254, then we should understand what that 254 *means* (i.e.,  
> will it really be 254 on all platforms?).

./src/include/tm_.h:#define     TM_ENOTFOUND            17006
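
TM_ENOTFOUND is unrelated to the exit code, so any translation would have to be done explicitly by whatever collects the task's status. The following is a minimal sketch of the translation proposed earlier in the thread. It assumes the exit code has already been extracted with WEXITSTATUS(). The function name exit_code_to_tm_error() is hypothetical, and using 0 for "no TM-level error" is an assumption local to this sketch. Only the value 17006 is taken from the tm_.h line above.

```c
#include <assert.h>

/* Value copied from src/include/tm_.h. */
#define TM_ENOTFOUND 17006

/* Assumption for this sketch only: 0 means "no TM-level error". */
#define SKETCH_TM_OK 0

/* Exit code start_process() uses when the final exec() fails. */
#define EXEC_FAILED_EXIT 254

/* Hypothetical translation of a task's exit code into a TM-style error.
 * 'exit_code' must already be WEXITSTATUS() of a normally exited task,
 * so it lies in 0..255. */
static int exit_code_to_tm_error(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);

    /* Ambiguous: a program may legitimately call exit(254) itself. */
    if (exit_code == EXEC_FAILED_EXIT)
        return TM_ENOTFOUND;

    /* Any other code is the task's own exit status, not a spawn error. */
    return SKETCH_TM_OK;
}
```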

 
> Also, can we have some other problems reported as well?  Based on  
> output you sent previously in this thread, I think you already handle  
> the "not found" case -- sending TM_ENOTFOUND back should be easy.   
> Can we recognize permission denied cases as well?

Maybe.  tm_spawn() doesn't have an argument to return a tm_errno.
Returning exec()'s errno would be goofy.  Maybe it doesn't matter, since
the strerror() message is already written to the job's stderr.
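
One way to report permission problems separately without changing tm_spawn()'s signature is to encode the exec() errno in the exit code. This follows the POSIX shell convention of 127 for "not found" and 126 for "found but not executable". The sketch below is hypothetical and is not what start_process() does, since that function always exits with 254. The names exec_failure_code() and exec_or_exit() are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical mapping from the errno left by a failed execvp() to a task
 * exit code, borrowing the POSIX shell convention:
 *   127 = command not found, 126 = found but could not be executed. */
static int exec_failure_code(int err)
{
    switch (err) {
    case ENOENT:
        return 127;
    case EACCES:
        return 126;
    default:
        return 254; /* keep the generic value start_process() uses today */
    }
}

/* Hypothetical child-side replacement for the "exit(254)" tail of
 * start_process(): exec argv, and on failure exit with a code that
 * distinguishes "not found" from "permission denied". Never returns. */
static void exec_or_exit(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    execvp(argv[0], argv);
    _exit(exec_failure_code(errno));
}
```

The drawback is that the exit code also belongs to the user's program. A task that itself exits with 126 or 127 is indistinguishable from a failed exec, so the message on stderr remains the only unambiguous record.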

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California