[torqueusers] Resolving a jobid

Ken Nielson knielson at adaptivecomputing.com
Mon Jun 17 16:37:15 MDT 2013


Hi all,

In TORQUE 4.x we made changes to the naming. Occasionally, this has caused
some issues with job ids. For example, a call to qdel with just a sequence
number would return an Unknown Job Id error if the naming (DNS)
configuration was not quite right. Eventually, we are able to help users
resolve the issue, but it is not always straightforward.

I have a proposal to see if we can make it so TORQUE is more forgiving of
naming configurations when it comes to resolving job ids. Please let me
know if what follows will cause a problem with your current configuration.

A TORQUE job id is made of two parts: a sequence number assigned by
TORQUE and the host name of the machine where pbs_server is running. The
host name part is the name returned by a call to the 'C' function
gethostname(). gethostname() returns the same name as hostname -f from the
command line. This is the fully qualified domain name (fqdn) of the machine.
So all TORQUE job ids are the sequence number and the fqdn of the host.
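
As a rough illustration only (this is not TORQUE's code), the two-part job
id can be sketched in Python. The sketch assumes the common
"sequence.hostname" text form of a job id; the function names are
hypothetical, and the host fred.alma_matter.edu is the example host used
later in this message.

```python
def split_job_id(job_id):
    """Split 'SEQ.HOST' into (sequence, host); host is '' if absent."""
    sequence, _, host = job_id.partition(".")
    return sequence, host


def complete_job_id(job_id, server_fqdn):
    """Append the server fqdn when only a sequence number was given."""
    sequence, host = split_job_id(job_id)
    if not host:
        return f"{sequence}.{server_fqdn}"
    return job_id


# A bare sequence number, as passed to qdel, becomes a full job id.
print(complete_job_id("42", "fred.alma_matter.edu"))
```

The point of the sketch is that a bare sequence number only matches a
stored job id if the host part appended to it is exactly the name the
server used.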

When a TORQUE utility is unable to supply the <number/fqdn> pair correctly,
an Unknown Job Id error is the result. This can happen when TORQUE tries to
resolve the host name part of the job id and gets a short name instead of
the fqdn. This is the case when the first name listed for the host in the
/etc/hosts file is the short name.
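
The effect of the /etc/hosts ordering can be shown with a small Python
sketch. It assumes the usual resolver behavior that the first name after
the address on a hosts line is treated as the canonical name; the address
10.0.0.5 and the function name are made up for the example.

```python
def canonical_name(hosts_text, name):
    """Return the first name on the hosts line that lists `name`."""
    for line in hosts_text.splitlines():
        # Drop comments, then split into address followed by names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and name in fields[1:]:
            return fields[1]
    return None


SHORT_FIRST = "10.0.0.5 fred fred.alma_matter.edu\n"
FQDN_FIRST = "10.0.0.5 fred.alma_matter.edu fred\n"

print(canonical_name(SHORT_FIRST, "fred"))  # short name comes back
print(canonical_name(FQDN_FIRST, "fred"))   # fqdn comes back
```

With the short name listed first, the lookup yields fred rather than
fred.alma_matter.edu, so the host part no longer matches the job id.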

My proposal would be to require that either the short name of a host or its
fqdn be put in the $TORQUE_HOME/server_name file. For example, if you had a
host named fred in the domain alma_matter.edu the server_name entry could
be either fred or fred.alma_matter.edu. However, you would not be allowed
to use a relative name like fred.alma_matter.
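
A minimal sketch of the proposed rule, in Python, might look like the
following. It is only an illustration of the check described above, not
an existing TORQUE function; the name classify_server_name and its return
strings are hypothetical.

```python
def classify_server_name(entry, fqdn):
    """Classify a server_name entry against the host's fqdn."""
    entry = entry.strip().lower()
    fqdn = fqdn.strip().lower()
    short = fqdn.split(".", 1)[0]
    if entry == fqdn:
        return "fqdn"
    if entry == short:
        return "short"
    # A proper prefix ending on a label boundary is a relative name.
    if fqdn.startswith(entry + "."):
        return "relative"
    return "unrelated"


for name in ("fred", "fred.alma_matter.edu", "fred.alma_matter"):
    print(name, "->", classify_server_name(name, "fred.alma_matter.edu"))
```

Under the proposal, only entries classified as "short" or "fqdn" would be
accepted in the server_name file.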

If we adopted the rule of using a short name or fqdn, we could make it so
TORQUE can always resolve to the correct <number/fqdn> pair. But before we
do this, I need to make sure no one is using a relative name in the
server_name file.

Thanks for your response.

-- 
Ken Nielson
+1 801.717.3700 office +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
www.adaptivecomputing.com