Bug 118 - Dynamic Consumable Generic Resources does not work as documented
Status: NEW
Product: TORQUE
Component: pbs_mom
Version: 3.0.x
Platform: PC Linux
Importance: P5 normal
Assigned To: Ken Nielson
Reported: 2011-03-30 08:26 MDT by mcoyne
Modified: 2011-11-11 01:29 MST

Attachments
torque3.0_shell_escape.patch seems to correct the issue (1.04 KB, patch)
2011-03-30 08:26 MDT, mcoyne

Description mcoyne 2011-03-30 08:26:07 MDT
Created an attachment (id=75)
torque3.0_shell_escape.patch seems to correct the issue

Dynamic Consumable Generic Resources
http://www.adaptivecomputing.com/resources/docs/torque/a.cmomconfig.php

When I try to use a dynamic "shell escape" in the MOM's config file, the
resource shows up in the gres as

name:!/my/script

instead of the value returned by the script. I believe the same issue exists
in 2.5.1 as well.

Attached is a patch that may correct the issue. It calls
conf_res(cp->c_u.c_value,NULL) to expand the shell escape before putting it in
the gres line. conf_res returns a pointer to what I assume is a static
temporary string, since I did not find any instance of the returned pointer
being freed. I could be wrong, though.
Comment 1 Robert Oostenveld 2011-11-10 15:01:53 MST
(In reply to comment #0)

Let me please confirm that the modifications in the patch also solves the
problem for me on Torque 3.0.2.
Comment 2 Robert Oostenveld 2011-11-10 15:02:58 MST
(In reply to comment #1)
> (In reply to comment #0)
> 
> Let me please confirm that the modifications in the patch also solves the
> problem for me on Torque 3.0.2.

That should read: "I can confirm that ..."
Comment 3 Ken Nielson 2011-11-10 17:36:44 MST
What is conv_res?
Comment 4 Ken Nielson 2011-11-10 17:37:20 MST
conf_res(). Sorry about the typo
Comment 5 Robert Oostenveld 2011-11-11 01:29:26 MST
I don't know exactly what it does, but it is a function that is declared and
defined further down in mom_main.c. Its comment says:

/*
** Check the request against the format of the line read from
** the config file.  If it is a static value, there should be
** no params.  If it is a shell escape, the parameters (if any)
** should match the command line for the system call.
*/

char *conf_res(

  char               *resline,        /* I */
  struct rm_attribute *attr)    /* I */
...

From quickly browsing the code, the relevant part seems to be that it checks
whether resline[0] is equal to '!'. It then parses the remainder and calls
popen(ret_string, "r"), a C library function that runs the requested command
through the shell and returns a stream for reading its output.
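
The behaviour described above can be sketched in a few lines of C. This is only an illustration and not the TORQUE source: expand_shell_escape and format_gres are hypothetical names, the 256-byte buffer is arbitrary, and keeping only the first line of the command's output is an assumption about what a gres value should look like.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not TORQUE code: expand a config value.
 * A leading '!' marks a shell escape; the rest is run as a command and
 * the first line of its output becomes the value. Anything else is
 * treated as a static value and copied unchanged.
 * Returns 0 on success, -1 on failure.
 */
int expand_shell_escape(const char *value, char *out, size_t outlen)
{
    assert(value != NULL && out != NULL && outlen > 0);

    if (value[0] != '!') {
        /* Static value: copy it as-is (truncated if out is too small). */
        snprintf(out, outlen, "%s", value);
        return 0;
    }

    /* Shell escape: run everything after the '!' through the shell. */
    FILE *fp = popen(value + 1, "r");
    if (fp == NULL)
        return -1;

    if (fgets(out, (int)outlen, fp) == NULL) {
        pclose(fp);
        return -1;              /* the command printed nothing */
    }
    pclose(fp);

    /* Drop the trailing newline so the value fits on one gres line. */
    out[strcspn(out, "\r\n")] = '\0';
    return 0;
}

/*
 * Hypothetical helper: build a "name:value" gres entry, expanding the
 * value first so the entry never contains the raw "!command" text.
 */
int format_gres(const char *name, const char *value, char *out, size_t outlen)
{
    char expanded[256];         /* arbitrary size for this sketch */

    if (expand_shell_escape(value, expanded, sizeof expanded) != 0)
        return -1;

    int n = snprintf(out, outlen, "%s:%s", name, expanded);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With a config value such as !/my/script, format_gres("name", "!/my/script", buf, sizeof buf) would produce name: followed by whatever the script prints, rather than the literal name:!/my/script reported in this bug.
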

The shell script that I have it call, specified in mom_priv/config as
matlab !/opt/cluster/matlabfree
is the following:

--------------------------------------------------------
#!/bin/bash
#
# This script is used for torque to count the number of free floating network
# licenses.
#
# Please note that it contains a variable with the number of DCCN licenses;
# this should be updated whenever the number of licenses that we can use on
# the campus-wide license server changes.
#
#
# 10nov2011 - roboos: created
# 11nov2011 - roboos: added fcdc and FCDC to the search terms

MATLABAVAIL=100
MATLABINUSE=`/opt/cluster/lmstat -f MATLAB | awk '/mentat/; /fcdc/; /dccn/; /FCDC/; /DCCN/' | uniq | wc -l`

let MATLABFREE="$MATLABAVAIL"-"$MATLABINUSE"

if [ $MATLABFREE -lt 0 ] ; then
MATLABFREE=0
fi

echo $MATLABFREE

exit 0
--------------------------------------------------------

With the suggested patch it works, i.e. pbsnodes correctly reports the number
returned by the shell script (and I also see that the number varies over time
as the number of floating network licenses changes).

Note, however, that this does not mean that my goals are met. I still have the
issue that
qsub -l other=matlab
or
qsub -W x=GRES:matlab
as explained in the documentation at
http://www.clusterresources.com/torquedocs21/2.1jobsubmission.shtml do not seem
to pay any attention to this matlab resource (which I have defined on only a
single node so far). But that falls outside the scope of this bug.