[torqueusers] 2 problems with torque-2.0.0p7

Martin Siegert siegert at sfu.ca
Mon Jan 30 20:36:09 MST 2006


On Mon, Jan 30, 2006 at 05:22:39PM -0800, Martin Siegert wrote:

> 2) this problem has to do with multi-homed hosts and is by far more
> serious as it stops me dead in my tracks:
> 
> $PBS_HOME/server_name contains "b001"
> $PBS_HOME/torque.cfg contains "SERVERHOST b001"
> 
> When I submit a job with qsub it returns jobids of the form 2345.<hostname>
> instead of 2345.b001. This used to work in torque-2.0.0p3 (which is the
> last version I used before switching to 2.0.0p7)! Thus, this broke
> somewhere in versions 2.0.0p4 - 2.0.0p7. The effect is that, e.g.,
> 
> qdel 2345
> 
> does not work anymore - I always have to enter the full jobid
> 2345.<hostname>, which is rather annoying and more importantly
> impossible to explain to users.
> I suspect that the problem is with pbs_server

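It appears that "TLoadConfig(Buffer,sizeof(Buffer))" in pbsd_main.c,
line 505, only reads the first 4 characters of the torque.cfg file:
Buffer is now a pointer, so sizeof(Buffer) is the size of the pointer
and not the size of the allocated memory.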

Consider the following code:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]){
char *Buffer;
int BufSize;

   BufSize = 65536*sizeof(char);
   Buffer = (char *)malloc(BufSize);
   /* sizeof yields a size_t, so cast it to match the %i conversion */
   printf("BufSize=%i, sizeof(Buffer)=%i\n", BufSize, (int)sizeof(Buffer));
   free(Buffer);
   return 0;
}

When you run the corresponding program (on a machine with 32-bit pointers) you get

BufSize=65536, sizeof(Buffer)=4

:-(

In the older versions of torque Buffer was defined as

char Buffer[65536];

in which case sizeof(Buffer) has the desired result.
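
The difference between the two declarations can be checked directly. The
following is only a minimal sketch: the macro CONFIG_BUF_SIZE and the
function names are made up for illustration and do not appear in the torque
sources; only the value 65536 is taken from the code above.

```c
#include <assert.h>
#include <stddef.h>

#define CONFIG_BUF_SIZE 65536  /* hypothetical name; the value is from the post */

/* sizeof applied to a pointer gives the size of the pointer itself. */
size_t size_via_pointer(char *buf)
{
  return sizeof(buf);          /* 4 with 32-bit pointers, 8 with 64-bit ones */
}

/* sizeof applied to an array gives the size of the whole array. */
size_t size_via_array(void)
{
  char buf[CONFIG_BUF_SIZE];

  return sizeof(buf);          /* CONFIG_BUF_SIZE * sizeof(char) */
}

/* An array passed to a function decays to a pointer, so the callee
   no longer sees the array size either. */
void check_sizes(void)
{
  char buf[CONFIG_BUF_SIZE];

  assert(sizeof(buf) == CONFIG_BUF_SIZE);
  assert(size_via_pointer(buf) == sizeof(char *));
}
```

Calling check_sizes() from a small main() should pass both assertions on any
platform, since the second one compares against sizeof(char *) rather than
a fixed 4 or 8.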
Thus, we can either
1) go back to the old version,
2) use the code from qsub.c (which is very similar to the old version), or
3) use something like the following:

--- src/server/pbsd_main.c.orig	Mon Jan 30 18:49:59 2006
+++ src/server/pbsd_main.c	Mon Jan 30 19:08:47 2006
@@ -452,6 +452,7 @@
   time_t last_jobstat_time;
   int    when;
 
+  int    BufSize;
   char   *Buffer;
 
   void	 ping_nodes A_((struct work_task *ptask));
@@ -476,7 +477,8 @@
 
   ProgName = argv[0];
 
-  Buffer=calloc(65536,sizeof(char));
+  BufSize=65536*sizeof(char);
+  Buffer=(char *)malloc(BufSize);
 
   /* if we are not running with real and effective uid of 0, forget it */
 
@@ -502,7 +504,7 @@
 
   /* load/process config file first then override values with command line parameters */
 
-  if (TLoadConfig(Buffer,sizeof(Buffer)) == 0)
+  if (TLoadConfig(Buffer,BufSize) == 0)
     {
     char *ptr;
     char *tptr;
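
The idea behind the patch, keeping the allocated size in its own variable
and passing that instead of sizeof(Buffer), can be tried outside of
pbs_server with a small sketch. Note that load_text() below is a
hypothetical stand-in written for this example; it is not TLoadConfig() and
makes no claim about how TLoadConfig() treats its size argument. The other
function names are likewise made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a config loader: copies at most size-1
   bytes of text into buf and terminates it. Returns 0 on success. */
int load_text(char *buf, size_t size, const char *text)
{
  size_t len;

  assert(text != NULL);

  if (buf == NULL || size == 0)
    return 1;

  len = strlen(text);
  if (len > size - 1)
    len = size - 1;

  memcpy(buf, text, len);
  buf[len] = '\0';

  return 0;
}

/* Buggy call: sizeof(buf) is the pointer size, so the text is truncated. */
size_t loaded_len_buggy(const char *text)
{
  size_t n = 0;
  char *buf = malloc(65536);

  if (buf == NULL)
    return 0;

  if (load_text(buf, sizeof(buf), text) == 0)
    n = strlen(buf);

  free(buf);
  return n;
}

/* Fixed call: the allocated size is kept in a variable and passed on. */
size_t loaded_len_fixed(const char *text)
{
  size_t n = 0;
  size_t buf_size = 65536 * sizeof(char);
  char *buf = malloc(buf_size);

  if (buf == NULL)
    return 0;

  if (load_text(buf, buf_size, text) == 0)
    n = strlen(buf);

  free(buf);
  return n;
}
```

With the line "SERVERHOST b001" as input, the fixed variant keeps all 15
characters, while the buggy one keeps only sizeof(char *) - 1 of them.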


Cheers,
Martin

-- 
Martin Siegert
Head, HPC at SFU
WestGrid Site Manager
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6

