[torqueusers] 2 problems with torque-2.0.0p7

David B Jackson jacksond at clusterresources.com
Mon Jan 30 20:55:25 MST 2006


Martin,

  Your patch is exactly right.  The latest 2.1.0 snapshot corrects this
issue.  With this change in place, does your multi-homed host issue
disappear?

Dave

> On Mon, Jan 30, 2006 at 05:22:39PM -0800, Martin Siegert wrote:
>
>> 2) this problem has to do with multi-homed hosts and is by far more
>> serious as it stops me dead in my tracks:
>>
>> $PBS_HOME/server_name contains "b001"
>> $PBS_HOME/torque.cfg contains "SERVERHOST b001"
>>
>> When I submit a job with qsub it returns jobids of the form
>> 2345.<hostname>
>> instead of 2345.b001. This used to work in torque-2.0.0p3 (which is the
>> last version I used before switching to 2.0.0p7)! Thus, this broke
>> somewhere in versions 2.0.0p4 - 2.0.0p7. The effect is that, e.g.,
>>
>> qdel 2345
>>
>> does not work anymore - I always have to enter the full jobid
>> 2345.<hostname>, which is rather annoying and more importantly
>> impossible to explain to users.
>> I suspect that the problem is with pbs_server
>
> It appears that "TLoadConfig(Buffer,sizeof(Buffer))" in pbsd_main.c,
> line 505, only reads the first 4 characters of the torque.cfg file.
>
> Consider the following code:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main (int argc, char *argv[]){
> char *Buffer;
> int BufSize;
>
>    BufSize = 65536*sizeof(char);
>    Buffer = (char *)malloc(BufSize);
>    /* sizeof(Buffer) is the size of the pointer, not of the allocation */
>    printf("BufSize=%i, sizeof(Buffer)=%i\n", BufSize, (int)sizeof(Buffer));
>    free(Buffer);
>    return 0;
> }
>
> When you run the corresponding program (here on a system with 4-byte
> pointers) you get
>
> BufSize=65536, sizeof(Buffer)=4
>
> :-(
>
> In the older versions of torque Buffer was defined as
>
> char Buffer[65536];
>
> in which case sizeof(Buffer) has the desired result.
> Thus, we can either
> 1) go back to the old version,
> 2) use the code from qsub.c (which is very similar to the old version), or
> 3) use something like the following:
>
> --- src/server/pbsd_main.c.orig	Mon Jan 30 18:49:59 2006
> +++ src/server/pbsd_main.c	Mon Jan 30 19:08:47 2006
> @@ -452,6 +452,7 @@
>    time_t last_jobstat_time;
>    int    when;
>
> +  int    BufSize;
>    char   *Buffer;
>
>    void	 ping_nodes A_((struct work_task *ptask));
> @@ -476,7 +477,8 @@
>
>    ProgName = argv[0];
>
> -  Buffer=calloc(65536,sizeof(char));
> +  BufSize=65536*sizeof(char);
> +  Buffer=(char *)malloc(BufSize);
>
>    /* if we are not running with real and effective uid of 0, forget it */
>
> @@ -502,7 +504,7 @@
>
>    /* load/process config file first then override values with command line parameters */
>
> -  if (TLoadConfig(Buffer,sizeof(Buffer)) == 0)
> +  if (TLoadConfig(Buffer,BufSize) == 0)
>      {
>      char *ptr;
>      char *tptr;
>
>
> Cheers,
> Martin
>
> --
> Martin Siegert
> Head, HPC at SFU
> WestGrid Site Manager
> Academic Computing Services                        phone: (604) 291-4691
> Simon Fraser University                            fax:   (604) 291-4242
> Burnaby, British Columbia                          email: siegert at sfu.ca
> Canada  V5A 1S6
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>


