[torqueusers] Bugs/glitches in torque 2.3.0?

Steve Snelgrove ssnelgrove at clusterresources.com
Tue Apr 8 14:13:32 MDT 2008


Roy Dragseth wrote:
> There seems to be a slight regression in 2.3.0, at least from 2.1.8:
>
> The pbsnodes output does not contain correct information about ncpus and size 
> of local partition:
>
>
> On torque-2.1.8 (in production):
>
> # /opt/torque/bin/pbsnodes c1-1
> c1-1
>      state = free
>      np = 8
>      properties = ib,switch1
>      ntype = cluster
>      status = opsys=linux,uname=Linux compute-1-1.local 
> 2.6.9-42.0.10.EL_SFS2.2_1smp #1 SMP Tue Nov 27 22:16:24 CET 2007 
> x86_64,sessions=? 0,nsessions=? 
> 0,nusers=0,idletime=87478,totmem=33932004kb,availmem=33639500kb,physmem=32911888kb,ncpus=8,loadave=0.00,netload=63132375264,size=99042372kb:102237336kb,state=free,jobs=? 
> 0,rectime=1207605969
>
>
> On torque-2.3.0 (test setup):
>
> # /var/tmp/torque/bin/pbsnodes c1-1
> c1-1
>      state = free
>      np = 8
>      ntype = cluster
>      status = opsys=linux,uname=Linux compute-1-1.local 
> 2.6.9-42.0.10.EL_SFS2.2_1smp #1 SMP Tue Nov 27 22:16:24 CET 2007 
> x86_64,sessions=? 0,nsessions=? 
> 0,nusers=0,idletime=87549,totmem=33932004kb,availmem=33639564kb,physmem=32911888kb,ncpus=? 
> 0,loadave=0.00,netload=63133975015,size=@�T,state=free,jobs=,varattr=,rectime=1207606034
>
> The ncpus and size fields are not correct.
>
> Both mom_priv/configs are identical.
> r.
>
>
>   
I believe I have fixed both of these problems.  See the attached patch.

The problem with ncpus was in linux/mom_mach.c, where someone had attempted
to fix an infinite loop while reading /proc/cpuinfo.  The format of that
file can vary with the flavor of Linux, so fscanf could fail without
advancing the input pointer.  The earlier fix returned NULL on a bad fscanf
result instead of a string with the CPU count printed into it.
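
For illustration only, here is a minimal sketch of a counting loop that
cannot stall on unexpected input.  It is not the code in the attached patch:
count_processors is a hypothetical helper name, and the sketch assumes that
each CPU entry in /proc/cpuinfo begins with the token "processor".  It reads
with fgets, which either consumes input or returns NULL, so the loop always
terminates.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a TORQUE function): counts the lines whose
 * first token is "processor" in a /proc/cpuinfo-style stream. */
int count_processors(FILE *fp)
  {
  char line[1024];
  int  procs = 0;
  int  at_line_start = 1;

  assert(fp != NULL);

  /* fgets() either consumes input or returns NULL, so this loop ends. */
  while (fgets(line, sizeof(line), fp) != NULL)
    {
    if (at_line_start &&
        (strncmp(line, "processor", 9) == 0) &&
        ((line[9] == ' ') || (line[9] == '\t') || (line[9] == ':')))
      {
      procs++;
      }

    /* A chunk without a newline means the line was longer than the
     * buffer; the next chunk continues the same line. */
    at_line_start = (strchr(line, '\n') != NULL);
    }

  return procs;
  }
```

A caller would open /proc/cpuinfo with fopen, pass the stream to
count_processors, and close it afterwards.  Reading whole lines avoids
relying on the return value of fscanf to decide whether any input was
consumed.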

I introduced the problem with size myself while attempting to simplify the
mom_server code.  I did not understand how this attribute was triggered from
the mom config file, and so I failed to test this case.

Hopefully this will work better now.

Steve

-------------- next part --------------
Index: src/resmom/mom_server.c
===================================================================
--- src/resmom/mom_server.c	(revision 2073)
+++ src/resmom/mom_server.c	(working copy)
@@ -713,11 +713,24 @@
 
 extern struct config *config_array;
 
+/**
+ * gen_size
+ *
+ * For the size attribute to be returned, it must be
+ * defined in the pbs_mom config file.  The syntax
+ * is unique in that you must ask for the size of
+ * either a file or a file system.
+ *
+ * For example:
+ * size[fs=/]
+ * size[file=/home/user/test.txt]
+ */
 void
 gen_size(char *name,char **BPtr, int *BSpace)
   {
   struct config  *ap;
   struct rm_attribute *attr;
+  char *value;
 
   ap = rm_search(config_array,name);
   if (ap)
@@ -725,11 +738,15 @@
     attr = momgetattr(ap->c_u.c_value);
     if (attr)
       {
-      MUSNPrintF(BPtr,BSpace,"%s=%s",
-        name,
-        attr);
-      (*BPtr)++; /* Need to start the next string after the null */
-      (*BSpace)--;
+      value = dependent(name,attr);
+      if (value && *value)
+        {
+        MUSNPrintF(BPtr,BSpace,"%s=%s",
+          name,
+          value);
+        (*BPtr)++; /* Need to start the next string after the null */
+        (*BSpace)--;
+        }
       }
     }
   }
Index: src/resmom/linux/mom_mach.c
===================================================================
--- src/resmom/linux/mom_mach.c	(revision 2073)
+++ src/resmom/linux/mom_mach.c	(working copy)
@@ -3211,9 +3211,11 @@
 
   while (!feof(fp))  
     {
-    fscanf(fp,"%s %*[^\n]%*c", label);
-
-    if (strcmp("processor",label) == 0)
+    if (fscanf(fp,"%s %*[^\n]%*c", label) == 0)
+      {
+      getc(fp);  /* must do something to get to eof */
+      }
+    else if (strcmp("processor",label) == 0)
       procs++;
     }
 

