[torqueusers] Torque MOM 4.2.6 - found bug (MOM won't start with cpusets configured) + solution.

Martin Siegert siegert at sfu.ca
Tue Nov 26 11:41:15 MST 2013


Hi,

yes, we encountered this as well and reported it in

http://www.clusterresources.com/bugzilla/show_bug.cgi?id=244

together with basically the same solution that you suggest.
Hopefully this will get fixed in the next release.

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid/ComputeCanada Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

On Tue, Nov 26, 2013 at 01:34:55PM +0100, Johny wrote:
> 
>    On 2013-11-26 13:26, Johny wrote:
> 
>    Hello.
>    I've found a bug in the newest release of Torque, 4.2.6.
>    When it is compiled with the options:
>    ./configure --with-default-server=### --with-rcp=/usr/bin/scp
>    --enable-cpuset --enable-nvidia-gpus --enable-blcr
>    --enable-geometry-requests --enable-unixsockets=no
>    ...it just won't start (the MOM client); it hangs while reading the files
>    /sys/devices/system/node/.../cpulist.
>    The problem is in the function:
>    src/resmom/linux/numa_node.cpp -> void numa_node::parse_cpu_string()
>    There is a loop that parses successive parts of the string (a line of
>    text read from the files mentioned above):
>    LINE 121:
>      while (*ptr != '\0')
>        {
>        prev = strtol(ptr, &ptr, 10);
>        if (*ptr == '-')
>          {
>          ptr++;
>          curr = strtol(ptr, &ptr, 10);
>          while (prev <= curr)
>            {
>    #ifdef PENABLE_LINUX26_CPUSETS
>            if ((MOMConfigUseSMT == 1) ||
>                (is_physical_core(prev) == true))
>    #endif
>              {
>              this->cpu_indices.push_back(prev);
>              this->cpu_avail.push_back(true);
>              this->total_cpus++;
>              this->available_cpus++;
>              }
>            prev++;
>            }
>          if (*ptr == ',')
>            ptr++;
>          }
>        else if ((*ptr == ',') ||
>                 (*ptr == '\0'))
>          {
>    #ifdef PENABLE_LINUX26_CPUSETS
>          if ((MOMConfigUseSMT == 1) ||
>              (is_physical_core(prev) == true))
>    #endif
>            {
>            this->cpu_indices.push_back(prev);
>            this->cpu_avail.push_back(true);
>            this->total_cpus++;
>            this->available_cpus++;
>            }
>          ptr++;
>          }
>        }
>    This loop steps over the '\0' that terminates the string: when
>    *ptr == '\0' it enters the second "if" and then increments ptr, which
>    moves the pointer past the end of the string.
>    After that, the loop makes no progress: the bytes after the '\0' are
>    usually garbage that strtol cannot parse, so ptr is left unchanged and
>    the loop spins forever, which is why the MOM hangs.
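A small check of the strtol behaviour the analysis above relies on. The C standard specifies that when no digits can be converted, strtol stores the original input pointer in its end pointer, so a loop that advances only through strtol stops moving. The helper name consumed_by_strtol is hypothetical, chosen for illustration only:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns how many characters std::strtol consumes
// from `text` when parsing a base-10 number. If no digits can be
// converted, strtol stores `text` itself in the end pointer, so the
// result is 0 and a pointer advanced this way stays where it was.
long consumed_by_strtol(const char *text)
{
  assert(text != nullptr);

  char *end = nullptr;
  std::strtol(text, &end, 10);
  return static_cast<long>(end - text);
}
```

For example, consumed_by_strtol("12,3") returns 2, while consumed_by_strtol("garbage") and consumed_by_strtol("\n") both return 0: leading whitespace is skipped only when a number actually follows it.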
> 
>    To resolve this problem I've just added one line.
>    Previous version:
>    161: }
>    162:
>    163: ptr++;
>    I've changed it to:
>    161: }
>    162:
>    163: if(*ptr == '\n') break;
>    164: ptr++;
>    Now, when it enters the second "if" (when *ptr == '\0'), it records the
>    core and exits the loop.
>    There may be a more elegant way to do this, but this is simple and it
>    works (tested).
>    Regards,
>    Peter.
> 
> _______________________________________________
> torqueusers mailing list
> [1]torqueusers at supercluster.org
> [2]http://www.supercluster.org/mailman/listinfo/torqueusers
> 
>    My mistake. It should of course read:
>    I've changed it to:
>    161: }
>    162:
>    163: if(*ptr == '\0') break;
>    164: ptr++;
>    Sorry, too much work today :(
>    Regards,
>    Peter
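For reference, a standalone sketch of the loop with the corrected one-line fix applied. It is not the actual Torque code: parse_cpu_list is a hypothetical free function returning a std::vector<int>, and the MOMConfigUseSMT / is_physical_core() filtering under PENABLE_LINUX26_CPUSETS is left out, so every listed CPU is kept. The final else branch is an extra guard that is not part of the patch above; it stops on any unexpected character (such as a trailing '\n') so the loop cannot stall.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical standalone version of the cpulist loop from
// numa_node::parse_cpu_string(), with the fix from this thread applied.
// The SMT/physical-core filtering of the real code is omitted.
std::vector<int> parse_cpu_list(const char *list)
{
  assert(list != nullptr);

  std::vector<int> cpus;
  const char *ptr = list;
  char *end = nullptr;

  while (*ptr != '\0')
    {
    long prev = std::strtol(ptr, &end, 10);
    ptr = end;

    if (*ptr == '-')
      {
      // Range such as "8-11": keep every CPU from prev to curr inclusive.
      ptr++;
      long curr = std::strtol(ptr, &end, 10);
      ptr = end;

      while (prev <= curr)
        {
        cpus.push_back(static_cast<int>(prev));
        prev++;
        }

      if (*ptr == ',')
        ptr++;
      }
    else if ((*ptr == ',') ||
             (*ptr == '\0'))
      {
      // Single CPU such as "5".
      cpus.push_back(static_cast<int>(prev));

      // The fix: at the terminating '\0', leave the loop instead of
      // incrementing ptr past the end of the string.
      if (*ptr == '\0')
        break;

      ptr++;
      }
    else
      {
      // Extra guard (not part of the patch): any other character, e.g. a
      // trailing '\n', ends parsing so the loop cannot spin forever.
      break;
      }
    }

  return cpus;
}
```

With this version, parse_cpu_list("0-3,8-11") yields 0 1 2 3 8 9 10 11 and parse_cpu_list("0,2,4") yields 0 2 4. The second input is the case that triggered the hang: it ends in a single CPU, so without the break the loop would step past the terminating '\0'.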
> 
> References
> 
>    1. mailto:torqueusers at supercluster.org
>    2. http://www.supercluster.org/mailman/listinfo/torqueusers
