[torquedev] Re: 2.3.6 release?

Joshua Bernstein jbernstein at penguincomputing.com
Tue Dec 16 12:08:30 MST 2008



Josh Butikofer wrote:
> No, it will not be based on the 2.4.0 snapshots ... it will be based on 
> the 2.3.6 snapshots. :)

Hmmm. I guess I just grabbed the latest snapshot and went from there. I 
can have a look at the 2.3.6 snap and generate a diff from there.

> However, I know the TORQUE developers are definitely interested in 
> putting in your patch for the pbs_mom in 2.4. Are you sure that the 
> pbs_mom's in 2.3.x are also not affected by the segfault you found?

Yes, pbs_mom is affected in the 2.3.x branch as well as the 2.1 branch, 
though I've only directly observed the failures in versions 2.3.3, 2.3.5, 
and 2.1.9.

-Joshua Bernstein
Software Engineer
Penguin Computing

> Josh Butikofer
> Cluster Resources, Inc.
> #############################
> 
> 
> Joshua Bernstein wrote:
>> Is the 2.3.6 release based on the 2.4.0 snapshots?
>>
>> If so I would like to see a fix go in for the pbs_mom segfault I 
>> mentioned here:
>>
>> http://www.clusterresources.com/pipermail/torqueusers/2008-December/008411.html 
>>
>>
>> I can provide a patch and an explanation shortly.
>>
>> -Joshua Bernstein
>> Software Engineer
>> Penguin Computing
>>
>> Josh Butikofer wrote:
>>> Agreed. I will start the process so we can release soon. Does anyone 
>>> on the list have any objections to releasing 2.3.6? Is there anything 
>>> that needs to be put into TORQUE before this release?
>>>
>>> Josh Butikofer
>>> Cluster Resources, Inc.
>>> #############################
>>>
>>>
>>> Glen Beane wrote:
>>>> I think we should get 2.3.6 released. As it is right now, pbs_sched
>>>> cannot read its config file properly, because the hard tabs in the
>>>> strtok delimiters got replaced by spaces when astyle was run (these
>>>> have been fixed so they are \t in 2.3.6).
>>>>
>>>>
>>>>
>>>> 2.3.6
>>>>   e - in Linux, a pbs_mom will now "kill" a job's task, even if that
>>>>       task can no longer be found in the OS processor table. This
>>>>       prevents jobs from getting "stuck" when the PID vanishes in some
>>>>       rare cases.
>>>>   e - forward-ported change from 2.1-fixes (r2581) (b - reissue job
>>>>       obit even if no processes are found)
>>>>   b - change back to not sending status updates until we get cluster
>>>>       addr message from server, also only try to send hello when the
>>>>       server stream is down.
>>>>   b - change pbs_server so log_file_max_size of zero behavior matches
>>>>       documentation
>>>>   e - added periodic logging of version and loglevel to help in support
>>>>   e - added pbs_mom config option ignvmem to ignore vmem/pvmem limit
>>>>       enforcement
>>>>   b - change to correct strtoks that accidentally got changed in astyle
>>>>       formatting
>>> _______________________________________________
>>> torquedev mailing list
>>> torquedev at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torquedev
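
For anyone reading the archive later: the strtok delimiter problem Glen
describes in the quoted message can be sketched roughly as below. This is
not the actual pbs_sched source. The helper name count_tokens and the
sample line "name\tvalue" are made up for illustration, and the only
assumption is that a config line separates its fields with a hard tab.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from TORQUE): count how many tokens strtok
 * finds in a copy of `line` when splitting on the characters in `delims`.
 * strtok modifies its input, so we work on a local copy.
 */
static int count_tokens(const char *line, const char *delims)
{
    char buf[128];
    char *tok;
    int n = 0;

    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;

    return n;
}

/*
 * Hypothetical helper: print the token counts for a tab-separated line
 * with a delimiter set that includes \t and with one that only has
 * spaces (what a reformatter could leave behind if it rewrote a literal
 * tab inside the string).
 */
static void show_difference(void)
{
    const char *line = "name\tvalue";   /* made-up config line */

    printf("with \" \\t\": %d tokens\n", count_tokens(line, " \t"));
    printf("with \"  \":  %d tokens\n", count_tokens(line, "  "));
}
```

With the delimiter set " \t" the tab-separated line splits into two
fields; with a spaces-only set the whole line comes back as one token, so
a parser expecting "name" and "value" separately would misread it. Writing
the delimiter as the escape \t, rather than as a literal tab character in
the source, keeps it safe from whitespace reformatting.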

