[torqueusers] torque-2.1.6 - pbs_mom cannot write to its log
Alessandro Federico
alessandro.federico at caspur.it
Thu Oct 18 03:09:09 MDT 2007
Garrick Staples wrote:
> On Wed, Oct 17, 2007 at 11:49:01AM +0200, Alessandro Federico alleged:
>> Hi all.
>>
>> I'm running torque-2.1.6 on SLES10 x86_64 (2.6.16.27-0.9-smp).
>> Sometimes I observe this strange behavior:
>>
>> 1) before a node starts/joins the first job of the day
>> the file descriptor of the log file is correct
>>
>> --------------------------------------------
>> # lsof -p `pidof pbs_mom` | grep mom_logs
>> pbs_mom 7541 root 3w REG 8,1 208319 126550 /opt/spool/torque/mom_logs/20071017
>> --------------------------------------------
>>
>> 2) after the node starts/joins the first job of the day
>> the file descriptor of the log file becomes corrupted
>
> It's probably some other memory corruption going on. Can you duplicate with 2.1.9?
>
Of course you're right, but I can't upgrade to 2.1.9 at the moment
(the upgrade is planned for the end of November).
The strange thing is that we have been running torque-2.1.6 on our
cluster since January 2007, and it worked until the end
of September (when the problem began).
In July we upgraded the SLES10 kernel from 2.6.16.21-0.25-smp
to 2.6.16.27-0.9-smp.
Could the problem be related to this upgrade?
Thanks
Ale
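The lsof check in points 1 and 2 can also be scripted so it is easy to rerun before and after a node picks up its first job. Below is a minimal sketch, assuming a Linux /proc filesystem and root privileges (needed to inspect the descriptors of a root-owned pbs_mom); check_log_fd is a hypothetical helper name, not part of TORQUE. It prints every descriptor of a process whose target path contains a given substring such as mom_logs.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE): list the file descriptors of a
# process whose target path contains a substring, using /proc instead of lsof.
# Assumes Linux /proc; run as root to inspect a root-owned pbs_mom.
# Returns 0 if at least one matching fd was found, 1 if none, 2 if no PID given.
check_log_fd() {
    pid=$1
    pattern=$2
    [ -n "$pid" ] || return 2
    found=1
    for fd in /proc/"$pid"/fd/*; do
        # readlink prints the path the descriptor currently refers to;
        # skip entries that vanished or cannot be read
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *"$pattern"*)
                echo "fd ${fd##*/} -> $target"
                found=0
                ;;
        esac
    done
    return $found
}

# Example: run before and after the first job of the day.
# check_log_fd "$(pidof pbs_mom)" mom_logs || echo "pbs_mom has no fd open on mom_logs"
```

Comparing the two runs shows whether descriptor 3 still points at the current mom_logs file. If pidof returns more than one PID, pass a single one.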
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
--
Alessandro Federico
CASPUR http://www.caspur.it/
e-mail: alessandro.federico at caspur.it
phone: +39 06 44486708
fax: +39 06 4957083
------------------------------------------
Military intelligence is a contradiction
in terms. (Groucho Marx)
------------------------------------------