[torqueusers] Torque with MPICH kills jobs consistently, but OpenPBS works fine
Prakash Velayutham
velayups at email.uc.edu
Mon Dec 5 16:17:59 MST 2005
Garrick Staples wrote:
> On Mon, Dec 05, 2005 at 05:12:08PM -0500, Prakash Velayutham alleged:
>
>> 11/08/2005 11:23:32;0008;PBS_Server;Job;50645.ribosome.cchmc.org;Job Run
>> at request of Scheduler at ribosome.cchmc.org
>>
>
>
>> 11/08/2005 11:24:48;0100;PBS_Server;Req;;Type JobObituary request
>> received from pbs_mom at tyrosine.bmicluster1.cchmc.org, sock=9
>> 11/08/2005
>>
>
> Don't see an external job delete...
>
>
>
>> Here is the mom log:
>>
>> 11/08/2005 11:22:30;0001; pbs_mom;Job;TMomFinalizeJob3;job
>> 50645.ribosome.cchmc.org started, pid = 2806
>> 11/08/2005 11:22:31;0008;
>> pbs_mom;Job;50645.ribosome.cchmc.org;start_process: task started, tid 2,
>> sid 2866, cmd /bin/sh
>> 11/08/2005 11:23:37;0008;
>> pbs_mom;Job;50645.ribosome.cchmc.org;kill_task: killing pid 2877 task 2
>> with sig 9
>>
>
> Increase MOM's loglevel above 4; it should then log why kill_task is
> being called.
>
>
>
>> Does not seem to help.
>>
>> In the syslog, just this one line repeats itself.
>>
>> Nov 8 11:23:02 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:07 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:09 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:11 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:11 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:17 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:19 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:24 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:27 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:29 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:31 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:32 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:43 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:43 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:51 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:53 tyrosine kernel: eth1: freeing mc frame.
>> Nov 8 11:23:55 tyrosine kernel: eth1: freeing mc frame.
>>
>
> You've got ethernet driver problems. I'd recommend using e100 instead
> of eepro100.
If both eepro100 and e100 are compiled into the kernel (not as
modules), how do I make sure the device uses the right driver (e100,
not eepro100)? That is the case here: I can see both drivers being
loaded during kernel boot, but I don't know how to control which one
is used.
Thanks,
Prakash
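
A minimal sketch of how one might check which driver is actually bound
to eth1. It does not control driver selection; it only reports the
current binding. It assumes a kernel that exposes sysfs at /sys (2.6 or
later), where each network interface has a device/driver symlink
pointing at its driver. The function name bound_driver and the
sysfs_root parameter are illustrative, not part of any Torque or kernel
tool.

```python
import os


def bound_driver(iface, sysfs_root="/sys/class/net"):
    """Return the name of the kernel driver bound to iface, or None.

    Assumes sysfs is mounted and that <sysfs_root>/<iface>/device/driver
    is a symlink to the driver directory (for example .../drivers/e100).
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        target = os.readlink(link)
    except OSError:
        # Interface missing, no sysfs, or the entry is not a symlink.
        return None
    # The last path component of the link target is the driver name.
    return os.path.basename(target)


# Usage: print(bound_driver("eth1"))
```

Where ethtool is installed, "ethtool -i eth1" reports the same driver
name without needing sysfs.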