[torqueusers] Torque with MPICH kills jobs consistently, but OpenPBS works fine

Prakash Velayutham velayups at email.uc.edu
Mon Dec 5 16:17:59 MST 2005


Garrick Staples wrote:
> On Mon, Dec 05, 2005 at 05:12:08PM -0500, Prakash Velayutham alleged:
>   
>> 11/08/2005 11:23:32;0008;PBS_Server;Job;50645.ribosome.cchmc.org;Job Run 
>> at request of Scheduler at ribosome.cchmc.org
>>     
>
>   
>> 11/08/2005 11:24:48;0100;PBS_Server;Req;;Type JobObituary request 
>> received from pbs_mom at tyrosine.bmicluster1.cchmc.org, sock=9
>> 11/08/2005 
>>     
>
> Don't see an external job delete...
>
>
>   
>> Here is the mom log:
>>
>> 11/08/2005 11:22:30;0001;   pbs_mom;Job;TMomFinalizeJob3;job 
>> 50645.ribosome.cchmc.org started, pid = 2806
>> 11/08/2005 11:22:31;0008;   
>> pbs_mom;Job;50645.ribosome.cchmc.org;start_process: task started, tid 2, 
>> sid 2866, cmd /bin/sh
>> 11/08/2005 11:23:37;0008;   
>> pbs_mom;Job;50645.ribosome.cchmc.org;kill_task: killing pid 2877 task 2 
>> with sig 9
>>     
>
> Increase MOM's loglevel over 4, it should log why kill_task is being
> called.
>
>  
>   
>> Does not seem to help.
>>
>> In the syslog, just this one line repeats itself.
>>
>> Nov  8 11:23:02 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:07 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:09 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:11 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:11 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:17 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:19 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:24 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:27 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:29 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:31 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:32 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:43 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:43 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:51 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:53 tyrosine kernel: eth1: freeing mc frame.
>> Nov  8 11:23:55 tyrosine kernel: eth1: freeing mc frame.
>>     
>
> You've got ethernet driver problems.  I'd recommend using e100 instead
> of eepro100.
If both eepro100 and e100 are compiled into the kernel (not as 
modules), how do I make sure that the device uses e100 and not 
eepro100? That is the case here: I can see that both drivers get 
loaded during the kernel boot, but I don't know how to control which 
one is used.
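
As a first step, it may help to confirm which driver actually ends up
bound to eth1. The following is only a minimal sketch, assuming a 2.6
kernel with sysfs mounted at /sys; the helper name bound_driver and the
sysfs_root parameter are made up for illustration. It only reports the
current binding and does not change it.

```python
import os


def bound_driver(iface, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to iface, or None.

    Assumes the sysfs layout <sysfs_root>/class/net/<iface>/device/driver,
    where "driver" is a symlink to the driver's directory.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        # No such interface, or no driver symlink exposed for it.
        return None
    # The last path component of the symlink target is the driver name.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # eth1 is the interface named in the syslog lines above.
    print("eth1 ->", bound_driver("eth1"))
```

If this reports eepro100 for eth1, then that driver has claimed the
card, which would match the suggestion above to move to e100.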

Thanks,
Prakash

