[torqueusers] Torque with MPICH kills jobs consistently, but OpenPBS works fine

Prakash Velayutham velayups at email.uc.edu
Tue Dec 6 07:48:26 MST 2005


Garrick Staples wrote:
> On Mon, Dec 05, 2005 at 06:17:59PM -0500, Prakash Velayutham alleged:
>   
>>> You've got ethernet driver problems.  I'd recommend using e100 instead
>>> of eepro100.
>>>       
>> In case I have both eepro100 and e100 compiled in the kernel (not as 
>> modules), how do I make sure that the device uses the right driver (e100 
>> and not eepro100). This is the case here. I can see that both the 
>> drivers get loaded during the kernel boot, but I don't know how to 
>> control the driver that is to be used.
>>     
>
> Um, you build them as modules and alias the one you want in
> modules.conf/modprobe.conf.
Yeah, but with root-over-NFS you need the ethernet driver built into the 
kernel and not as a module, right? The issue seems to be that both drivers 
have been compiled into the kernel, instead of just e100. Looks like I 
will have to take down the cluster sometime (over a weekend) and redo 
some stuff.
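
Before rebuilding anything, it may help to confirm which of the two 
built-in drivers actually claimed the NIC. The following is a minimal 
sketch, assuming a 2.6 kernel with sysfs mounted at /sys and an interface 
named eth0 (both are assumptions, not something confirmed in this thread). 
The helper name bound_driver is hypothetical. It only reports the current 
binding; it does not change it.

```python
import os

def bound_driver(iface, sysfs_root="/sys"):
    """Return the name of the driver bound to a network interface.

    Reads the sysfs symlink <sysfs_root>/class/net/<iface>/device/driver,
    whose target directory is named after the driver (e.g. e100 or
    eepro100). Returns None if the link is missing, for example when the
    interface does not exist or sysfs is not mounted.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    try:
        target = os.readlink(link)
    except OSError:
        return None
    # The link target ends in the driver's directory name.
    return os.path.basename(target)

if __name__ == "__main__":
    # "eth0" is an assumed interface name; adjust for the node in question.
    print(bound_driver("eth0"))
```

If this prints eepro100 on the compute nodes, the older driver is the one 
in use, and the kernel would need to be rebuilt with only e100 compiled in.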

But I will send an update with a level 4 debug log from mom in any case. 
Could it be that the issue is not the ethernet drivers but something else?

