[torqueusers] Strange job behaviour

Paulo Silva pjs at eurotux.com
Thu Sep 23 00:09:44 MDT 2004

On Wed, 2004-09-22 at 09:59 -0600, Greg Wimpey wrote:
> Have you logged in to the node while the job is running to watch what's
> going on (e.g., run top and see if some unexpected process is running
> alongside)? 

Yes, everything seems OK. The process is actually using 99% of the CPU.
I've also used strace -p pid_of_program to watch the running programs,
and they all execute the same calls; some are just slower doing it.

> When you say "equal" jobs, do you mean identical?  Same
> code, same input data and parameters? 

Yes. I made the program specifically for this situation.

> Do the jobs read/write files from
> an NFS server?  If so, have you tried running the job using data on
> local disk? 

Yes, I'm using NFS, so I ran a test using just the local disk, and the
problem remains.

> I'm assuming the nodes are configured identically (same
> CPU, same amount/type of RAM, same O/S version).


> This is where I would start looking.

By now I'm pretty sure this isn't a torque issue (since the problem
remains if I run the programs using just rsh), so this may be rather
off-topic for this list, but could it be a heat problem? Someone
suggested to me that the board/CPU could reduce its performance as a
measure to lower the temperature. Is this possible?
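
One rough way to look into the throttling idea is to read the clock
speed the kernel reports on each node while the job is running and
compare it with the CPU's rated speed. The sketch below is only an
illustration under assumptions: it assumes a Linux node whose
/proc/cpuinfo has "cpu MHz" lines, and the names parse_cpu_mhz,
looks_throttled, nominal_mhz and tolerance are hypothetical. Some
hardware throttles without changing the reported MHz, so a clean
result here does not rule out a heat problem.

```python
import os
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return every 'cpu MHz' value found in /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, re.MULTILINE)]


def looks_throttled(current_mhz, nominal_mhz, tolerance=0.9):
    """True if any CPU reports less than tolerance * nominal_mhz.

    nominal_mhz is the rated speed, supplied by hand (an assumption);
    tolerance is an arbitrary threshold, not a vendor figure.
    """
    return any(mhz < nominal_mhz * tolerance for mhz in current_mhz)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            speeds = parse_cpu_mhz(f.read())
        if speeds:
            print("reported MHz per CPU:", speeds)
        else:
            print("no 'cpu MHz' lines found in", path)
    else:
        print(path, "not available on this system")
```

Running it on a fast node and on a slow node at the same moment, and
comparing the printed values, would show whether the reported clock
differs between them.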

-- 
Paulo Silva <pjs at eurotux.com>
Eurotux, SA
