[torqueusers] Strange job behaviour

Paulo Silva pjs at eurotux.com
Thu Sep 23 00:09:44 MDT 2004


On Wed, 2004-09-22 at 09:59 -0600, Greg Wimpey wrote:
> Have you logged in to the node while the job is running to watch what's
> going on (e.g., run top and see if some unexpected process is running
> alongside)? 

Yes, everything looks fine. The process is actually using 99% of the
CPU. I've also run strace -p pid_of_program against the running program
on each node, and all instances execute the same calls; some are just
slower at it.

> When you say "equal" jobs, do you mean identical?  Same
> code, same input data and parameters? 

Yes. I made the program specifically for this situation.

> Do the jobs read/write files from
> an NFS server?  If so, have you tried running the job using data on
> local disk? 

Yes, I'm using NFS, so I ran a test using only the local disk, and the
problem remains.

> I'm assuming the nodes are configured identically (same
> CPU, same amount/type of RAM, same O/S version).

Correct.

> This is where I would start looking.

By now I'm pretty sure this isn't a torque issue (the problem remains
when I run the programs using just rsh), so this may be rather
off-topic for this list, but could it be a heat problem? Someone
suggested to me that the board/CPU could reduce its performance as a
measure to lower the temperature. Is this possible?
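
One rough way to probe the heat hypothesis is to time the same fixed
amount of CPU work several times in a row on each node and compare the
results. The sketch below is only an illustration: the function names,
the workload, and the iteration counts are assumptions, not the test
program used in this thread. Timings that grow from round to round on
one node but stay flat on another would be consistent with throttling,
though they would not prove it.

```python
import time


def time_fixed_work(iterations=2_000_000):
    """Return the wall-clock seconds taken by a fixed CPU-bound loop."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i  # arbitrary arithmetic, just to keep the CPU busy
    return time.perf_counter() - start


def sample_timings(rounds=5, iterations=2_000_000):
    """Run the same workload several times and return the list of timings."""
    return [time_fixed_work(iterations) for _ in range(rounds)]


if __name__ == "__main__":
    timings = sample_timings()
    for n, t in enumerate(timings, 1):
        print(f"round {n}: {t:.3f} s")
    # A ratio well above 1 means the same work got noticeably slower
    # (or faster) between rounds on this node.
    print(f"slowest/fastest ratio: {max(timings) / min(timings):.2f}")
```

Running it on a "fast" and a "slow" node at the same time, with nothing
else scheduled, keeps the comparison simple.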
-- 
Paulo Silva <pjs at eurotux.com>
Eurotux, SA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://www.supercluster.org/pipermail/torqueusers/attachments/20040923/e84539ac/attachment.bin