[torqueusers] Wrong cput value

Brock Palen brockp at umich.edu
Tue Jul 22 14:00:59 MDT 2008


Were these jobs running different code?
Some code (HFSS comes to mind) forks the real process and somehow
Torque loses track of it, so cput will be almost zero.

Another possibility, if you're running parallel code: the user is not
using a TM-enabled mpirun, so Torque cannot account for the CPU time of
processes it did not launch itself.
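
If it helps to spot jobs like this without graphing, here is a rough
Python sketch that flags jobs whose cput is suspiciously small compared
with walltime. It assumes you have already pulled the job ID,
resources_used.walltime and resources_used.cput (as HH:MM:SS strings)
out of tracejob or the accounting logs. The function names, the sample
job records, and the 0.1 threshold are all made up for illustration.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def low_cput_jobs(jobs, min_ratio=0.1):
    """Return the IDs of jobs whose cput/walltime ratio is below min_ratio.

    jobs is an iterable of (job_id, walltime, cput) tuples, with both
    times given as 'HH:MM:SS' strings. min_ratio is an assumed cutoff;
    pick one that suits your own workload.
    """
    flagged = []
    for job_id, walltime, cput in jobs:
        wall = hms_to_seconds(walltime)
        cpu = hms_to_seconds(cput)
        # Skip zero-length jobs to avoid dividing by zero.
        if wall > 0 and cpu / wall < min_ratio:
            flagged.append(job_id)
    return flagged


# Hypothetical records, not taken from any real cluster.
jobs = [
    ("1001", "02:00:00", "00:01:10"),  # almost no CPU time recorded
    ("1002", "02:00:00", "01:30:00"),  # looks normal
]
print(low_cput_jobs(jobs))
```

Any job this flags is worth checking for a forking launcher or a
non-TM mpirun before assuming the accounting itself is broken.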

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Jul 22, 2008, at 2:39 PM, Kevin Murphy wrote:
> Torque 2.3.1, CentOS 5.1.
>
> I recently ran tracejob to compare runtime versus data-size  
> statistics on 563 jobs, and three of them had impossibly low  
> resources_used.cput values.  (For one such job, cput was 1/65th of  
> what it should have been, approximately, based on the size of the  
> input and output files).  Anybody else seen this?  The three jobs  
> in question executed on different nodes, and they neither started  
> nor ended at the same time.  The jobs generated credible output.   
> The 563 jobs in this set lasted between 7 minutes and 5 hours  
> walltime (5.5 min - 3.5hr cput) depending on the size of the input  
> data, and when I graph time versus output size, it forms a nice  
> cleanish line, with those 3 extreme outliers.  The three weird jobs  
> had walltimes of 1:53:28, 2:43:59, and 3:32:09, so the incorrect  
> cput values are not the result of natural variation in wall vs cpu  
> times.
>
> Thanks,
> Kevin Murphy


