[torqueusers] Re: [Mauiusers] Wall clock time of suspended jobs

Gerson Galang gerson.sapac at gawab.com
Wed Aug 25 22:24:01 MDT 2004


Hi Simen,

I tried your suggestion (manually sending the "suspend" signal to suspend 
a job and stop its wallclock), but it still didn't work.

Below are the accounting log records for the jobs I ran. I submitted two 
jobs using the same PBS script, and the second job finished first because 
I suspended the first one. When I resumed the first job after the second 
had finished, it ran for only a few seconds and then stopped (in other 
words, it was kicked out of the queue).

08/26/2004 13:34:59;E;446.dev.sapac.edu.au;user=gerson group=gerson 
jobname=mpitest-2-17500 queue=parallel ctime=1093492594 qtime=1093492594 
etime=1093492594 start=1093492601 exec_host=dev2/1+dev2/0+dev1/1+dev1/0 
Resource_List.neednodes=2:ppn=2 Resource_List.nodect=2 
Resource_List.nodes=2:ppn=2 Resource_List.walltime=00:11:00 
session=20509 end=1093493099 Exit_status=0 resources_used.cput=00:11:09 
resources_used.mem=11116kb resources_used.vmem=23768kb 
resources_used.walltime=00:08:18
08/26/2004 13:42:06;E;445.dev.sapac.edu.au;user=gerson group=gerson 
jobname=mpitest-2-17500 queue=parallel ctime=1093492541 qtime=1093492541 
etime=1093492541 start=1093492542 exec_host=dev3/1+dev3/0+dev2/1+dev2/0 
Resource_List.neednodes=2:ppn=2 Resource_List.nodect=2 
Resource_List.nodes=2:ppn=2 Resource_List.walltime=00:11:00 session=0 
end=1093493526 Exit_status=0 resources_used.cput=00:01:01 
resources_used.mem=10996kb resources_used.vmem=23752kb 
resources_used.walltime=00:00:46
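
For anyone who wants to check the numbers, here is a rough Python sketch 
that compares the elapsed time (end minus start) with the recorded and 
requested walltime for the two records above. The function names are my 
own, the records are trimmed to the fields the comparison needs, and the 
parsing assumes the layout seen above (timestamp;type;jobid; followed by 
space-separated key=value pairs), so treat it as an illustration rather 
than a general accounting-log parser.

```python
# Trimmed copies of the two "E" (job end) records quoted above,
# with the e-mail line wrapping undone and unrelated fields omitted.
RECORDS = [
    "08/26/2004 13:34:59;E;446.dev.sapac.edu.au;start=1093492601 "
    "Resource_List.walltime=00:11:00 end=1093493099 "
    "resources_used.walltime=00:08:18",
    "08/26/2004 13:42:06;E;445.dev.sapac.edu.au;start=1093492542 "
    "Resource_List.walltime=00:11:00 end=1093493526 "
    "resources_used.walltime=00:00:46",
]


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_record(line):
    """Split one record into its job id and a dict of key=value fields."""
    _timestamp, _record_type, job_id, message = line.split(";", 3)
    fields = dict(pair.split("=", 1) for pair in message.split())
    return job_id, fields


def summarize(line):
    """Return elapsed, recorded and requested walltime in seconds."""
    job_id, fields = parse_record(line)
    return {
        "job": job_id,
        "elapsed": int(fields["end"]) - int(fields["start"]),
        "used": hms_to_seconds(fields["resources_used.walltime"]),
        "limit": hms_to_seconds(fields["Resource_List.walltime"]),
    }


for record in RECORDS:
    print(summarize(record))
```

For job 446 the elapsed time and the recorded walltime agree (498 
seconds each). For job 445, the one I suspended, 984 seconds passed 
between start and end, which is more than the 660 seconds requested, 
while the recorded walltime is only 46 seconds. The records alone do not 
show what ended the job.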

Should a patch to stop the wallclock countdown of a suspended job be 
applied to maui or to torque?

Regards,
Gerson


Simen Gaure wrote:
> For torque to stop the wallclock time, the job must be suspended with the
> "suspend" signal, e.g. with the torque command
> 
> qsig -s suspend <jobid>
> 
> If it's suspended with
> qsig -s SIGSTOP <jobid>
> the wallclock won't stop.
> 
> maui will normally send "suspend" (with pbs_sigjob(), similar to qsig),
> but if you have specified a SUSPENDSIG in maui's configuration, this
> will be sent instead and the wallclock is not stopped.
> 
> On Fri, 20.08.2004 at 08:19, Gerson Galang wrote:
> 
>>Has anybody on the torque or maui users lists written a patch to stop the 
>>walltime for suspended jobs?
>>
>>The problem with the wallclock not stopping when a job is suspended is 
>>that the job can be kicked out of the queue once it has exceeded its 
>>requested wall time. Have the developers of maui looked into this issue 
>>when they were working on the suspend-resume functionality of maui?
>>
>>Thanks,
>>gerson