[torqueusers] Re: [Mauiusers] Wall clock time of suspended jobs
gerson.sapac at gawab.com
Wed Aug 25 22:24:01 MDT 2004
I tried your suggestion (manually sending the "suspend" signal to suspend
a job and stop its wallclock), but it still didn't work.
Below are the accounting logs of the jobs I ran. I submitted two jobs
using the same PBS script, and the second job finished first because I
suspended the first one. When I resumed the first job after the second
job had finished, the first job ran for only a few seconds and then stopped
(in other words, it was kicked out of the queue).
08/26/2004 13:34:59;E;446.dev.sapac.edu.au;user=gerson group=gerson
jobname=mpitest-2-17500 queue=parallel ctime=1093492594 qtime=1093492594
etime=1093492594 start=1093492601 exec_host=dev2/1+dev2/0+dev1/1+dev1/0
session=20509 end=1093493099 Exit_status=0 resources_used.cput=00:11:09
08/26/2004 13:42:06;E;445.dev.sapac.edu.au;user=gerson group=gerson
jobname=mpitest-2-17500 queue=parallel ctime=1093492541 qtime=1093492541
etime=1093492541 start=1093492542 exec_host=dev3/1+dev3/0+dev2/1+dev2/0
Resource_List.nodes=2:ppn=2 Resource_List.walltime=00:11:00 session=0
end=1093493526 Exit_status=0 resources_used.cput=00:01:01
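As a rough check, the raw elapsed time of each job can be worked out from the start and end fields of the records above and compared with the requested walltime. The following is only a minimal Python sketch: the helper names (parse_fields, hms_to_seconds, elapsed_seconds) are hypothetical and not part of torque or maui, and end minus start includes the time spent suspended, so it is not necessarily the walltime that torque itself counted.

```python
# Accounting "E" records copied from the log above, joined onto one line each.
JOB_446 = (
    "08/26/2004 13:34:59;E;446.dev.sapac.edu.au;user=gerson group=gerson "
    "jobname=mpitest-2-17500 queue=parallel ctime=1093492594 qtime=1093492594 "
    "etime=1093492594 start=1093492601 exec_host=dev2/1+dev2/0+dev1/1+dev1/0 "
    "session=20509 end=1093493099 Exit_status=0 resources_used.cput=00:11:09"
)
JOB_445 = (
    "08/26/2004 13:42:06;E;445.dev.sapac.edu.au;user=gerson group=gerson "
    "jobname=mpitest-2-17500 queue=parallel ctime=1093492541 qtime=1093492541 "
    "etime=1093492541 start=1093492542 exec_host=dev3/1+dev3/0+dev2/1+dev2/0 "
    "Resource_List.nodes=2:ppn=2 Resource_List.walltime=00:11:00 session=0 "
    "end=1093493526 Exit_status=0 resources_used.cput=00:01:01"
)


def parse_fields(record):
    """Return the key=value fields of one accounting record as a dict."""
    # Layout: timestamp;record type;job id;space-separated key=value pairs
    body = record.split(";", 3)[3]
    return dict(item.split("=", 1) for item in body.split())


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def elapsed_seconds(fields):
    """Raw elapsed time (end - start); this includes any suspended time."""
    return int(fields["end"]) - int(fields["start"])


for name, record in (("446", JOB_446), ("445", JOB_445)):
    fields = parse_fields(record)
    line = f"job {name}: elapsed {elapsed_seconds(fields)} s"
    requested = fields.get("Resource_List.walltime")
    if requested is not None:
        line += f", requested walltime {hms_to_seconds(requested)} s"
    print(line)
```

For job 445 (the one I suspended) this gives an elapsed time of 984 s against a requested walltime of 660 s, which would be consistent with the suspended period being counted against the job.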
Should a patch be applied to maui to stop the wallclock countdown of a
suspended job, or should it be applied to torque?
Simen Gaure wrote:
> For torque to stop the wallclock time the job must be suspended with the
> "suspend" signal, i.e. like the torque command
> qsig -s suspend <jobid>
> If it's suspended with
> qsig -s SIGSTOP <jobid>
> the wallclock won't stop.
> maui will normally send "suspend" (with pbs_sigjob(), similar to qsig),
> but if you have specified a SUSPENDSIG in maui's configuration, this
> will be sent instead and the wallclock is not stopped.
> On Fri, 20.08.2004 at 08:19, Gerson Galang wrote:
>>Has anybody on the torque or maui users lists written a patch to stop the
>>walltime for suspended jobs?
>>The problem with the wallclock not stopping when a job is suspended is
>>that the job can be kicked out of the queue if it has already exceeded its
>>requested wall time. Did the developers of maui look into this issue
>>when they were working on the suspend-resume functionality of maui?