[torqueusers] Re: Current Status showing INFINITY and job not deleting
Vadivelan Ranjith
velan.aero at gmail.com
Fri Apr 20 23:18:56 MDT 2007
On 4/21/07, Vadivelan Ranjith <velan.aero at gmail.com> wrote:
>
> Hi,
> Some of our compute nodes went down due to a power failure. We booted some
> of the nodes after a few days. After booting them, I deleted all jobs manually
> using qdel on the server. All jobs were deleted except two. When I run showq,
> I get:
> ACTIVE JOBS--------------------
> JOBNAME   USERNAME   STATE     PROC   REMAINING     STARTTIME
>
> 12377     prashant   Running   1      -INFINITY     Fri Mar 23 15:34:49
> 12361     prashant   Running   1      -INFINITY     Fri Mar 23 15:34:49
> 12769     vilask     Running   1      1:08:57:44    Tue Apr 17 19:42:11
> 12775     dmashok    Running   1      1:10:46:14    Tue Apr 17 21:30:41
> 12777     shinisha   Running   1      1:10:48:18    Tue Apr 17 21:32:45
> 12778     mehta      Running   1      1:10:51:55    Tue Apr 17 21:36:22
> 12779     mehta      Running   1      1:10:51:55    Tue Apr 17 21:36:22
> 12789     atuls      Running   1      1:21:59:27    Wed Apr 18 08:43:54
> 12790     atuls      Running   1      1:21:59:58    Wed Apr 18 08:44:25
> 12791     atuls      Running   1      1:22:00:29    Wed Apr 18 08:44:56
> 12796     sndatta    Running   1      2:01:59:11    Wed Apr 18 12:43:38
> 12768     deepa      Running   1      2:02:23:59    Wed Apr 18 13:08:26
> 12803     dipankar   Running   1      2:22:35:34    Thu Apr 19 09:20:01
> 12804     dipankar   Running   1      2:22:45:54    Thu Apr 19 09:30:21
> 12805     shinisha   Running   1      2:23:00:22    Thu Apr 19 09:44:49
> 12806     mahendra   Running   1      2:23:30:20    Thu Apr 19 10:14:47
> 12816     mahendra   Running   1      3:05:38:12    Thu Apr 19 16:22:39
> 12838     dmashok    Running   1      4:00:31:31    Fri Apr 20 11:15:58
> 12839     shinisha   Running   1      4:01:04:04    Fri Apr 20 11:48:31
> 12851     dmashok    Running   1      4:11:04:12    Fri Apr 20 21:48:39
> 12849     vilask     Running   1      4:23:25:54    Sat Apr 21 10:10:21
> 12850     deepa      Running   1      4:23:25:54    Sat Apr 21 10:10:21
>
> 22 Active Jobs 22 of 32 Processors Active ( 68.75%)
> 14 of 16 Nodes Active (87.50%)
>
> IDLE JOBS----------------------
> JOBNAME   USERNAME   STATE     PROC   WCLIMIT       QUEUETIME
>
>
> 0 Idle Jobs
>
> BLOCKED JOBS----------------
> JOBNAME   USERNAME   STATE     PROC   WCLIMIT       QUEUETIME
>
> 12333     mahendra   Deferred  1      5:00:00:00    Thu Mar  8 08:56:33
> 12342     dipankar   Deferred  1      5:00:00:00    Thu Mar  8 10:37:22
>
> Total Jobs: 24 Active Jobs: 22 Idle Jobs: 0 Blocked Jobs: 2
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
> Here the first two jobs show -INFINITY and are not actually running, yet
> they cannot be deleted. I logged in to the compute nodes and ran top; the
> jobs are not running there. When I check the job, it shows:
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> checking job 12377
>
> State: Running
> Creds: user:prashant group:prashant class:batch qos:DEFAULT
> WallTime: 41:03:45:05 of 1:12:00:00
> SubmitTime: Sat Mar 10 13:13:50
> (Time Queued Total: 13:02:20:59 Eligible: 13:02:20:59)
>
> StartTime: Fri Mar 23 15:34:49
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: DEFAULT
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
> NodeCount: 1
> Allocated Nodes:
> [node08:1]
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 2
> PartitionMask: [ALL]
> Flags: RESTARTABLE
>
> Reservation '12377' ( -INFINITY -> 00:00:01 Duration: 28:19:08:37)
> PE: 1.00 StartPriority: 18860
>
>
> Can you please help me sort this out?
>
> Velan
>