[torqueusers] Re: Current Status showing INFINITY and job not deleting

Vadivelan Ranjith velan.aero at gmail.com
Fri Apr 20 23:18:56 MDT 2007


On 4/21/07, Vadivelan Ranjith <velan.aero at gmail.com> wrote:
>
> Hi,
> Some of our compute nodes went down due to a power failure. We booted some
> of the nodes after a few days. After booting them, I deleted all jobs manually
> using qdel on the server. All jobs were deleted except two. When I type showq,
> I get:
> ACTIVE JOBS--------------------
> JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
>
> 12377              prashant    Running     1   -INFINITY  Fri Mar 23 15:34:49
> 12361              prashant    Running     1   -INFINITY  Fri Mar 23 15:34:49
> 12769                vilask    Running     1  1:08:57:44  Tue Apr 17 19:42:11
> 12775               dmashok    Running     1  1:10:46:14  Tue Apr 17 21:30:41
> 12777              shinisha    Running     1  1:10:48:18  Tue Apr 17 21:32:45
> 12778                 mehta    Running     1  1:10:51:55  Tue Apr 17 21:36:22
> 12779                 mehta    Running     1  1:10:51:55  Tue Apr 17 21:36:22
> 12789                 atuls    Running     1  1:21:59:27  Wed Apr 18 08:43:54
> 12790                 atuls    Running     1  1:21:59:58  Wed Apr 18 08:44:25
> 12791                 atuls    Running     1  1:22:00:29  Wed Apr 18 08:44:56
> 12796               sndatta    Running     1  2:01:59:11  Wed Apr 18 12:43:38
> 12768                 deepa    Running     1  2:02:23:59  Wed Apr 18 13:08:26
> 12803              dipankar    Running     1  2:22:35:34  Thu Apr 19 09:20:01
> 12804              dipankar    Running     1  2:22:45:54  Thu Apr 19 09:30:21
> 12805              shinisha    Running     1  2:23:00:22  Thu Apr 19 09:44:49
> 12806              mahendra    Running     1  2:23:30:20  Thu Apr 19 10:14:47
> 12816              mahendra    Running     1  3:05:38:12  Thu Apr 19 16:22:39
> 12838               dmashok    Running     1  4:00:31:31  Fri Apr 20 11:15:58
> 12839              shinisha    Running     1  4:01:04:04  Fri Apr 20 11:48:31
> 12851               dmashok    Running     1  4:11:04:12  Fri Apr 20 21:48:39
> 12849                vilask    Running     1  4:23:25:54  Sat Apr 21 10:10:21
> 12850                 deepa    Running     1  4:23:25:54  Sat Apr 21 10:10:21
>
>     22 Active Jobs      22 of   32 Processors Active ( 68.75%)
>                         14 of   16 Nodes Active      (87.50%)
>
> IDLE JOBS----------------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>
>
> 0 Idle Jobs
>
> BLOCKED JOBS----------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>
> 12333              mahendra   Deferred     1  5:00:00:00  Thu Mar  8 08:56:33
> 12342              dipankar   Deferred     1  5:00:00:00  Thu Mar  8 10:37:22
>
> Total Jobs: 24   Active Jobs: 22   Idle Jobs: 0   Blocked Jobs: 2
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
> Here the first two jobs show -INFINITY in the REMAINING column, and they are
> not actually running. They also cannot be deleted. I logged in to the compute
> nodes and ran top; the jobs are not running there. When I check job 12377, it
> shows:
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> checking job 12377
>
> State: Running
> Creds:  user:prashant  group:prashant  class:batch  qos:DEFAULT
> WallTime: 41:03:45:05 of 1:12:00:00
> SubmitTime: Sat Mar 10 13:13:50
>   (Time Queued  Total: 13:02:20:59  Eligible: 13:02:20:59)
>
> StartTime: Fri Mar 23 15:34:49
> Total Tasks: 1
>
> Req[0]  TaskCount: 1  Partition: DEFAULT
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> NodeCount: 1
> Allocated Nodes:
> [node08:1]
>
>
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 0  StartCount: 2
> PartitionMask: [ALL]
> Flags:       RESTARTABLE
>
> Reservation '12377' ( -INFINITY -> 00:00:01  Duration: 28:19:08:37)
> PE:  1.00  StartPriority:  18860
>
>
> Can you please help me sort this out?
>
> Velan
>
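
As a minimal sketch for spotting such entries, the Python snippet below scans
showq-style text for rows whose REMAINING column reads -INFINITY. The function
name stale_jobs, the regular expression, and the sample rows (copied from the
showq output above, with the wrapped STARTTIME column rejoined) are
illustrative assumptions; they are not part of TORQUE or Maui, and the column
layout may differ between versions.

```python
import re

# A few rows copied from the showq output quoted above, with the wrapped
# STARTTIME column joined back onto one line.
SHOWQ_SAMPLE = """\
12377              prashant    Running     1   -INFINITY  Fri Mar 23 15:34:49
12361              prashant    Running     1   -INFINITY  Fri Mar 23 15:34:49
12769                vilask    Running     1  1:08:57:44  Tue Apr 17 19:42:11
"""

# Assumed column layout: job id, user, state, proc count, remaining, start time.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(.*)$")


def stale_jobs(showq_text):
    """Return (job_id, user) pairs for rows whose REMAINING is -INFINITY."""
    stale = []
    for line in showq_text.splitlines():
        match = ROW.match(line)
        if match and match.group(5) == "-INFINITY":
            stale.append((match.group(1), match.group(2)))
    return stale


if __name__ == "__main__":
    for job_id, user in stale_jobs(SHOWQ_SAMPLE):
        print(job_id, user)
```

The snippet only reads text that has already been captured; it does not call
showq or qdel itself, so it is safe to try on a saved copy of the output.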