[torqueusers] Broken Scheduling Puzzle
soubari at yahoo.com
Fri Feb 1 00:41:07 MST 2013
Correction: the stuck jobs stay in the W state, and some of them were submitted earlier the same day. Also, when I compare the output of "qstat -f" for one job before and after it gets stuck, these lines have been added:
> Resource_List.neednodes = 1
> euser = rpt_prod
> egroup = rpt
> queue_rank = 102241
> queue_type = E
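
A minimal sketch of how this before/after comparison could be automated, in case it helps anyone look at several stuck jobs at once. It assumes the two "qstat -f" outputs for a job have already been saved as text, that attributes appear as "name = value" lines, and that long values are wrapped onto continuation lines starting with a tab. The function names and the sample "before" snapshot are hypothetical; only the five added attributes come from the real output above.

```python
def parse_qstat_full(text):
    """Parse the 'name = value' attribute lines of one job from saved
    `qstat -f` output into a dict (assumed layout, see note above)."""
    attrs = {}
    last_key = None
    for raw in text.splitlines():
        if raw.startswith("\t") and last_key is not None:
            # Assumed: a long value is wrapped onto a tab-indented line.
            attrs[last_key] += raw.strip()
            continue
        line = raw.strip()
        if " = " in line:
            key, value = line.split(" = ", 1)
            attrs[key] = value
            last_key = key
        else:
            # Header lines such as "Job Id: ..." carry no attribute.
            last_key = None
    return attrs


def added_attributes(before_text, after_text):
    """Return the attributes present in the second snapshot only."""
    before = parse_qstat_full(before_text)
    after = parse_qstat_full(after_text)
    return {k: v for k, v in after.items() if k not in before}


# Hypothetical "before" snapshot; job id and server name are made up.
BEFORE = """Job Id: 1234.example-server
    job_state = W
    queue = example_queue
"""

# Same job after it got stuck; the extra lines are the ones quoted above.
AFTER = BEFORE + """    Resource_List.neednodes = 1
    euser = rpt_prod
    egroup = rpt
    queue_rank = 102241
    queue_type = E
"""

for name, value in added_attributes(BEFORE, AFTER).items():
    print(f"> {name} = {value}")
```

The sketch only reports attributes that are new in the second snapshot; attributes whose values changed would need a separate comparison.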
Thank you, Sam.
> From: Sam Oubari <soubari at yahoo.com>
>To: "torqueusers at supercluster.org" <torqueusers at supercluster.org>
>Sent: Thursday, January 31, 2013 8:24 PM
>Subject: Broken Scheduling Puzzle
>I have a single but very busy PROD PBS 2.5.11 instance on an x86-64 server, using a local pbs_sched; all components and clients run on the same server.
>Currently, if I have jobs from the day before waiting for a future exec date, and I qsub a new job with a future time, then qmove it to another queue and qalter it to a more distant future date, some of the waiting jobs move to Q status at their exec time but don't run. The moved job and all other jobs run as they should. All the jobs on this server generally execute simple scripts. Since 2.4.x this problem has normally shown up rarely and randomly, but for about a week now I have been able to reproduce it on demand, though not on my TEST server.
>Any ideas what I should try?