[torqueusers] have enough nodes, but job is not running

Jozef Káčer quickparser at gmail.com
Wed Apr 16 06:41:29 MDT 2008


I often find myself in situations in which jobs should have enough
resources and should be running. I submit jobs using a PBS script.
Nevertheless, if a job stays queued for a long time, I try to force it to
run using "runjob" or "qrun". This usually works, provided there are
enough free resources available.
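
A typical sequence looks like this (the job ID below is made up for
illustration; note that "qrun" must be issued by a Torque operator or
manager):

  qstat          # list jobs and note the ID of the one stuck in state Q
  qrun 1234      # force job 1234 to start, bypassing the scheduler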

Jozef

2008/4/16 <pat.o'bryant at exxonmobil.com>:

>
> Zhyang,
>    Here is something you might try. Code up a Torque "job_script" with the
> following "#PBS" control cards. Note that "#PBS" control cards can take
> the place of command-line arguments and follow the same format. Submit
> the job using "qsub job_script". If you specify ppn greater than the
> number of cpus per node, Maui (for some parameter settings) will look for
> a node with at least that many cpus. For example, if you use "#PBS -l
> nodes=8:ppn=4", Maui will look for nodes with 4 cpus. If it can't find
> such a node, the job will remain queued. The thing to keep in mind is
> that Torque queues your job, while Maui (in your case) decides where and
> when your job will execute. Most execution problems are due to Maui/Moab
> parameter settings. Here are some links to check as well:
>
> http://www.clusterresources.com/wiki/doku.php?id=torque:2.1_job_submission
> http://www.clusterresources.com/products/mwm/docs/a.fparameters.shtml
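>
> If a job stays queued even though nodes look free, the Maui client
> commands can show the scheduler's view of things (the job ID below is
> hypothetical):
>
>   checkjob -v 1234     # explains why Maui is not starting job 1234
>   showbf               # shows resources available for immediate use
>   diagnose -n          # shows node state as Maui sees it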
>
> Contents of "job_script"
> ----------------------------------
> #!/bin/bash
> #PBS -N Short                              # job name
> #PBS -l nodes=8:ppn=2,walltime=00:02:00    # 8 nodes, 2 cpus each, 2 min
> pwd                                        # print the working directory
> hostname                                   # print the execution node
>
> End of "job_script"
> ---------------------------
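>
> Submitting and checking the job might look like this (the job ID in the
> output is just an illustration):
>
>   $ qsub job_script
>   1234.yourserver
>   $ qstat -n 1234
>
> "qstat -n" lists the nodes the job was assigned once it starts.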
>
> Thanks,
>  Pat
>
> J.W. (Pat) O'Bryant,Jr.
> Business Line Infrastructure
> Technical Systems, HPC
> Office: 713-431-7022
>
>
>
>
> From:    zhyang at lzu.edu.cn
> Date:    04/15/08 07:19 AM
> To:      pat.o'bryant at exxonmobil.com
> cc:      torqueusers at supercluster.org
> Subject: Re: Re: [torqueusers] have enough nodes, but job is not running
>
> Hi Pat,
>
> I am not using the #PBS control cards. I have 56 nodes, 2 cpus per node.
>
>
> >-----Original Message-----
> > From: pat.o'bryant at exxonmobil.com
> > Sent: 2008-04-15 20:09:27
> > To: zhyang at lzu.edu.cn
> > Cc:
> > Subject: Re: [torqueusers] have enough nodes, but job is not running
> > Zhyang,
> >
> >      What do your #PBS control cards look like? Also, how many cpus/node
> > do you have?
> >
> >                  Thanks,
> >                   Pat
> >
> > J.W. (Pat) O'Bryant,Jr.
> > Business Line Infrastructure
> > Technical Systems, HPC
> > Office: 713-431-7022
> >
> > Hi,
> >
> > I have a cluster of 56 nodes with Torque and Maui installed. Recently I
> > found that when showq shows 34 nodes active and a user submits a 5-node
> > job, the job stays in the Q state and does not run. From the showq
> > result there should be enough nodes (at least 5), so why is the job not
> > running? When I submit a 2-node job, it runs fine. Who can help me?
> > Thanks!
> >
>
> --
>    With best regards,
>    Zhang Yang
>    Communication Network Center, Lanzhou University
>    Address: 222 Tianshui Road, Lanzhou, Gansu, China
>    Tel: (0931) 8912011    Fax: (0931) 8912022    Postal code: 730000
>    Email: zhyang at lzu.edu.cn
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>