[torqueusers] jobs remain in Q state

Chris Vaughan chris at clusterresources.com
Thu Jan 24 06:08:57 MST 2008


Fernando,

Looking at your qmgr output, I notice that you don't have a default queue
set up. When I remove that setting from my own qmgr configuration, jobs
fail to run. Try qmgr -c 'set server default_queue=batch' and then submit
a job again. Could you post the output of tracejob as well?
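
If it helps, here is a minimal sketch of how the check above could be automated: it scans the text printed by qmgr -c 'p s' and reports which server attributes are absent. The function name missing_server_attributes and the sample text are made up for illustration and are not part of TORQUE; the only attribute checked by default is default_queue, the one discussed in this message.

```python
import re


def missing_server_attributes(qmgr_output, required=("default_queue",)):
    """Return the required server attributes absent from `qmgr -c 'p s'` text.

    Hypothetical helper for illustration only; not part of TORQUE.
    """
    found = set()
    for line in qmgr_output.splitlines():
        # Drop mail-style quote markers ("> ") in case the text was pasted from a reply.
        line = line.lstrip("> ").strip()
        # Match lines such as "set server default_queue = batch".
        match = re.match(r"set server (\w+)\s*\+?=", line)
        if match:
            found.add(match.group(1))
    return [name for name in required if name not in found]


# Shortened, made-up sample modelled on the configuration quoted below.
sample = """\
create queue batch
set queue batch queue_type = Execution
set queue batch enabled = True
set queue batch started = True
set server managers = root@gandalf
set server log_events = 511
"""

print(missing_server_attributes(sample))
```

On a live system the input text would presumably come from running qmgr -c 'p s' and capturing its standard output (for example with subprocess.run); that step is left out here so the sketch runs without a TORQUE installation.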

Regards,


Fernando Malick wrote:
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue batch
> #
> qmgr -c 'p s'
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_max.walltime = 01:00:00
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 00:01:00
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> #
> set server acl_roots = root@*
> set server managers = root at gandalf.xx.yy.zz
> set server operators = root at gandalf.xx.yy.zz
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server pbs_version = 2.2.1
>
>
> ----- Original message ----
> From: Chris Vaughan <chris at clusterresources.com>
> To: Fernando Malick <fmalick at yahoo.com.ar>
> CC: torqueusers at supercluster.org
> Sent: Tuesday, January 22, 2008, 9:33:02
> Subject: Re: [torqueusers] jobs remain in Q state
>
> Fernando,
>
> Can you provide the output of qmgr -c 'p s'? Thanks.
>
> Fernando Malick wrote:
> > Hi, I'm new to torque, and I'm having a problem getting jobs done.
> >
> > I compiled and installed torque 2.2.1, followed configuration steps,
> > created a queue named "batch" in a server named "gandalf". Then I
> > wrote a very primitive script to send to the queue, and when I ask for
> > the queue stats I get this:
> >
> >
> > qstat
> > Job id                    Name             User            Time Use S Queue
> > ------------------------- ---------------- --------------- -------- - -----
> > 6.gandalf                 nada             root                   0 Q batch
> > 7.gandalf                 nada             root                   0 Q batch
> >
> >
> > I have pbs_server, pbs_mom and pbs_sched running, but the jobs remain
> > in the queue and are never executed.
> >
> > If someone is able to guide me or give me a clue as to what is
> > happening, I will be very grateful.
> > I'm including the logs.
> > ------------------------------------------------------------------------
> > server_logs:
> > 01/21/2008 16:25:57;0002;PBS_Server;Svr;Log;Log opened
> > 01/21/2008 16:25:57;0006;PBS_Server;Svr;PBS_Server;Server gandalf.xxx.yyy started, initialization type = 1
> > 01/21/2008 16:25:57;0002;PBS_Server;Svr;Act;Account file /var/spool/torque/server_priv/accounting/20080121 opened
> > 01/21/2008 16:25:57;0040;PBS_Server;Req;setup_nodes;setup_nodes()
> > 01/21/2008 16:25:57;0086;PBS_Server;Svr;PBS_Server;Recovered queue batch
> > 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Expected 1, recovered 1 queues
> > 01/21/2008 16:25:57;0100;PBS_Server;Job;7.gandalf.xxx.yyy;enqueuing into batch, state 1 hop 1
> > 01/21/2008 16:25:57;0086;PBS_Server;Job;7.gandalf.xxx.yyy;Requeueing job, substate: 10 Requeued in queue: batch
> > 01/21/2008 16:25:57;0100;PBS_Server;Job;6.gandalf.xxx.yyy;enqueuing into batch, state 1 hop 1
> > 01/21/2008 16:25:57;0086;PBS_Server;Job;6.gandalf.xxx.yyy;Requeueing job, substate: 10 Requeued in queue: batch
> > 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Expected 2, recovered 2 jobs
> > 01/21/2008 16:25:57;0006;PBS_Server;Svr;PBS_Server;Using ports Server:15001  Scheduler:15004  MOM:15002
> > 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 6180, loglevel=0
> > 01/21/2008 16:26:02;0040;PBS_Server;Req;ping_nodes;ping attempting to contact 1 nodes
> > 01/21/2008 16:26:15;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:26:15;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:31:12;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:31:12;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:32:21;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:32:21;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type SelStat request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type SelStat request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:35:27;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:35:27;0100;PBS_Server;Req;;Type StatusQueue request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:36:25;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:36:25;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:38:22;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:38:22;0100;PBS_Server;Req;;Type ModifyJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:38:25;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:38:25;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:41:13;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:41:13;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:41:13;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> > 01/21/2008 16:41:13;0004;PBS_Server;Que;batch;attributes set: enabled = TRUE
> > 01/21/2008 16:41:15;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:41:15;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:41:25;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:41:25;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:41:25;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> > 01/21/2008 16:41:25;0004;PBS_Server;Que;batch;attributes set: enabled = TRUE
> > 01/21/2008 16:41:27;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:41:27;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:41:57;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:41:57;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:41:57;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> > 01/21/2008 16:41:57;0004;PBS_Server;Que;batch;attributes set: started = TRUE
> > 01/21/2008 16:42:00;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:42:00;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:42:06;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:42:06;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:42:06;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> > 01/21/2008 16:42:06;0004;PBS_Server;Que;batch;attributes set: started = TRUE
> > 01/21/2008 16:42:09;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:42:09;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:42:09;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> > 01/21/2008 16:42:09;0004;PBS_Server;Que;batch;attributes set: started = TRUE
> > 01/21/2008 16:42:13;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:42:13;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:42:46;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:42:46;0100;PBS_Server;Req;;Type OrderJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:42:48;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:42:48;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:42:59;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:42:59;0100;PBS_Server;Req;;Type OrderJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:43:00;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:43:00;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 16:51:55;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 16:51:55;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> > 01/21/2008 17:17:57;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> > 01/21/2008 17:17:57;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> >
> > ------------------------------------------------------------------------
> > mom_logs:
> >
> > 01/21/2008 16:25:08;0002;  pbs_mom;Svr;Log;Log opened
> > 01/21/2008 16:25:08;0002;  pbs_mom;Svr;setpbsserver;gandalf.xxx.yyy
> > 01/21/2008 16:25:08;0002;  pbs_mom;Svr;setpbsserver;server gandalf.xxx.yyy added
> > 01/21/2008 16:25:08;0002;  pbs_mom;n/a;initialize;independent
> > 01/21/2008 16:25:08;0080;  pbs_mom;Svr;pbs_mom;before init_abort_jobs
> > 01/21/2008 16:25:08;0002;  pbs_mom;Svr;pbs_mom;Is up
> > 01/21/2008 16:25:08;0002;  pbs_mom;Svr;mom_main;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1200328428
> > 01/21/2008 16:25:08;0002;  pbs_mom;n/a;mom_main;hello sent to server gandalf.xxx.yyy
> > 01/21/2008 16:25:54;0002;  pbs_mom;Svr;im_eof;End of File from addr www.xxx.yyy.zzz:15001
> > 01/21/2008 16:25:54;0002;  pbs_mom;n/a;mom_main;hello sent to server gandalf.xxx.yyy
> >
> > ------------------------------------------------------------------------
> > sched_logs:
> > 01/21/2008 16:25:35;0002; pbs_sched;Svr;Log;Log opened
> > 01/21/2008 16:25:35;0002; pbs_sched;Svr;TokenAct;Account file /var/spool/torque/sched_priv/accounting/20080121 opened
> > 01/21/2008 16:25:35;0002; pbs_sched;Svr;main;pbs_sched startup pid 6175
> >


-- 
Chris Vaughan
EMEA Systems Engineer
Cluster Resources, Ltd.
Direct - UK Office:  +44 (0)1223 437 132
Mobile - +44 (0)7800 973 062
US Headquarters:  +1 801 717 3700
Skype: supercomputer1
www.clusterresources.co.uk
-- 

Evaluate Our Products, Free 45-Day Evaluation
http://www.clusterresources.com/pages/products/evaluate.php


