[torqueusers] jobs remain in Q state
Chris Vaughan
chris at clusterresources.com
Tue Jan 22 04:33:02 MST 2008
Fernando,
Can you provide the output of qmgr -c 'p s'? Thanks.
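For anyone following along, 'p s' is qmgr's abbreviation for 'print server', which dumps the server and queue configuration as a list of qmgr commands. Below is a minimal sketch of collecting it, assuming the TORQUE client tools are on the PATH of the machine you run it from. The helper name show_server_config is hypothetical; it exists only so the script fails cleanly where qmgr is not installed.

```shell
#!/bin/sh
# Hypothetical helper: print the pbs_server configuration, or fail cleanly
# when the TORQUE client tools are not installed on this machine.
show_server_config() {
    if command -v qmgr >/dev/null 2>&1; then
        # 'print server' is the long form of 'p s'
        qmgr -c 'print server'
    else
        echo "qmgr not found in PATH" >&2
        return 127
    fi
}

# Usage: run on a host with the TORQUE client tools (typically the server host).
show_server_config || echo "could not query the server from this host"
```

The output can be pasted into a reply as-is, since it is plain text.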
Fernando Malick wrote:
> Hi, I'm new to torque, and I'm having a problem getting jobs done.
>
> I compiled and installed torque 2.2.1, followed configuration steps,
> created a queue named "batch" in a server named "gandalf". Then I
> wrote a very primitive script to send to the queue, and when I ask for
> the queue stats I get this:
>
>
> qstat
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 6.gandalf                 nada             root                   0 Q batch
> 7.gandalf                 nada             root                   0 Q batch
>
>
> I have pbs_server, pbs_mom and pbs_sched running, but the jobs remain
> queued and are never executed.
>
> If someone is able to guide me or give me a clue as to what is
> happening, I will be very grateful.
> I'm including the logs:
> -----------------------------------------------------------------------------------------------------------------------
> server_logs:
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;Log;Log opened
> 01/21/2008 16:25:57;0006;PBS_Server;Svr;PBS_Server;Server
> gandalf.xxx.yyy started, initialization type = 1
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;Act;Account file
> /var/spool/torque/server_priv/accounting/20080121 opened
> 01/21/2008 16:25:57;0040;PBS_Server;Req;setup_nodes;setup_nodes()
> 01/21/2008 16:25:57;0086;PBS_Server;Svr;PBS_Server;Recovered queue batch
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Expected 1,
> recovered 1 queues
> 01/21/2008 16:25:57;0100;PBS_Server;Job;7.gandalf.xxx.yyy;enqueuing
> into batch, state 1 hop 1
> 01/21/2008 16:25:57;0086;PBS_Server;Job;7.gandalf.xxx.yyy;Requeueing
> job, substate: 10 Requeued in queue: batch
> 01/21/2008 16:25:57;0100;PBS_Server;Job;6.gandalf.xxx.yyy;enqueuing
> into batch, state 1 hop 1
> 01/21/2008 16:25:57;0086;PBS_Server;Job;6.gandalf.xxx.yyy;Requeueing
> job, substate: 10 Requeued in queue: batch
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Expected 2,
> recovered 2 jobs
> 01/21/2008 16:25:57;0006;PBS_Server;Svr;PBS_Server;Using ports
> Server:15001 Scheduler:15004 MOM:15002
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid =
> 6180, loglevel=0
> 01/21/2008 16:26:02;0040;PBS_Server;Req;ping_nodes;ping attempting to
> contact 1 nodes
> 01/21/2008 16:26:15;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:26:15;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:31:12;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:31:12;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type StatusServer request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:32:21;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:32:21;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type StatusServer request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type StatusServer request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type StatusServer request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type StatusServer request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type SelStat request received
> from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type StatusServer request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type SelStat request received
> from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:35:27;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:35:27;0100;PBS_Server;Req;;Type StatusQueue request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:36:25;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:36:25;0100;PBS_Server;Req;;Type StatusServer request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:38:22;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:38:22;0100;PBS_Server;Req;;Type ModifyJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:38:25;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:38:25;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:13;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:13;0100;PBS_Server;Req;;Type Manager request received
> from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:13;0004;PBS_Server;Que;batch;attributes set: at
> request of root at gandalf.xxx.yyy
> 01/21/2008 16:41:13;0004;PBS_Server;Que;batch;attributes set: enabled
> = TRUE
> 01/21/2008 16:41:15;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:15;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:25;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:25;0100;PBS_Server;Req;;Type Manager request received
> from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:25;0004;PBS_Server;Que;batch;attributes set: at
> request of root at gandalf.xxx.yyy
> 01/21/2008 16:41:25;0004;PBS_Server;Que;batch;attributes set: enabled
> = TRUE
> 01/21/2008 16:41:27;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:27;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:57;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:57;0100;PBS_Server;Req;;Type Manager request received
> from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:57;0004;PBS_Server;Que;batch;attributes set: at
> request of root at gandalf.xxx.yyy
> 01/21/2008 16:41:57;0004;PBS_Server;Que;batch;attributes set: started
> = TRUE
> 01/21/2008 16:42:00;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:00;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:06;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:06;0100;PBS_Server;Req;;Type Manager request received
> from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:06;0004;PBS_Server;Que;batch;attributes set: at
> request of root at gandalf.xxx.yyy
> 01/21/2008 16:42:06;0004;PBS_Server;Que;batch;attributes set: started
> = TRUE
> 01/21/2008 16:42:09;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:09;0100;PBS_Server;Req;;Type Manager request received
> from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:09;0004;PBS_Server;Que;batch;attributes set: at
> request of root at gandalf.xxx.yyy
> 01/21/2008 16:42:09;0004;PBS_Server;Que;batch;attributes set: started
> = TRUE
> 01/21/2008 16:42:13;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:13;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:46;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:46;0100;PBS_Server;Req;;Type OrderJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:48;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:48;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:59;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:59;0100;PBS_Server;Req;;Type OrderJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:43:00;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:43:00;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:51:55;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:51:55;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 17:17:57;0100;PBS_Server;Req;;Type AuthenticateUser request
> received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 17:17:57;0100;PBS_Server;Req;;Type StatusJob request
> received from root at gandalf.xxx.yyy, sock=9
>
> ----------------------------------------------------------------------------------------------------------------
> mom_logs:
>
> 01/21/2008 16:25:08;0002; pbs_mom;Svr;Log;Log opened
> 01/21/2008 16:25:08;0002; pbs_mom;Svr;setpbsserver;gandalf.xxx.yyy
> 01/21/2008 16:25:08;0002; pbs_mom;Svr;setpbsserver;server
> gandalf.xxx.yyy added
> 01/21/2008 16:25:08;0002; pbs_mom;n/a;initialize;independent
> 01/21/2008 16:25:08;0080; pbs_mom;Svr;pbs_mom;before init_abort_jobs
> 01/21/2008 16:25:08;0002; pbs_mom;Svr;pbs_mom;Is up
> 01/21/2008 16:25:08;0002; pbs_mom;Svr;mom_main;MOM executable path
> and mtime at launch: /usr/local/sbin/pbs_mom 1200328428
> 01/21/2008 16:25:08;0002; pbs_mom;n/a;mom_main;hello sent to server
> gandalf.xxx.yyy
> 01/21/2008 16:25:54;0002; pbs_mom;Svr;im_eof;End of File from addr
> www.xxx.yyy.zzz:15001
> 01/21/2008 16:25:54;0002; pbs_mom;n/a;mom_main;hello sent to server
> gandalf.xxx.yyy
>
> ------------------------------------------------------------------------------------------------------------------
> sched_logs:
> 01/21/2008 16:25:35;0002; pbs_sched;Svr;Log;Log opened
> 01/21/2008 16:25:35;0002; pbs_sched;Svr;TokenAct;Account file
> /var/spool/torque/sched_priv/accounting/20080121 opened
> 01/21/2008 16:25:35;0002; pbs_sched;Svr;main;pbs_sched startup pid 6175
>
--
Chris Vaughan
EMEA Systems Engineer
Cluster Resources, Ltd.
Direct - UK Office: +44 (0)1223 437 132
Mobile - +44 (0)7800 973 062
US Headquarters: +1 801 717 3700
Skype: supercomputer1
www.clusterresources.co.uk