[torqueusers] jobs remain in Q state

Chris Vaughan chris at clusterresources.com
Tue Jan 22 04:33:02 MST 2008


Fernando,

Can you provide the output of qmgr -c 'p s'? Thanks.
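
If it helps, here is a minimal sketch for collecting that output together with two other standard TORQUE status commands in one go. The run_if_present helper is a hypothetical convenience wrapper and not part of TORQUE; 'print server' is the long form of 'p s'. It assumes the client tools are on the PATH of the user running it.

```shell
#!/bin/sh
# Run a command only if it is installed; otherwise say that it was skipped.
# run_if_present is a hypothetical helper, not a TORQUE tool.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not found in PATH"
    fi
}

run_if_present qmgr -c 'print server'   # long form of: qmgr -c 'p s'
run_if_present pbsnodes -a              # nodes known to pbs_server and their states
run_if_present qstat -f                 # full attributes of every job
```

Redirecting the script's output to a file makes it easy to paste into a reply.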

Fernando Malick wrote:
> Hi, I'm new to torque, and I'm having a problem getting jobs done.
>
> I compiled and installed TORQUE 2.2.1, followed the configuration steps,
> and created a queue named "batch" on a server named "gandalf". Then I
> wrote a very simple script and submitted it to the queue. When I check
> the queue status, I get this:
>
>
> qstat
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 6.gandalf                 nada             root                   0 Q batch
> 7.gandalf                 nada             root                   0 Q batch
>
>
> I have pbs_server, pbs_mom and pbs_sched running, but the jobs stay
> in the Q (queued) state and never get executed.
>
> If someone can guide me or give me a clue as to what is happening, I
> will be very grateful.
> I'm including the logs:
> -----------------------------------------------------------------------------------------------------------------------
> server_logs:
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;Log;Log opened
> 01/21/2008 16:25:57;0006;PBS_Server;Svr;PBS_Server;Server gandalf.xxx.yyy started, initialization type = 1
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;Act;Account file /var/spool/torque/server_priv/accounting/20080121 opened
> 01/21/2008 16:25:57;0040;PBS_Server;Req;setup_nodes;setup_nodes()
> 01/21/2008 16:25:57;0086;PBS_Server;Svr;PBS_Server;Recovered queue batch
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Expected 1, recovered 1 queues
> 01/21/2008 16:25:57;0100;PBS_Server;Job;7.gandalf.xxx.yyy;enqueuing into batch, state 1 hop 1
> 01/21/2008 16:25:57;0086;PBS_Server;Job;7.gandalf.xxx.yyy;Requeueing job, substate: 10 Requeued in queue: batch
> 01/21/2008 16:25:57;0100;PBS_Server;Job;6.gandalf.xxx.yyy;enqueuing into batch, state 1 hop 1
> 01/21/2008 16:25:57;0086;PBS_Server;Job;6.gandalf.xxx.yyy;Requeueing job, substate: 10 Requeued in queue: batch
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Expected 2, recovered 2 jobs
> 01/21/2008 16:25:57;0006;PBS_Server;Svr;PBS_Server;Using ports Server:15001  Scheduler:15004  MOM:15002
> 01/21/2008 16:25:57;0002;PBS_Server;Svr;PBS_Server;Server Ready, pid = 6180, loglevel=0
> 01/21/2008 16:26:02;0040;PBS_Server;Req;ping_nodes;ping attempting to contact 1 nodes
> 01/21/2008 16:26:15;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:26:15;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:31:12;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:31:12;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:31:38;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:32:21;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:32:21;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:33:35;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:07;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:13;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:34:59;0100;PBS_Server;Req;;Type SelStat request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:35:07;0100;PBS_Server;Req;;Type SelStat request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:35:27;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:35:27;0100;PBS_Server;Req;;Type StatusQueue request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:36:25;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:36:25;0100;PBS_Server;Req;;Type StatusServer request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:38:22;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:38:22;0100;PBS_Server;Req;;Type ModifyJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:38:25;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:38:25;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:13;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:13;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:13;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> 01/21/2008 16:41:13;0004;PBS_Server;Que;batch;attributes set: enabled = TRUE
> 01/21/2008 16:41:15;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:15;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:25;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:25;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:25;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> 01/21/2008 16:41:25;0004;PBS_Server;Que;batch;attributes set: enabled = TRUE
> 01/21/2008 16:41:27;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:27;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:57;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:41:57;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:41:57;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> 01/21/2008 16:41:57;0004;PBS_Server;Que;batch;attributes set: started = TRUE
> 01/21/2008 16:42:00;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:00;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:06;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:06;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:06;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> 01/21/2008 16:42:06;0004;PBS_Server;Que;batch;attributes set: started = TRUE
> 01/21/2008 16:42:09;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:09;0100;PBS_Server;Req;;Type Manager request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:09;0004;PBS_Server;Que;batch;attributes set:  at request of root at gandalf.xxx.yyy
> 01/21/2008 16:42:09;0004;PBS_Server;Que;batch;attributes set: started = TRUE
> 01/21/2008 16:42:13;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:13;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:46;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:46;0100;PBS_Server;Req;;Type OrderJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:48;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:48;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:42:59;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:42:59;0100;PBS_Server;Req;;Type OrderJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:43:00;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:43:00;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 16:51:55;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 16:51:55;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
> 01/21/2008 17:17:57;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at gandalf.xxx.yyy, sock=10
> 01/21/2008 17:17:57;0100;PBS_Server;Req;;Type StatusJob request received from root at gandalf.xxx.yyy, sock=9
>
> ----------------------------------------------------------------------------------------------------------------
> mom_logs:
>
> 01/21/2008 16:25:08;0002;   pbs_mom;Svr;Log;Log opened
> 01/21/2008 16:25:08;0002;   pbs_mom;Svr;setpbsserver;gandalf.xxx.yyy
> 01/21/2008 16:25:08;0002;   pbs_mom;Svr;setpbsserver;server gandalf.xxx.yyy added
> 01/21/2008 16:25:08;0002;   pbs_mom;n/a;initialize;independent
> 01/21/2008 16:25:08;0080;   pbs_mom;Svr;pbs_mom;before init_abort_jobs
> 01/21/2008 16:25:08;0002;   pbs_mom;Svr;pbs_mom;Is up
> 01/21/2008 16:25:08;0002;   pbs_mom;Svr;mom_main;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1200328428
> 01/21/2008 16:25:08;0002;   pbs_mom;n/a;mom_main;hello sent to server gandalf.xxx.yyy
> 01/21/2008 16:25:54;0002;   pbs_mom;Svr;im_eof;End of File from addr www.xxx.yyy.zzz:15001
> 01/21/2008 16:25:54;0002;   pbs_mom;n/a;mom_main;hello sent to server gandalf.xxx.yyy
>
> ------------------------------------------------------------------------------------------------------------------
> sched_logs:
> 01/21/2008 16:25:35;0002; pbs_sched;Svr;Log;Log opened
> 01/21/2008 16:25:35;0002; pbs_sched;Svr;TokenAct;Account file /var/spool/torque/sched_priv/accounting/20080121 opened
> 01/21/2008 16:25:35;0002; pbs_sched;Svr;main;pbs_sched startup pid 6175


-- 
Chris Vaughan
EMEA Systems Engineer
Cluster Resources, Ltd.
Direct - UK Office:  +44 (0)1223 437 132
Mobile - +44 (0)7800 973 062
US Headquarters:  +1 801 717 3700
Skype: supercomputer1
www.clusterresources.co.uk