[torqueusers] Re: can't hold the job using qhold

Vadivelan Ranjith achillesvelan at yahoo.co.in
Mon Oct 23 11:23:14 MDT 2006


--- garrick at clusterresources.com wrote:

> On Mon, Oct 23, 2006 at 03:09:21PM +0100, Vadivelan Ranjith alleged:
> > Hi,
> > I am using torque-2.1.0. I tried to hold a job (id 10439), but the job
> > keeps running; it is not being held. I checked the server_logs: they show
> > a request to delete job 10362 (I don't know where it came from), but no
> > job with id 10362 is running. I checked the mom_logs on the compute node
> > and the jobs are running fine, but I want to hold some jobs. This worked
> > earlier; I don't know what I changed and I can't figure it out. Can you
> > help me fix it?
> 
> Is the pbs_mom daemon down?  It looks like pbs_server can't contact it.
pbs_mom is running. I am running parallel jobs, and I don't think they
would run at all without pbs_mom.

Thanks
Velan
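
Having pbs_mom running on the compute node does not by itself show that
pbs_server can open a connection to it, and "Server could not connect to
MOM" in the server_logs is about exactly that connection. A minimal Python
sketch of such a check, meant to be run on the server host, assuming pbs_mom
listens on TORQUE's default MOM port 15002 (adjust if your site uses another
port). mom_reachable is a hypothetical helper, not a TORQUE command.

```python
import socket

# Assumption: pbs_mom listens on TORQUE's default MOM port 15002.
# Change this if your installation uses a different port.
DEFAULT_MOM_PORT = 15002

def mom_reachable(host: str, port: int = DEFAULT_MOM_PORT, timeout: float = 6.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts
        return False

if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python3 check_mom.py node1 node2 ...
    for node in sys.argv[1:]:
        state = "reachable" if mom_reachable(node) else "NOT reachable"
        print(f"{node}: pbs_mom port {DEFAULT_MOM_PORT} {state}")
```

A plain TCP connect only shows that something is listening on that port; it
does not prove that pbs_mom accepts requests from this particular server.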
> >
> > ----------------------------------------------------------------
> > server_logs
> > ----------------------------------------------------------------
> > 10/23/2006 19:24:15;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at galaxy.aero.iitb.ac.in, sock=77
> > 10/23/2006 19:24:15;0100;PBS_Server;Req;;Type HoldJob request received from root at galaxy.aero.iitb.ac.in, sock=11
> > 10/23/2006 19:24:15;0008;PBS_Server;Job;10439.galaxy.aero.iitb.ac.in;Holds u set at request of root at galaxy.aero.iitb.ac.in
> > 10/23/2006 19:24:37;0100;PBS_Server;Req;;Type StatusNode request received from root at galaxy.aero.iitb.ac.in, sock=75
> > 10/23/2006 19:24:37;0100;PBS_Server;Req;;Type StatusQueue request received from root at galaxy.aero.iitb.ac.in, sock=75
> > 10/23/2006 19:24:37;0100;PBS_Server;Req;;Type StatusJob request received from root at galaxy.aero.iitb.ac.in, sock=75
> > 10/23/2006 19:24:37;0100;PBS_Server;Req;;Type DeleteJob request received from root at galaxy.aero.iitb.ac.in, sock=75
> > 10/23/2006 19:24:37;0008;PBS_Server;Job;10362.galaxy.aero.iitb.ac.in;Job deleted at request of root at galaxy.aero.iitb.ac.in
> > 10/23/2006 19:24:37;0001;PBS_Server;Req;;Server could not connect to MOM
> > 10/23/2006 19:24:37;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=DeleteJob, from root at galaxy.aero.iitb.ac.in
> > 10/23/2006 19:24:37;0008;PBS_Server;Job;10362.galaxy.aero.iitb.ac.in;Job sent signal SIGTERM on delete
> >
> > ---------------------------------------------------------------
> > mom_logs
> > ---------------------------------------------------------------
> > 10/23/2006 15:10:59;0002;   pbs_mom;Svr;Log;Log opened
> > 10/23/2006 15:10:59;0080;   pbs_mom;Job;10413.galaxy.aero.iitb.ac.in;scan_for_terminated: job 10413.galaxy.aero.iitb.ac.in task 1 terminated, sid 2576
> > 10/23/2006 15:10:59;0008;   pbs_mom;Job;10413.galaxy.aero.iitb.ac.in;Terminated
> > 10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2565 task 1 with sig 15
> > 10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2603 task 1 with sig 15
> > 10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2606 task 1 with sig 15
> > 10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2607 task 1 with sig 15
> > 10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: not killing pid 0 with sig 9
> > 10/23/2006 16:10:30;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2606 task 1 with sig 9
> > 10/23/2006 16:10:30;0080;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;scan_for_terminated: job 10412.galaxy.aero.iitb.ac.in task 1 terminated, sid 2565
> > 10/23/2006 16:10:30;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;Terminated
> > 10/23/2006 16:10:57;0001;   pbs_mom;Job;TMomFinalizeJob3;job 10444.galaxy.aero.iitb.ac.in started, pid = 24856
> > 10/23/2006 16:10:57;0008;   pbs_mom;Job;10444.galaxy.aero.iitb.ac.in;Job Modified at request of PBS_Server at master1.cluster2.iitb.ac.in
> > 10/23/2006 16:33:10;0008;   pbs_mom;Job;10439.galaxy.aero.iitb.ac.in;JOIN JOB as node 1
> >
> > ----------------------------------------------------------
> > qmgr -c 'p s'
> > ----------------------------------------------------------
> > 
> > #
> > # Create queues and set their attributes.
> > #
> > #
> > # Create and define queue batch
> > #
> > create queue batch
> > set queue batch queue_type = Execution
> > set queue batch resources_max.nodes = 4
> > set queue batch resources_max.walltime = 120:00:00
> > set queue batch resources_default.nodes = 0
> > set queue batch resources_default.walltime = 01:00:00
> > set queue batch enabled = True
> > set queue batch started = True
> > #
> > # Create and define queue short
> > #
> > create queue short
> > set queue short queue_type = Execution
> > set queue short resources_max.nodes = 1
> > set queue short resources_max.walltime = 24:00:00
> > set queue short enabled = True
> > set queue short started = True
> > #
> > # Set server attributes.
> > #
> > set server scheduling = False
> > set server managers = root at galaxy.aero.iitb.ac.in
> > set server operators = root at galaxy.aero.iitb.ac.in
> > set server default_queue = batch
> > set server log_events = 511
> > set server mail_from = adm
> > set server query_other_jobs = True
> > set server scheduler_iteration = 600
> > set server node_ping_rate = 300
> > set server node_check_rate = 600
> > set server tcp_timeout = 6
> > set server job_stat_rate = 30
> > set server pbs_version = 2.1.0p0
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 
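
In the server_logs excerpt quoted above, the telling entry is the req_reject
line: the DeleteJob request for job 10362 was rejected with code 15070
(Server could not connect to MOM). When logs are long, such rejections can be
pulled out with a short script. A minimal Python sketch, assuming the
semicolon-separated layout seen in this excerpt (timestamp; hex event type;
source; object class; object name; message). LogRecord, parse_line and
find_rejections are hypothetical helpers, not part of TORQUE.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical record type. The field layout is an assumption inferred from
# the excerpt: "MM/DD/YYYY HH:MM:SS;EVENT;SOURCE;CLASS;OBJECT;MESSAGE"
@dataclass
class LogRecord:
    timestamp: str
    event: int      # event-type field, written in hex in the log (e.g. 0080)
    source: str     # e.g. PBS_Server or pbs_mom
    obj_class: str  # e.g. Req, Job, Svr
    obj_name: str   # job id, function name, or empty
    message: str

# Matches e.g. "Reject reply code=15070(Server could not connect to MOM), aux=0, type=DeleteJob"
REJECT_RE = re.compile(r"Reject reply code=(\d+)\((.*?)\).*?type=(\w+)")

def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into its six fields; return None if it does not fit."""
    # maxsplit=5 keeps any ';' inside the free-text message intact
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    ts, event, source, obj_class, obj_name, message = parts
    try:
        event_code = int(event, 16)
    except ValueError:
        return None
    # pbs_mom lines pad the source field with spaces, so strip it
    return LogRecord(ts, event_code, source.strip(), obj_class, obj_name, message)

def find_rejections(lines: Iterable[str]) -> List[Tuple[str, int, str, str]]:
    """Return (timestamp, reply code, reason, request type) for each rejected request."""
    found = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = REJECT_RE.search(rec.message)
        if m:
            found.append((rec.timestamp, int(m.group(1)), m.group(2), m.group(3)))
    return found

if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python3 find_rejections.py LOGFILE ...
    for path in sys.argv[1:]:
        with open(path) as log:
            for ts, code, reason, req_type in find_rejections(log):
                print(f"{ts}  {req_type} rejected: code={code} ({reason})")
```

Splitting with a maximum of five splits, rather than a full regular
expression per line, keeps the parser tolerant of free-text messages while
still rejecting lines that do not have all six fields.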



		