[torqueusers] can't hold the job using qhold

Vadivelan Ranjith achillesvelan at yahoo.co.in
Mon Oct 23 08:09:21 MDT 2006


Hi,
I am using torque-2.1.0. I tried to hold a job (id
10439) with qhold, but the job keeps running; it is not
held. I checked the server_logs. They show a request to
delete job 10362 (I don't know where that came from),
but no job with id 10362 is running. I checked the
mom_logs on the compute node, and the jobs are running
fine. But I want to hold some jobs. This worked earlier,
and I don't know what I changed. I can't figure it out.
Can you help me fix it?

Thanks
Velan

----------------------------------------------------------------
server_logs
----------------------------------------------------------------
10/23/2006 19:24:15;0100;PBS_Server;Req;;Type AuthenticateUser request received from root at galaxy.aero.iitb.ac.in, sock=77
10/23/2006 19:24:15;0100;PBS_Server;Req;;Type HoldJob request received from root at galaxy.aero.iitb.ac.in, sock=11
10/23/2006 19:24:15;0008;PBS_Server;Job;10439.galaxy.aero.iitb.ac.in;Holds u set at request of root at galaxy.aero.iitb.ac.in
10/23/2006 19:24:37;0100;PBS_Server;Req;;Type StatusNode request received from root at galaxy.aero.iitb.ac.in, sock=75
10/23/2006 19:24:37;0100;PBS_Server;Req;;Type StatusQueue request received from root at galaxy.aero.iitb.ac.in, sock=75
10/23/2006 19:24:37;0100;PBS_Server;Req;;Type StatusJob request received from root at galaxy.aero.iitb.ac.in, sock=75
10/23/2006 19:24:37;0100;PBS_Server;Req;;Type DeleteJob request received from root at galaxy.aero.iitb.ac.in, sock=75
10/23/2006 19:24:37;0008;PBS_Server;Job;10362.galaxy.aero.iitb.ac.in;Job deleted at request of root at galaxy.aero.iitb.ac.in
10/23/2006 19:24:37;0001;PBS_Server;Req;;Server could not connect to MOM
10/23/2006 19:24:37;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=DeleteJob, from root at galaxy.aero.iitb.ac.in
10/23/2006 19:24:37;0008;PBS_Server;Job;10362.galaxy.aero.iitb.ac.in;Job sent signal SIGTERM on delete
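
To check which job each server_logs entry refers to, the records can be grouped by job id. The following is only a minimal sketch in Python. It assumes each record is six semicolon-separated fields (timestamp, event code, daemon name, object type, object name, message); that layout is inferred from the excerpt, and the name `group_by_job` is hypothetical.

```python
from collections import defaultdict

# Three records copied from the server_logs excerpt, one record per line.
SAMPLE = """\
10/23/2006 19:24:15;0008;PBS_Server;Job;10439.galaxy.aero.iitb.ac.in;Holds u set at request of root at galaxy.aero.iitb.ac.in
10/23/2006 19:24:37;0008;PBS_Server;Job;10362.galaxy.aero.iitb.ac.in;Job deleted at request of root at galaxy.aero.iitb.ac.in
10/23/2006 19:24:37;0001;PBS_Server;Req;;Server could not connect to MOM
"""


def group_by_job(text):
    """Map numeric job id -> list of (timestamp, message) for Job records."""
    jobs = defaultdict(list)
    for line in text.splitlines():
        # Assumed layout: timestamp;event;daemon;object type;object name;message
        parts = line.split(";", 5)
        if len(parts) != 6:
            continue  # skip anything that is not a complete record
        stamp, _event, _daemon, obj_type, obj_name, message = parts
        if obj_type == "Job" and obj_name:
            # Keep only the part of the job name before the first dot.
            jobs[obj_name.split(".", 1)[0]].append((stamp, message))
    return dict(jobs)


if __name__ == "__main__":
    for job_id, events in group_by_job(SAMPLE).items():
        for stamp, message in events:
            print(job_id, stamp, message)
```

On these sample records the hold is recorded against 10439, while the delete is recorded against 10362.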


---------------------------------------------------------------
mom_logs
---------------------------------------------------------------
10/23/2006 15:10:59;0002;   pbs_mom;Svr;Log;Log opened
10/23/2006 15:10:59;0080;   pbs_mom;Job;10413.galaxy.aero.iitb.ac.in;scan_for_terminated: job 10413.galaxy.aero.iitb.ac.in task 1 terminated, sid 2576
10/23/2006 15:10:59;0008;   pbs_mom;Job;10413.galaxy.aero.iitb.ac.in;Terminated
10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2565 task 1 with sig 15
10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2603 task 1 with sig 15
10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2606 task 1 with sig 15
10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2607 task 1 with sig 15
10/23/2006 16:10:29;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: not killing pid 0 with sig 9
10/23/2006 16:10:30;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;kill_task: killing pid 2606 task 1 with sig 9
10/23/2006 16:10:30;0080;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;scan_for_terminated: job 10412.galaxy.aero.iitb.ac.in task 1 terminated, sid 2565
10/23/2006 16:10:30;0008;   pbs_mom;Job;10412.galaxy.aero.iitb.ac.in;Terminated
10/23/2006 16:10:57;0001;   pbs_mom;Job;TMomFinalizeJob3;job 10444.galaxy.aero.iitb.ac.in started, pid = 24856
10/23/2006 16:10:57;0008;   pbs_mom;Job;10444.galaxy.aero.iitb.ac.in;Job Modified at request of PBS_Server at master1.cluster2.iitb.ac.in
10/23/2006 16:33:10;0008;   pbs_mom;Job;10439.galaxy.aero.iitb.ac.in;JOIN JOB as node 1
----------------------------------------------------------

qmgr -c 'p s'
----------------------------------------------------------

#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_max.nodes = 4
set queue batch resources_max.walltime = 120:00:00
set queue batch resources_default.nodes = 0
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Create and define queue short
#
create queue short
set queue short queue_type = Execution
set queue short resources_max.nodes = 1
set queue short resources_max.walltime = 24:00:00
set queue short enabled = True
set queue short started = True
#
# Set server attributes.
#
set server scheduling = False
set server managers = root at galaxy.aero.iitb.ac.in
set server operators = root at galaxy.aero.iitb.ac.in
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_ping_rate = 300
set server node_check_rate = 600
set server tcp_timeout = 6
set server job_stat_rate = 30
set server pbs_version = 2.1.0p0
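
The log_events value of 511 can be compared with the event codes that appear in the log excerpts. The following is a small sketch in Python. It rests on two assumptions not stated in the listing: that the second field of each log record is a hexadecimal bit flag, and that log_events is a mask over the same bits. The helper name `enabled` is hypothetical.

```python
LOG_EVENTS = 511  # value of "set server log_events" in the listing

# Event codes that appear in the server_logs and mom_logs excerpts,
# read here as hexadecimal numbers (an assumption).
SEEN_CODES = ["0001", "0002", "0008", "0080", "0100"]


def enabled(mask, code_hex):
    """Return True when every bit of the hex code is set in mask."""
    code = int(code_hex, 16)
    return (mask & code) == code


if __name__ == "__main__":
    print(hex(LOG_EVENTS))  # 0x1ff, i.e. the low nine bits are all set
    for code in SEEN_CODES:
        print(code, enabled(LOG_EVENTS, code))
```

Under those assumptions, every code seen in the excerpts falls inside the mask, so none of these record types would be filtered out by the log_events setting.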


		
__________________________________________________________
Yahoo! India Answers: Share what you know. Learn something new
http://in.answers.yahoo.com/

