[torqueusers] Jobs queued with qsub stay in state 'Q'
David Rivera Zapata
drivera at ciencias.udea.edu.co
Wed Apr 10 11:41:48 MDT 2013
Hello everyone, I'm new to the list and new to Torque.
I have several issues with Torque. I've installed it on a cluster running SL 6.3.
pbs_server and pbs_mom seem to work fine: pbsnodes lists all the nodes and all of them
are free. There are 6 nodes, counting the head node of the cluster.
First, I believe there is a problem with pbs_sched. When I submit a job with qsub, it
stays in state 'Q' unless I force it to run with qrun. Second, Torque stops
responding after a while: if I try to get the status of the daemon (/etc/init.d/pbs_server
status) it doesn't respond, and the same goes for qsub, qdel, qstat and pbsnodes; all
of them seem to hang. Third, when I force a job to run, for instance a
'Hello World' in C with MPI, the .eXX file shows an error that says 'mpirun : command
I have an iptables firewall with the ports Torque needs opened.
I'd be very thankful if someone could help me with any of these issues.
Here is some info:
The errors thrown when I run qsub on a script while pbs_server is hung, as described above:
parse_daemon_response error 15085 Time out
parse_daemon_response error 15033 Batch protocol error
$qmgr -c 'p s'
# Create queues and set their attributes.
# Create and define queue batch
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
# Set server attributes.
set server scheduling = True
set server acl_hosts = carmac.udea.edu.co
set server managers = root at carmac.udea.edu.co
set server operators = root at carmac.udea.edu.co
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 4
set server moab_array_compatible = True
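
In case it helps narrow down the 'Q' state problem, here is a minimal sketch of two checks. As far as I understand, "scheduling = True" only tells pbs_server to ask a scheduler for scheduling cycles; it does not start one, so it seems worth confirming that a scheduler process actually exists. The function names sched_settings and sched_running are hypothetical helpers of mine, not Torque commands, and the sketch assumes that pgrep is available and that the scheduler in use is the stock pbs_sched binary.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): read `qmgr -c 'p s'` output on
# stdin and print only the scheduler-related server settings.
sched_settings() {
    grep -E '^set server (scheduling|scheduler_iteration) ='
}

# Hypothetical helper (not part of Torque): report whether a process named
# exactly pbs_sched exists on this host. Assumes pgrep is installed.
sched_running() {
    if pgrep -x pbs_sched >/dev/null 2>&1; then
        echo "pbs_sched is running"
    else
        echo "pbs_sched is NOT running"
    fi
}
```

On the head node these would be used as: qmgr -c 'p s' | sched_settings, followed by sched_running. With the configuration above, the first command should print the scheduling and scheduler_iteration lines.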
Instituto de Matemáticas
Universidad de Antioquia