[torqueusers] PBS_Server: LOG_ERROR
Abraham Zamudio
abraham.zamudio at gmail.com
Mon Apr 5 14:04:56 MDT 2010
Output of tail /var/log/messages:
PBS_Server: LOG_ERROR::wait_request, connection 9 to host 168430808 has timed out after 900 seconds - closing stale connection
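The host in that log line is printed as a plain integer rather than a name. A minimal sketch for turning it back into a dotted-quad address, assuming the number is the IPv4 address written as a 32-bit integer with the most significant byte first (I have not confirmed this against the TORQUE source):

```python
import ipaddress

# Integer host value copied from the LOG_ERROR line above.
host_as_int = 168430808

# Assumption: the integer is an IPv4 address, most significant byte first.
address = ipaddress.ip_address(host_as_int)

print(address)
```

The resulting address can then be checked against /etc/hosts on the master to see which machine held the stale connection.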
My queue configuration is:
#
# Create queues and set their attributes.
#
#
# Create and define queue paralela
#
create queue paralela
set queue paralela queue_type = Execution
set queue paralela enabled = True
set queue paralela started = True
#
# Create and define queue serial
#
create queue serial
set queue serial queue_type = Execution
set queue serial resources_default.nodes = 1
set queue serial resources_default.walltime = 01:00:00
set queue serial enabled = True
set queue serial started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = master
set server managers = mpiX@master
set server default_queue = serial
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 9
Nodes:
Master: master
Slaves: quad2
        quad4
On the master I submit the job with the qsub command: qsub run.sh
Output of pbsnodes -a:
quad2
     state = free
     np = 3
     ntype = cluster
     status = opsys=linux,uname=Linux quad2 2.6.31.12-174.2.22.fc12.x86_64 #1 SMP Fri Feb 19 18:55:03 UTC 2010 x86_64,sessions=1405 1512 1486 1563 1570 1582 1602 1616 1647 1753 1755 1777 1799 1832 2012 2043 2044 1681 2540 2550 2632 2633 2647 2666 2674 2691 2927 2955,nsessions=28,nusers=5,idletime=92,totmem=20445056kb,availmem=14304140kb,physmem=4058764kb,ncpus=4,loadave=2.47,netload=41652360053,state=free,jobs=,varattr=,rectime=1270498149
quad4
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux quad4 2.6.31.12-174.2.3.fc12.x86_64 #1 SMP Mon Jan 18 19:52:07 UTC 2010 x86_64,sessions=1542 1678 1682 1683 1707 1729 8220 9493,nsessions=8,nusers=3,idletime=443582,totmem=55461328kb,availmem=54327508kb,physmem=24745056kb,ncpus=8,loadave=0.00,netload=1666289018,state=free,jobs=,varattr=,rectime=1270498149
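The status attribute above is a comma-separated list of key=value pairs, which is hard to read by eye. A small sketch for splitting it into fields, using a shortened copy of the quad2 line; parse_status is a hypothetical helper name, and treating rectime as a Unix epoch timestamp is an assumption on my part:

```python
from datetime import datetime, timezone


def parse_status(status: str) -> dict:
    """Split a pbsnodes status string into a dict of key -> value strings."""
    fields = {}
    for item in status.split(","):
        # Values may be empty (e.g. "jobs="), so partition on the first "=".
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


# Shortened copy of the quad2 status line from the pbsnodes output above.
sample = (
    "opsys=linux,nsessions=28,nusers=5,idletime=92,"
    "physmem=4058764kb,ncpus=4,loadave=2.47,state=free,"
    "jobs=,varattr=,rectime=1270498149"
)

info = parse_status(sample)
ncpus = int(info["ncpus"])

# Assumption: rectime is seconds since the Unix epoch.
last_report = datetime.fromtimestamp(int(info["rectime"]), tz=timezone.utc)

print("ncpus:", ncpus)
print("state:", info["state"])
print("last status report (UTC):", last_report.isoformat())
```

This makes it easier to compare the reported ncpus with the configured np for each node (quad2 shows np = 3 with ncpus=4) and to see how recently each node last reported to the server.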
--
Abraham Zamudio Ch.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run.sh
Type: application/x-sh
Size: 417 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/torqueusers/attachments/20100405/87bd5330/attachment-0001.sh