[torqueusers] pbs_sched crashes when user does not state resources

Vincent David v-david at northwestern.edu
Mon Nov 2 18:10:30 MST 2009


The scheduler pbs_sched crashes with the following message when a user
does not specify the resources the job needs:

Nov  2 17:23:44 zille kernel: pbs_sched[14569] general protection rip:4058f8 rsp:7fffa1528e48 error:0

A minimal job script that reproduces the error:

    cd "$PBS_O_WORKDIR"
    uname -a
    date

It was submitted with the following command:

qsub -q default -t 1-200 script.pbs

Adding a resource list such as:

qsub -l nodes=1:ppn=1 -q default -t 1-200 script.pbs

avoids the crash. I think this issue is critical, because any user who
omits the resource list, by accident or on purpose, can crash the entire
scheduler. The problem occurs on our server running torque-2.3.3; I
thought it was still worth reporting, since I couldn't find any existing
reports of this bug.
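The resource request from the workaround can also live in the job script itself as #PBS directives, which qsub reads as if they were command-line options, so the script carries its own request instead of relying on extra flags at submission time. A minimal sketch, assuming a POSIX shell; the file name script-res.pbs is hypothetical, and the array range is deliberately left on the command line:

```shell
# Write a copy of the reproducer that carries its own queue and resource
# request. qsub reads the "#PBS" lines as submission options; the shell
# that runs the job treats them as ordinary comments.
# "script-res.pbs" is a hypothetical file name.
# The quoted 'EOF' delimiter keeps $PBS_O_WORKDIR from being expanded now;
# it is expanded later, when the job runs.
cat > script-res.pbs <<'EOF'
#!/bin/sh
#PBS -q default
#PBS -l nodes=1:ppn=1
cd "$PBS_O_WORKDIR"
uname -a
date
EOF
```

It would then be submitted as `qsub -t 1-200 script-res.pbs`. This still depends on users writing the directives, so it is a convenience rather than protection against the crash.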

I also suspect it may be related to our configuration, since the
problem seems so fundamental. The configuration is quite simple:

Max open servers: 4
Qmgr: print server zille
#
# Create queues and set their attributes.
#
#
# Create and define queue long_runs
#
create queue long_runs
set queue long_runs queue_type = Execution
set queue long_runs Priority = 0
set queue long_runs enabled = True
set queue long_runs started = True
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default Priority = 10
set queue default resources_default.walltime = 24:00:00
set queue default enabled = True
set queue default started = True
#
# Create and define queue long_runs2
#
create queue long_runs2
set queue long_runs2 queue_type = Execution
set queue long_runs2 Priority = 0
set queue long_runs2 enabled = True
set queue long_runs2 started = True
#
# Create and define queue matlab
#
create queue matlab
set queue matlab queue_type = Execution
set queue matlab Priority = 10
set queue matlab max_running = 8
set queue matlab resources_default.ncpus = 1
set queue matlab resources_default.nodes = 1
set queue matlab enabled = True
set queue matlab started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server acl_hosts += zille
set server acl_user_enable = False
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server allow_node_submit = False
set server next_job_number = 8365
Qmgr:
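Comparing the queues, matlab sets resources_default.nodes = 1 while default sets only resources_default.walltime. An untested server-side stopgap along the same lines would be to give default the same node default, so that jobs submitted there without -l receive a node request automatically, much like the -l nodes=1:ppn=1 workaround. This is a sketch only: whether it actually prevents the crash in torque-2.3.3 is an assumption, and changing queue attributes requires manager privileges on the server.

```shell
# Untested stopgap (assumption): mirror the matlab queue's node default on
# the default queue, so jobs submitted without "-l" still get nodes=1.
qmgr -c "set queue default resources_default.nodes = 1"

# Print the queue definition to confirm the new default is in place.
qmgr -c "print queue default"
```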

Any help appreciated!

Vincent David

Northwestern University
MCC Eng Sci & Applied Math
2145 Sheridan Road  M426
Evanston  IL 60208-3125