[torqueusers] Only half of the nodes working in practice

Michel Herquet mherquet at fyma.ucl.ac.be
Wed Sep 6 00:23:14 MDT 2006


I need your help with the following problem.

I'm working on a project where thousands of jobs are submitted to the cluster
in a short time. Our current server configuration is:

        server_state = Scheduling
        scheduling = True
        total_jobs = 9083
        state_count = Transit:0 Queued:9072 Held:0 Waiting:0 Running:11 Exiting:0
        default_queue = batch
        log_events = 4
        mail_from = adm
        resources_available.nodect = 24
        resources_assigned.nodect = 11
        scheduler_iteration = 330
        node_ping_rate = 180
        node_check_rate = 300
        tcp_timeout = 30
        default_node = 1
        node_pack = False
        job_stat_rate = 120
        poll_jobs = True
        pbs_version = 2.1.0p0

and for the queue:

        queue_type = Execution
        total_jobs = 9085
        state_count = Transit:0 Queued:9075 Held:0 Waiting:0 Running:10 Exiting:0
        resources_default.nodes = 1
        resources_assigned.nodect = 10
        enabled = True
        started = True

We are using the standard Torque scheduler. I do not understand these two values:
        resources_available.nodect = 24
        resources_assigned.nodect = 11

At the beginning, a large fraction of the nodes are busy, but after a few
minutes only about 10 of the 24 nodes are assigned to running jobs, even
though thousands of jobs are still queued.
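
To put a number on the symptom, here is a minimal Python sketch that reads the
server values listed above and reports how many nodes sit idle. The helper
names (parse_attrs, parse_state_count) are my own and not part of Torque, and
I am assuming that nodect simply counts nodes, so that "available minus
assigned" means "idle".

```python
def parse_attrs(text):
    """Parse 'key = value' lines (hypothetical helper, not a Torque API)."""
    attrs = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs


def parse_state_count(value):
    """Turn 'Transit:0 Queued:9072 ...' into {'Transit': 0, 'Queued': 9072, ...}."""
    return {name: int(n) for name, n in (item.split(":") for item in value.split())}


# Values copied from the server listing in this message.
server_listing = """
        state_count = Transit:0 Queued:9072 Held:0 Waiting:0 Running:11 Exiting:0
        resources_available.nodect = 24
        resources_assigned.nodect = 11
"""

attrs = parse_attrs(server_listing)
available = int(attrs["resources_available.nodect"])
assigned = int(attrs["resources_assigned.nodect"])
states = parse_state_count(attrs["state_count"])

# Assumption: nodect counts nodes, so the difference is the idle node count.
idle = available - assigned
print(f"{assigned}/{available} nodes assigned, {idle} idle, "
      f"{states['Queued']} jobs queued")
```

With the numbers above, 13 of the 24 nodes are unassigned while 9072 jobs are
waiting in the queue, which is what I cannot explain.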

What am I doing wrong?

Thanks a lot in advance for answers,

Michel, Torque newbie
 

