[torqueusers] -l file not working properly? (Torque 2.0.0p5, Maui 3.2.6p14)

Mike Renfro renfro at tntech.edu
Thu Jun 1 14:24:12 MDT 2006


I have a very disk-intensive user who bought a 300GB drive to put in
one of our nodes. In an attempt to steer his jobs to the system I
installed the drive into, I'm testing jobs with the "-l file"
directive. It's not working when I request several GB of disk space, an
amount far less than the available space reported by checknode. Further
testing shows that the breaking point between jobs running and aborting
is somewhere between 3gb and 4gb. Any ideas?
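
One thing worth checking, offered only as a hypothesis: a threshold between 3gb and 4gb is also where a byte count stops fitting in an unsigned 32-bit integer. The sketch below does just that arithmetic. It assumes the "gb" suffix means 2**30 bytes and that the limit is held as a 32-bit byte count somewhere along the way; neither assumption is confirmed by anything in this message, and the helper names are made up for illustration.

```python
# Hypothetical check: does a requested size fit in an unsigned 32-bit byte count?
# Assumption (not confirmed): size suffixes are binary multiples, so "gb" = 2**30 bytes.
UNITS = {"b": 1, "kb": 2**10, "mb": 2**20, "gb": 2**30, "tb": 2**40}


def to_bytes(size: str) -> int:
    """Convert a size string such as '4gb' to a number of bytes."""
    s = size.strip().lower()
    # Check two-letter suffixes before the bare "b" so "gb" is not read as "b".
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if s.endswith(suffix):
            return int(s[: -len(suffix)]) * UNITS[suffix]
    return int(s)  # no suffix: treat the value as bytes


def fits_uint32(size: str) -> bool:
    """True if the size in bytes is representable as an unsigned 32-bit integer."""
    return to_bytes(size) <= 2**32 - 1


if __name__ == "__main__":
    for request in ("1gb", "3gb", "4gb", "100gb"):
        print(request, to_bytes(request), fits_uint32(request))
```

Under those assumptions 3gb fits and 4gb does not, which lines up with the observed breaking point. That match is suggestive only and does not establish the cause.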

From the user's perspective:

=====

abcdefghij21@ch208a:~$ qsub -I -l file=100gb
qsub: waiting for job 1230.ch208a.cae.tntech.edu to start
qsub: job 1230.ch208a.cae.tntech.edu ready


qsub: job 1230.ch208a.cae.tntech.edu completed
abcdefghij21@ch208a:~$ qsub -I -l file=1gb
qsub: waiting for job 1231.ch208a.cae.tntech.edu to start
qsub: job 1231.ch208a.cae.tntech.edu ready

abcdefghij21@ch226-5:~$ logout

qsub: job 1231.ch208a.cae.tntech.edu completed
abcdefghij21@ch208a:~$

=====

The server_logs entries for the failing job:

=====
06/01/2006 15:04:21;0100;PBS_Server;Req;;Type Commit request received from abcdefghij21@ch208a.cae.tntech.edu, sock=10
06/01/2006 15:04:21;0100;PBS_Server;Job;1230.ch208a.cae.tntech.edu;enqueuing into long, state 1 hop 1
06/01/2006 15:04:21;0100;PBS_Server;Job;1230.ch208a.cae.tntech.edu;dequeuing from long, state QUEUED
06/01/2006 15:04:21;0100;PBS_Server;Job;1230.ch208a.cae.tntech.edu;enqueuing into pe2650, state 1 hop 1
06/01/2006 15:04:21;0008;PBS_Server;Job;1230.ch208a.cae.tntech.edu;Job Queued at request of abcdefghij21@ch208a.cae.tntech.edu, owner = abcdefghij21@ch208a.cae.tntech.edu, job name = STDIN, queue = pe2650
06/01/2006 15:04:21;0040;PBS_Server;Svr;ch208a.cae.tntech.edu;Scheduler sent command new
06/01/2006 15:04:22;0100;PBS_Server;Req;;Type StatusNode request received from root@ch208a.cae.tntech.edu, sock=9
06/01/2006 15:04:22;0100;PBS_Server;Req;;Type StatusQueue request received from root@ch208a.cae.tntech.edu, sock=9
06/01/2006 15:04:22;0100;PBS_Server;Req;;Type StatusJob request received from root@ch208a.cae.tntech.edu, sock=9
06/01/2006 15:04:22;0100;PBS_Server;Req;;Type ModifyJob request received from root@ch208a.cae.tntech.edu, sock=9
06/01/2006 15:04:22;0008;PBS_Server;Job;1230.ch208a.cae.tntech.edu;Job Modified at request of root@ch208a.cae.tntech.edu
06/01/2006 15:04:22;0100;PBS_Server;Req;;Type RunJob request received from root@ch208a.cae.tntech.edu, sock=9
06/01/2006 15:04:22;0008;PBS_Server;Job;1230.ch208a.cae.tntech.edu;Job Run at request of root@ch208a.cae.tntech.edu
06/01/2006 15:04:22;0100;PBS_Server;Req;;Type JobObituary request received from pbs_mom@ch226-5.cae.tntech.edu, sock=11
06/01/2006 15:04:22;0010;PBS_Server;Job;1230.ch208a.cae.tntech.edu;Exit_status=-2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
06/01/2006 15:04:22;0100;PBS_Server;Job;1230.ch208a.cae.tntech.edu;dequeuing from pe2650, state COMPLETE
06/01/2006 15:04:22;0040;PBS_Server;Svr;ch208a.cae.tntech.edu;Scheduler sent command term

=====
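
For anyone who wants to pull the exit status out of entries like these, here is a minimal sketch. It assumes each record has been unwrapped onto a single line and consists of six semicolon-separated fields, as the entries above appear to. The field names in the dictionary are labels chosen for this example, not official names.

```python
import re

# Labels for the six semicolon-separated fields seen in the entries above.
# These names are illustrative assumptions, not taken from any documentation.
FIELDS = ("timestamp", "event_code", "daemon", "object_type", "object_name", "message")


def parse_record(line: str) -> dict:
    """Split one unwrapped log record into its six fields."""
    # maxsplit=5 keeps any semicolons inside the free-text message intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def exit_status(message: str):
    """Return the integer after 'Exit_status=' in a message, or None if absent."""
    match = re.search(r"Exit_status=(-?\d+)", message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "06/01/2006 15:04:22;0010;PBS_Server;Job;1230.ch208a.cae.tntech.edu;"
        "Exit_status=-2 resources_used.cput=00:00:00 resources_used.mem=0kb "
        "resources_used.vmem=0kb resources_used.walltime=00:00:00"
    )
    record = parse_record(sample)
    print(record["object_name"], exit_status(record["message"]))
```

Run over a whole log, this makes it easy to list every job that finished with a negative exit status alongside the resources it requested.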

And the checknode output on the nodes in that queue (ch226-5 has the 
300GB drive installed):

=====

checking node ch226-1

State:      Idle  (in current state for 00:36:57)
Configured Resources: PROCS: 2  MEM: 1011M  SWAP: 1674M  DISK: 16G
Utilized   Resources: DISK: 701M
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [mb1024]
Attributes: [Batch]
Classes:    [pe2650 2:2][long 2:2][short 2:2]

Total Time:   INFINITY  Up:   INFINITY (100.00%)  Active: 18:08:53:23 (14.38%)

Reservations:
NOTE:  no reservations on node

checking node ch226-2

State:      Idle  (in current state for 1:12:26)
Configured Resources: PROCS: 2  MEM: 1011M  SWAP: 1663M  DISK: 16G
Utilized   Resources: DISK: 368M
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [mb1024]
Attributes: [Batch]
Classes:    [pe2650 2:2][long 2:2][short 2:2]

Total Time:   INFINITY  Up:   INFINITY (99.96%)  Active: 30:02:12:18 (23.56%)

Reservations:
NOTE:  no reservations on node

checking node ch226-3

State:      Idle  (in current state for 00:36:13)
Configured Resources: PROCS: 2  MEM: 1011M  SWAP: 1663M  DISK: 16G
Utilized   Resources: DISK: 1186M
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [mb1024]
Attributes: [Batch]
Classes:    [pe2650 2:2][long 2:2][short 2:2]

Total Time:   INFINITY  Up:   INFINITY (92.81%)  Active: 38:18:17:21 (30.35%)

Reservations:
NOTE:  no reservations on node

checking node ch226-4

State:      Idle  (in current state for 1:12:26)
Configured Resources: PROCS: 2  MEM: 1011M  SWAP: 1817M  DISK: 16G
Utilized   Resources: DISK: 1464M
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [mb1024]
Attributes: [Batch]
Classes:    [pe2650 2:2][long 2:2][short 2:2]

Total Time:   INFINITY  Up:   INFINITY (99.94%)  Active: 65:23:26:27 (51.65%)

Reservations:
NOTE:  no reservations on node

checking node ch226-5

State:      Idle  (in current state for 00:00:02)
Configured Resources: PROCS: 2  MEM: 1011M  SWAP: 1942M  DISK: 275G
Utilized   Resources: DISK: 32M
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       0.000
Network:    [DEFAULT]
Features:   [mb1024]
Attributes: [Batch]
Classes:    [pe2650 2:2][long 2:2][short 2:2]

Total Time:   INFINITY  Up:   INFINITY (99.96%)  Active: 71:00:25:43 (55.60%)

Reservations:
NOTE:  no reservations on node

=====
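
To compare a request against what checknode reports, the DISK figures can be pulled out with a short script. This is a rough sketch that assumes the output layout shown above, that M and G are binary multiples, and that free space is simply configured minus utilized disk. The function names are hypothetical, and Maui may count available disk differently.

```python
import re

# Two node entries copied from the checknode output above, trimmed to the DISK lines.
SAMPLE = """\
checking node ch226-4
Configured Resources: PROCS: 2  MEM: 1011M  SWAP: 1817M  DISK: 16G
Utilized   Resources: DISK: 1464M

checking node ch226-5
Configured Resources: PROCS: 2  MEM: 1011M  SWAP: 1942M  DISK: 275G
Utilized   Resources: DISK: 32M
"""

# Assumption: M and G are binary multiples; everything is converted to MB.
SCALE_MB = {"M": 1, "G": 1024}


def disk_mb(value: str) -> int:
    """Convert a value such as '16G' or '701M' to megabytes."""
    return int(value[:-1]) * SCALE_MB[value[-1]]


def free_disk_by_node(text: str) -> dict:
    """Map node name to (configured - utilized) disk in MB."""
    free = {}
    node = None
    configured = 0
    for line in text.splitlines():
        m = re.match(r"checking node (\S+)", line)
        if m:
            node = m.group(1)
            continue
        m = re.match(r"Configured\s+Resources:.*DISK:\s*(\d+[MG])", line)
        if m:
            configured = disk_mb(m.group(1))
            continue
        m = re.match(r"Utilized\s+Resources:.*DISK:\s*(\d+[MG])", line)
        if m and node is not None:
            free[node] = configured - disk_mb(m.group(1))
    return free


def nodes_with_room(text: str, request_mb: int) -> list:
    """Return the nodes whose estimated free disk covers the request."""
    return [n for n, mb in free_disk_by_node(text).items() if mb >= request_mb]


if __name__ == "__main__":
    print(free_disk_by_node(SAMPLE))
    print(nodes_with_room(SAMPLE, 100 * 1024))  # a 100gb request, in MB
```

By this estimate only ch226-5 has room for a 100gb request, and every node has room for 4gb, so the abort does not look like a simple shortage of reported disk space.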

-- 
Mike Renfro  / R&D Engineer, Center for Manufacturing Research,
931 372-3601 / Tennessee Technological University -- renfro at tntech.edu

