[Mauiusers] setting busy status in function of disk space and
cpuload
Arnau Bria
arnaubria at pic.es
Tue Sep 23 02:52:28 MDT 2008
On Mon, 22 Sep 2008 12:29:02 -0500
Tom Rudwick wrote:
Hi Tom,
> We have this in our mom configs:
> size [fs=/tmp]
Yes, we have something similar (/home instead of /tmp).
> This makes the size resource hold the space available in /tmp
>
> Then we do:
>
> qsub -l ddisk=5gb
>
> Or you could set a default for ddisk for a queue or server.
Exactly.
> It should only run on nodes with the proper amount of free
> space.
Sure, but what happens if I don't define that resource on some nodes?
We have some nodes with big disks and others without.
> I'm sure you can do this other ways, but I don't have
> experience with those. Hopefully you can adapt this to what
> you need.
I was trying to define a dynamic resource: a simple script that
returns the free disk space on nodes with small disks and a large
constant value on nodes with big disks. But I don't know how to use
it: when I submit a job requesting more space than any node has, the
job is still scheduled when it shouldn't be. I have also changed the
dynamic resource to a boolean value (1 if the node has enough space,
0 if not), but again, if all nodes report the resource as 0 and I
submit a job requesting that resource, the job is scheduled.
Following:
http://www.clusterresources.com/torquedocs21/a.cmomconfig.shtml
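For illustration, a minimal sketch of the kind of script described above. The function names, the default directory, and both thresholds are assumptions made for the example; they are not taken from the actual setup, and how the script is hooked into the mom config is left out.

```shell
#!/bin/sh
# Hypothetical sketch of a dynamic-resource script; all names and
# numbers below are assumptions, not the values used on the cluster.

BIG_DISK_KB=500000000    # assumed: total size from which a node counts as "big disk"
BIG_VALUE_KB=999999999   # assumed: constant reported by big-disk nodes

# Print the value a node would report for the filesystem holding $1.
# df -P prints one POSIX-format line per filesystem:
# column 2 = total KB, column 4 = available KB.
report_space() {
    dir=${1:-/home}
    df -Pk "$dir" | awk -v big="$BIG_DISK_KB" -v const="$BIG_VALUE_KB" '
        NR == 2 { if ($2 >= big) print const; else print $4 }'
}

# Boolean variant: print 1 if at least $2 KB are free under $1, else 0.
has_space() {
    dir=${1:-/home}
    need_kb=${2:-10000}
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$free_kb" -ge "$need_kb" ]; then echo 1; else echo 0; fi
}
```

Calling `report_space /home` would print the numeric variant and `has_space /home 10000` the boolean one. Taking the directory as an argument keeps the choice of filesystem at the worker-node level.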
[root@pbs02 ~]# pbsnodes -a|grep -c espacio
122
[root@pbs02 ~]# pbsnodes -a|grep -c "espacio:0"
122
So, all nodes have the resource=0.
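The two counts above can also be obtained in one pass. A small sketch, assuming the value appears in the pbsnodes output as "name:value" (the pattern the second grep matches); `tally_resource` is a hypothetical helper name, and `grep -o` is assumed to be available (GNU or BSD grep).

```shell
# Hypothetical helper: tally the reported values of one resource.
# Reads pbsnodes-style text on stdin and assumes the value is printed
# as "<name>:<value>", terminated by a comma or whitespace.
tally_resource() {
    name=$1
    grep -o "${name}:[^,[:space:]]*" | sort | uniq -c
}
```

It would be used as `pbsnodes -a | tally_resource espacio`, printing one line per distinct value with the number of occurrences in front.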
I submit a job:
[arnaubria@ui01 ~]$ echo sleep 5|qsub -l other=espacio -q short
560171.pbs02.pic.es
[arnaubria@ui01 ~]$ qstat -f 560171.pbs02.pic.es
Job Id: 560171.pbs02.pic.es
Job_Name = STDIN
Job_Owner = arnaubria@ui01.pic.es
job_state = Q
queue = short
server = pbs02.pic.es
Checkpoint = u
ctime = Tue Sep 23 10:49:10 2008
Error_Path = ui01.pic.es:/nfs/pic.es/user/a/arnaubria/STDIN.e560171
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Tue Sep 23 10:49:10 2008
Output_Path = ui01.pic.es:/nfs/pic.es/user/a/arnaubria/STDIN.o560171
Priority = 0
qtime = Tue Sep 23 10:49:10 2008
Rerunable = True
Resource_List.cput = 01:30:00
Resource_List.other = espacio
Resource_List.walltime = 03:00:00
Variable_List = PBS_O_HOME=/nfs/pic.es/user/a/arnaubria,
PBS_O_LANG=en_US.UTF-8,PBS_O_LOGNAME=arnaubria,
PBS_O_PATH=/usr/kerberos/bin:/opt/glite/bin:/opt/glite/externals/bin:
/opt/lcg/bin:/opt/lcg/sbin:/opt/edg/bin:/opt/edg/sbin:/opt/globus/sbin
:/opt/globus/bin:/opt/gpt/sbin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6
/bin:/opt/d-cache//srm/bin:/opt/d-cache//dcap/bin:/usr/java/jdk1.5.0_1
4/bin:/nfs/pic.es/user/a/arnaubria/bin,
PBS_O_MAIL=/var/spool/mail/arnaubria,PBS_O_SHELL=/bin/bash,
PBS_O_HOST=ui01.pic.es,PBS_O_WORKDIR=/nfs/pic.es/user/a/arnaubria,
PBS_O_QUEUE=short
etime = Tue Sep 23 10:49:10 2008
submit_args = -l other=espacio -q short
[root@pbs02 ~]# qstat 560171
qstat: Unknown Job Id 560171.pbs02.pic.es
[root@pbs02 ~]# checkjob 560171
checking job 560171
State: Running
Creds: user:arnaubria group:grid class:short qos:DEFAULT
WallTime: 00:00:00 of 3:00:00
SubmitTime: Tue Sep 23 10:49:10
(Time Queued Total: 00:01:46 Eligible: 00:01:46)
StartTime: Tue Sep 23 10:50:56
Total Tasks: 1
Req[0] TaskCount: 1 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [slc4]
Allocated Nodes:
[td060.pic.es:1]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
Flags: BACKFILL RESTARTABLE
Reservation '560171' (00:00:00 -> 3:00:00 Duration: 3:00:00)
PE: 1.00 StartPriority: 0
....
It shouldn't start....
If I request file=10000kb, as you do, it works, but then I lose the
ability to specify a different directory at the worker-node level.
> Tom
Cheers,
Arnau