[Mauiusers] MSysRegEvent(JOBRESVIOLATION: job '850416' in state 'Running' has exceeded PROC resource limit (1618 > 100) (action CANCEL will be taken)

Sabuj Pattanayek sabujp at gmail.com
Tue Jan 5 14:27:27 MST 2010


> The ncpus limit is useful only for single-node jobs. If you ask for -l ncpus=32
> while your nodes only have 16 cores, your jobs will be rejected for sure.

Yeah, it says this in the log if I request ncpus=17:

01/05 15:04:48 INFO:     0 feasible tasks found for job 850448:0 in partition DEFAULT (17 Needed)
01/05 15:04:48 ALERT:    job 850448 cannot run in any partition
01/05 15:04:48 ALERT:    cannot create new reservation for job 850448 (shape[1] 17)
01/05 15:04:48 ALERT:    cannot create new reservation for job 850448
01/05 15:04:48 MJobSetHold(850448,16,1:00:00,NoResources,cannot create reservation for job '850448' (intital reservation attempt)
)
01/05 15:04:48 ALERT:    job '850448' cannot run (deferring job for 3600 seconds)
01/05 15:04:48 WARNING:  cannot reserve priority job '850448'
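
As a rough illustration of why the ncpus=17 request finds 0 feasible tasks, here is a toy Python model. It assumes, as the quoted reply says, that ncpus is treated as a single-node request, and it assumes 16-core nodes. The node names and the feasible_nodes helper are hypothetical; this is not how Maui itself computes feasibility.

```python
# Toy model (hypothetical, not Maui's code): a single-node ncpus request
# can only be placed on a node that has at least that many cores.

def feasible_nodes(cores_per_node, ncpus):
    """Return the names of nodes that could host an ncpus-wide single-node task."""
    return [name for name, cores in cores_per_node.items() if cores >= ncpus]

# Hypothetical cluster: four nodes, each assumed to have 16 cores.
cluster = {f"node{i:02d}": 16 for i in range(1, 5)}

print(len(feasible_nodes(cluster, 16)))  # 4 nodes can fit the request
print(len(feasible_nodes(cluster, 17)))  # 0 nodes, so the job is deferred
```

Under this model, any ncpus value larger than the biggest node can never be placed, which matches the "0 feasible tasks ... (17 Needed)" line in the log.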

> But before giving up, make sure that you have "JOBNODEMATCHPOLICY EXACTNODE"
> in maui.cfg.

Yeah, that was set.

> I think that you will get nowhere with ncpus on a cluster, unless you use
> these limits only for single-node jobs.

I've got mpiexec 0.83 set up and working with mvapich2 across multiple
nodes, but indeed ncpus and PROC limiting don't work across nodes,
i.e. they only count the load of the job's processes on a single
node, even when the job is running on multiple nodes. It would be nice
if ncpus could be applied across nodes, e.g.:

#PBS -l nodes=2:ppn=12,ncpus=25

That would say my processes will generate a combined load of no more
than 25 across both nodes. I guess I'll have to disable PROC limiting,
since people will be running mpich jobs across nodes.
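
To make the difference concrete, here is a hypothetical Python sketch contrasting a per-node load check (the behaviour described above) with the wished-for job-wide check, where ncpus=25 caps the summed load of a nodes=2:ppn=12 job. The node names, load values, and both helper functions are invented for illustration and do not correspond to any Maui or TORQUE API.

```python
# Hypothetical comparison of two ways to enforce a load limit on a
# multi-node job. Node names and load values are made up for illustration.

def per_node_check(node_loads, limit, node):
    """Check only the load observed on one node (the current behaviour)."""
    return node_loads[node] <= limit

def job_wide_check(node_loads, limit):
    """Check the load summed over every node the job runs on (the wished-for behaviour)."""
    return sum(node_loads.values()) <= limit

# A nodes=2:ppn=12 job with a desired job-wide limit of ncpus=25.
loads = {"nodeA": 12.0, "nodeB": 12.0}
print(job_wide_check(loads, 25))                # True: total load 24.0 <= 25

loads_over = {"nodeA": 12.0, "nodeB": 14.0}
print(job_wide_check(loads_over, 25))           # False: total load 26.0 > 25
print(per_node_check(loads_over, 25, "nodeA"))  # True: nodeB's load is never seen
```

The last line shows the gap: a check that only looks at one node cannot notice that the job as a whole has exceeded its intended limit.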

