Bugzilla – Bug 195
cpuset VFS path change for 3.x kernels
Last modified: 2013-07-23 14:38:09 MDT
Created an attachment (id=107) [details]
support for new cpuset filenames
This is a continuation of the following problem:
I have the very same problem on Gentoo with 3.2.14 vanilla kernel and
torque-3.0.5, but the solution above doesn't help.
Any job fails to run because pbs_mom is unable to create a cpuset for
a job, pbs_mom.log:
FALSE 05/01/2012 04:09:11;0001; pbs_mom;Job;TMomFinalizeJob3;job not started, Retry job exec failure, retry will be attempted (see syslog for more information)
05/01/2012 04:09:11;0001; pbs_mom;Job;5.master;ALERT: job failed phase 3 start
05/01/2012 04:09:11;0008; pbs_mom;Req;send_sisters;sending ABORT to sisters for job 5.master
05/01/2012 04:09:11;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
05/01/2012 top of while loop
05/01/2012 04:09:11;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
05/01/2012 04:09:11;0080; pbs_mom;Job;5.master;obit sent to server
05/01/2012 04:09:12;0080; pbs_mom;Job;5.master;removed job script
And in syslog:
May 01 04:09:11 [pbs_mom] LOG_ERROR::TMomFinalizeChild, Could not
create cpuset for job 5.master
/sys/fs/cgroup/cpuset and /dev/cpuset are both mounted as cpuset:
$ mount | egrep "cpuset|cgroup"
cgroup_root on /sys/fs/cgroup type tmpfs
on /sys/fs/cgroup/openrc type cgroup
none on /dev/cpuset type cpuset (rw)
- on /sys/fs/cgroup/cpuset type cpuset (rw)
Their contents are the same, and the files carry the "cpuset." prefix.
It looks like this change was made in the 3.0 kernel; at least it works on
2.6.38 and fails on the 3.2.14 kernel. The kernel's Documentation/cgroups/cpuset.txt
since kernel 3.0.y says that the "cpuset." prefix must be used.
I wrote a patch that accounts for the path changes depending on the Linux kernel
version. I verified that with this patch tasks run and CPU
restrictions are enforced by the scheduler.
Can you do an ls of your /dev/cpuset mount please?
I've just had a look with the 3.2 kernel on my Ubuntu laptop and when I do:
mount -t cpuset - /dev/cpuset
root@eris:~# ls -1 /dev/cpuset
and the usual routine of:
echo 0-1 > cpus
echo 0 > mems
echo $$ > tasks
all works, which is basically all that Torque does.
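The routine above can be sketched as a small shell function. This is only an illustration and not Torque's actual code: the name make_job_cpuset and its parameters are hypothetical. It assumes the cpuset hierarchy is already mounted at the given directory and that the caller may write there. The optional last argument covers mounts whose files carry the "cpuset." prefix.

```shell
# Hypothetical helper, not part of Torque.
# usage: make_job_cpuset ROOT NAME CPUS MEMS PID [PREFIX]
#   ROOT   - directory where the cpuset hierarchy is mounted, e.g. /dev/cpuset
#   NAME   - name of the child cpuset to create, e.g. the job id
#   CPUS   - CPU list, e.g. 0-1
#   MEMS   - memory node list, e.g. 0
#   PID    - process to move into the new cpuset
#   PREFIX - "" for noprefix mounts, "cpuset." otherwise
#            (assumption: the caller already knows which one applies)
make_job_cpuset() {
    root=$1 name=$2 cpus=$3 mems=$4 pid=$5 prefix=${6:-}
    dir="$root/$name"

    mkdir -p "$dir" || return 1
    # cpus and mems must be set before a task can be attached.
    echo "$cpus" > "$dir/${prefix}cpus" || return 1
    echo "$mems" > "$dir/${prefix}mems" || return 1
    # The tasks file belongs to the cgroup core, not to the cpuset
    # subsystem, so it never carries the "cpuset." prefix.
    echo "$pid" > "$dir/tasks" || return 1
}
```

For a noprefix mount at /dev/cpuset this could be called as: make_job_cpuset /dev/cpuset 5.master 0-1 0 $$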
If you run mount, you'll see your cpuset vfs is mounted with the noprefix
option. The "modern way" is to mount -t cgroup -o cpuset in which case
you'll end up with the "cpuset." prefix on cpuset attributes.
But if you want Torque to work unmodified you shouldn't do that. :-)
Breaking userspace is a bad thing so the noprefix behaviour is unlikely to go
away - here's a rant from Linus back in March on his attitude to breaking user
space.
I do not use noprefix option, thus ls shows "cpuset." prefixes.
There is no such thing as a stable kernel API and there are good reasons for
this. New applications will eventually use modern way of handling things, so
torque should adapt as well otherwise conflicts will occur sooner or later.
Anyway if you plan to stick to old file names at least for a while, please put
somewhere in the documentation, that people should use -o noprefix.
(In reply to comment #4)
> I do not use noprefix option, thus ls shows "cpuset." prefixes.
Neither do I, and ls does not show "cpuset." prefixes. The reason is that
you already have a cgroup filesystem mounted and I do not.
This change in behaviour is since the Linux kernel commit
f9ab5b5b0f5be506640321d710b0acd3dca6154a "cgroups: forbid noprefix if mounting
more than just cpuset subsystem".
I'll try and find some time to report this as a kernel regression to see what
their attitude to this is - to me it seems like the sort of ABI behaviour
change and consequent user space breakage that Linus hates.
> There is no such thing as a stable kernel API and there are good reasons for
You are confusing the *internal* kernel APIs (which are indeed unstable for
very good reason) with the external kernel ABIs exposed to user space and which
have different rules applied.
There has been an attempt to document the level of stability of interfaces in
Documentation/ABI directory (see the README for Greg-KH's reasoning), but as
far as I can tell the cpuset/cgroup stuff has not been added yet.
> New applications will eventually use modern way of handling things, so
> torque should adapt as well otherwise conflicts will occur sooner or later.
Agreed, but Torque will need to know to cope with both cases dynamically.
> Anyway if you plan to stick to old file names at least for a while, please put
> somewhere in the documentation, that people should use -o noprefix.
Sounds like a good idea, I've just tested that on a RHEL5 system and it didn't
complain about not knowing what that meant.
I'm testing this on a RHEL6 system, and I can't seem to get the cpuset file
system to mount without the prefixes:
# mount |grep cpuset
# mount |grep cgroup
# mount -t cgroup -o cpuset,noprefix none /dev/cpuset
# ls /dev/cpuset|grep cpus
# mount|grep cpuset
none on /dev/cpuset type cgroup (rw,cpuset,noprefix)
# uname -r
I'm not sure if I'm doing something wrong, or if my kernel just doesn't
understand 'noprefix'. Either way, I think TORQUE should support both
The proposed patch looks for a specific kernel version, but clearly RedHat has
backported cgroups making that check incorrect.
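One way to avoid a version check is to probe the mounted hierarchy itself and see which file names it exposes. A minimal sketch of that idea follows. The function name cpuset_prefix is hypothetical, and this is not what the attached patch does.

```shell
# Hypothetical helper, not part of Torque or the attached patch.
# Prints the file name prefix used under the cpuset mount point ROOT
# (default /dev/cpuset):
#   "cpuset." if ROOT/cpuset.cpus exists,
#   ""        if ROOT/cpus exists (noprefix mount).
# Returns non-zero if neither file is present.
cpuset_prefix() {
    root=${1:-/dev/cpuset}
    if [ -e "$root/cpuset.cpus" ]; then
        printf '%s\n' "cpuset."
    elif [ -e "$root/cpus" ]; then
        printf '\n'
    else
        echo "no cpuset files found under $root" >&2
        return 1
    fi
}
```

The result could then be used to build paths such as "$root/${prefix}cpus", independent of the kernel version or of distribution backports.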
(In reply to comment #6)
+1 for being annoyed that they'd break user applications. I don't know why
things like this are done.
> I'm not sure if I'm doing something wrong, or if my kernel just doesn't
> understand 'noprefix'. Either way, I think TORQUE should support both
> The proposed patch looks for a specific kernel version, but clearly RedHat has
> backported cgroups making that check incorrect.
We may well need to make this patch slightly more sophisticated to work in all
cases but it is a good patch. I wonder if hwloc already handles this or not?
Does anyone know if this is broken for the 4 series? I assume it is but since
we use hwloc they might solve it for us - anyone can wish, right?
From Adaptive's perspective we will want to fix this just to avoid the support
calls we'd have to field for not fixing it.
I also meant to say - we hope to support cgroups at some point so that's
another reason to allow this.