Bugzilla – Bug 195
cpuset VFS path change for 3.x kernels
Last modified: 2012-05-24 21:48:57 MDT
Created an attachment (id=107): support for new cpuset filenames

Hello,

this is a continuation of the following problem:
http://www.clusterresources.com/pipermail/torqueusers/2012-March/014336.html

I have the very same problem on Gentoo with a vanilla 3.2.14 kernel and torque-3.0.5, but the solution given there doesn't help. Every job fails to run because pbs_mom is unable to create a cpuset for it.

pbs_mom.log:

    05/01/2012 04:09:11;0001; pbs_mom;Svr;pbs_mom;LOG_DEBUG::mom_checkpoint_job_has_checkpoint, FALSE
    05/01/2012 04:09:11;0001; pbs_mom;Job;TMomFinalizeJob3;job not started, Retry job exec failure, retry will be attempted (see syslog for more information)
    05/01/2012 04:09:11;0001; pbs_mom;Job;5.master;ALERT: job failed phase 3 start
    05/01/2012 04:09:11;0008; pbs_mom;Req;send_sisters;sending ABORT to sisters for job 5.master
    05/01/2012 04:09:11;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply
    05/01/2012 04:09:11;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop
    05/01/2012 04:09:11;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat
    05/01/2012 04:09:11;0080; pbs_mom;Job;5.master;obit sent to server
    05/01/2012 04:09:12;0080; pbs_mom;Job;5.master;removed job script

And in syslog:

    May 01 04:09:11 [pbs_mom] LOG_ERROR::TMomFinalizeChild, Could not create cpuset for job 5.master

/sys/fs/cgroup/cpuset and /dev/cpuset are both mounted with the cpuset filesystem type:

    $ mount | egrep "cpuset|cgroup"
    cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755)
    openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib64/rc/sh/cgroup-release-agent.sh,name=openrc)
    none on /dev/cpuset type cpuset (rw)
    - on /sys/fs/cgroup/cpuset type cpuset (rw)

Both contain the same files, with the "cpuset." prefix. It looks like this change was made in the 3.0 kernel; at least it works on 2.6.38 and fails on 3.2.14.
The kernel's Documentation/cgroups/cpuset.txt since kernel 3.0.y says that the "cpuset." prefix must be used. I wrote a patch to account for the path changes depending on the Linux kernel version. I verified that with this patch tasks run and CPU restrictions are enforced by the scheduler.
Can you do an ls of your /dev/cpuset mount, please?

I've just had a look with the 3.2 kernel on my Ubuntu laptop, and when I do:

    mkdir /dev/cpuset
    mount -t cpuset - /dev/cpuset

I see:

    root@eris:~# ls -1 /dev/cpuset
    cgroup.clone_children
    cgroup.event_control
    cgroup.procs
    cpu_exclusive
    cpus
    mem_exclusive
    mem_hardwall
    memory_migrate
    memory_pressure
    memory_pressure_enabled
    memory_spread_page
    memory_spread_slab
    mems
    notify_on_release
    release_agent
    sched_load_balance
    sched_relax_domain_level
    tasks

and the usual routine of:

    mkdir foo
    cd foo
    echo 0-1 > cpus
    echo 0 > mems
    echo $$ > tasks

all works, which is basically all that Torque does.
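
For reference, the manual routine above could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions, not Torque's actual code: `create_cpuset` and its `prefix` parameter are hypothetical names introduced for illustration, and the sketch assumes that only the `cpus` and `mems` files take the "cpuset." prefix while `tasks` keeps its name in both layouts.

```python
import os


def create_cpuset(root, name, cpus, mems, pid, prefix=""):
    """Create a child cpuset under `root` and move `pid` into it.

    `prefix` is "" for the legacy (noprefix) file names and "cpuset."
    for the prefixed names. Assumption: `tasks` is never prefixed.
    """
    path = os.path.join(root, name)
    os.mkdir(path)  # equivalent of: mkdir foo

    # cpus and mems have to be populated before a task can be attached,
    # so the order of the writes matters.
    writes = (
        (prefix + "cpus", cpus),   # echo 0-1 > cpus
        (prefix + "mems", mems),   # echo 0 > mems
        ("tasks", str(pid)),       # echo $$ > tasks
    )
    for filename, value in writes:
        with open(os.path.join(path, filename), "w") as handle:
            handle.write(value + "\n")
    return path
```

A call such as `create_cpuset("/dev/cpuset", "foo", "0-1", "0", os.getpid())` would mirror the shell commands; passing `prefix="cpuset."` would target the prefixed file names instead.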
Chris, If you run mount, you'll see your cpuset vfs is mounted with the noprefix option. The "modern way" is to mount -t cgroup -o cpuset in which case you'll end up with the "cpuset." prefix on cpuset attributes. David
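
To check which mount options are in effect without reading the `mount` output by eye, something like the following could be used. It is only a sketch: `cpuset_mounts` is a hypothetical helper, it parses the `X on Y type Z (options)` format shown earlier in this bug, and it assumes mount points contain no spaces. It reports whether `noprefix` is listed; it does not try to infer the file names from the filesystem type.

```python
import re

# Matches lines such as: "none on /dev/cpuset type cpuset (rw)"
MOUNT_LINE = re.compile(
    r"^(?P<device>\S+) on (?P<mountpoint>\S+) type (?P<fstype>\S+) "
    r"\((?P<options>[^)]*)\)"
)


def cpuset_mounts(mount_output):
    """Return cpuset-related entries found in `mount` output.

    An entry qualifies if its type is "cpuset", or if it is a "cgroup"
    mount whose options include the cpuset subsystem.
    """
    found = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match is None:
            continue
        fstype = match.group("fstype")
        options = match.group("options").split(",")
        if fstype == "cpuset" or (fstype == "cgroup" and "cpuset" in options):
            found.append({
                "mountpoint": match.group("mountpoint"),
                "fstype": fstype,
                "noprefix": "noprefix" in options,
            })
    return found
```

Feeding it the text returned by `subprocess.check_output(["mount"], text=True)` would list every cpuset mount together with its `noprefix` flag.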
Hi David, But if you want Torque to work unmodified you shouldn't do that. :-) Breaking userspace is a bad thing so the noprefix behaviour is unlikely to go away - here's a rant from Linus back in March on his attitude to breaking user apps.. https://lkml.org/lkml/2012/3/8/495
I do not use the noprefix option, thus ls shows "cpuset." prefixes. There is no such thing as a stable kernel API, and there are good reasons for this. New applications will eventually use the modern way of handling things, so Torque should adapt as well, otherwise conflicts will occur sooner or later. Anyway, if you plan to stick to the old file names at least for a while, please put somewhere in the documentation that people should use -o noprefix.
(In reply to comment #4)
> I do not use noprefix option, thus ls shows "cpuset." prefixes.

Neither do I, and ls does not show "cpuset." prefixes. The reason is that you already have a cgroup filesystem mounted and I do not. This change in behaviour dates from Linux kernel commit f9ab5b5b0f5be506640321d710b0acd3dca6154a, "cgroups: forbid noprefix if mounting more than just cpuset subsystem".

I'll try to find some time to report this as a kernel regression and see what their attitude to it is. To me it seems like the sort of ABI behaviour change, and consequent user space breakage, that Linus hates.

> There is no such thing as a stable kernel API and there are good reasons for
> this.

You are confusing the *internal* kernel APIs (which are indeed unstable, for very good reason) with the external kernel ABIs exposed to user space, to which different rules apply. There has been an attempt to document the stability level of interfaces in the Documentation/ABI directory (see the README for Greg-KH's reasoning), but as far as I can tell the cpuset/cgroup interfaces have not been added yet.

> New applications will eventually use modern way of handling things, so
> torque should adapt as well otherwise conflicts will occur sooner or later.

Agreed, but Torque will need to cope with both cases dynamically.

> Anyway if you plan to stick to old file names at least for a while, please put
> somewhere in the documentation, that people should use -o noprefix.

Sounds like a good idea. I've just tested that option on a RHEL5 system and it didn't complain about not knowing what it meant.
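
One way to cope with both cases dynamically would be to probe the mounted directory at run time rather than keying on the kernel version. The following is a minimal Python sketch of that idea, not the attached patch and not Torque's code; `cpuset_file_prefix` is a hypothetical name, and the only assumption is that the root of the mount contains either `cpus` or `cpuset.cpus`.

```python
import os


def cpuset_file_prefix(cpuset_root):
    """Return the file name prefix in use under `cpuset_root`.

    Returns "" when the legacy names (cpus, mems, ...) are present and
    "cpuset." when the prefixed names (cpuset.cpus, ...) are present.
    """
    if os.path.exists(os.path.join(cpuset_root, "cpus")):
        return ""
    if os.path.exists(os.path.join(cpuset_root, "cpuset.cpus")):
        return "cpuset."
    raise FileNotFoundError(
        "no cpus or cpuset.cpus file under %s; is a cpuset mounted there?"
        % cpuset_root
    )
```

The returned prefix can then be prepended to attribute names such as `cpus` and `mems` when building paths, so the same code works whether or not the filesystem was mounted with noprefix.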