[torquedev] Cpuset behavior in TORQUE 3.0
dbeer at adaptivecomputing.com
Fri Jul 2 11:16:47 MDT 2010
The current behavior for cpusets (and this has been the case from the beginning) is that when they fail, they fail silently. There is no checking, and the only notification is an entry in the log file. Nothing is done to verify that the correct number of cpus is configured, etc.
The current behavior is documented, but it is not necessarily the most desirable. We could add checks: for example, when the server is compiled with cpusets enabled, make sure that the number of cpus the user has configured matches the actual number of cpus, or make sure that /dev/cpuset is actually mounted on the moms that have cpusets enabled, etc. It just seems that if someone is compiling TORQUE specifically to use cpusets, TORQUE shouldn't let them think that cpusets are working when they aren't.
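To make the two checks above concrete, here is a rough sketch in Python rather than in TORQUE's own C, purely for illustration. The function names, the idea of passing in the contents of /proc/mounts as text, and the wording of the messages are all my assumptions and not anything that exists in TORQUE today. It treats /dev/cpuset as mounted if the mount table shows either a cpuset filesystem or a cgroup filesystem with the cpuset option at that path.

```python
import os


def cpuset_mounted(mounts_text, mount_point="/dev/cpuset"):
    """Return True if mounts_text (in /proc/mounts format) shows a
    cpuset filesystem mounted at mount_point."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount entry
        where, fstype, options = fields[1], fields[2], fields[3]
        if where != mount_point:
            continue
        if fstype == "cpuset":
            return True
        # cpusets can also be provided through the cgroup filesystem
        if fstype == "cgroup" and "cpuset" in options.split(","):
            return True
    return False


def check_cpuset_config(configured_cpus, mounts_text, actual_cpus=None):
    """Return a list of human-readable problems; empty means both checks passed.

    configured_cpus is the (hypothetical) cpu count the admin configured
    for this node; actual_cpus defaults to what the OS reports.
    """
    problems = []
    if actual_cpus is None:
        actual_cpus = os.cpu_count()  # may be None if it cannot be determined
    if actual_cpus is None:
        problems.append("could not determine the actual number of cpus")
    elif actual_cpus != configured_cpus:
        problems.append(
            "configured cpus (%d) != actual cpus (%d)"
            % (configured_cpus, actual_cpus)
        )
    if not cpuset_mounted(mounts_text):
        problems.append("/dev/cpuset is not mounted")
    return problems
```

A mom could run something like this at startup, reading /proc/mounts from disk, and refuse to start (or at least complain loudly) when the returned list is not empty, instead of only writing to the log.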
There are some limits to this: Linux will check the cpuset content written and will delete incorrect files without notifying anyone. This happens mostly when the incorrect number of cores is configured, or when other misinformation is supplied. We can't stop Linux from deleting the files, but we can avoid writing the incorrect information by making sure the configuration is correct.
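One way to avoid writing incorrect information would be to compare the cpus we are about to write against what the kernel says it has. The sketch below is again Python for illustration only; the function names are made up, and I am assuming the available cpus come as a kernel-style list string such as "0-3,8" (the format Linux uses in files like /sys/devices/system/cpu/online).

```python
def parse_cpu_list(text):
    """Parse a kernel-style cpu list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def unavailable_cpus(requested, available_text):
    """Return the sorted cpu ids in requested that the kernel does not list."""
    return sorted(set(requested) - parse_cpu_list(available_text))
```

If unavailable_cpus returns anything, the configuration is wrong and we would report that instead of writing the cpuset at all.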
In this discussion, it would be great if the original developer of TORQUE cpusets (I don't actually know who that is) could weigh in on the matter and tell us about any special reasons why changing the behavior would be harmful.
Thanks for your input,
David Beer | Senior Software Engineer