[torqueusers] Need help with numa/cpuset
francois.prudhomme at hotmail.fr
Wed Jun 12 13:42:37 MDT 2013
(Sorry if this message appears 3 times... I made several mistakes... :()
I'm asking for help with 2 things on my Torque cluster:
- 1) Using mom.layout to make better use of my nodes
- 2) Using cpusets
To do this, I'm currently using the 4.1.6 branch, configured with these options:
--prefix=/usr
--enable-syslog
--disable-gui
--with-sched=no
--enable-nvidia-gpus
--enable-numa-support
--enable-cpuset
--with-tcp-retry-limit=5
I'm using hwloc version 1.7.1 on Debian squeeze with the 3.2.0-0.bpo.3-amd64 kernel.
The only problem during make was:
catch_child.c:1973: error: ‘sisters’ undeclared (first use in this function)
catch_child.c:1973: error: (Each undeclared identifier is reported only once
catch_child.c:1973: error: for each function it appears in.)
I don't know why the declaration of this identifier is wrapped in a condition at lines 1685-1689... I deleted the #ifndef/#endif to work around the problem.
I'm launching all the packages on a VM with 4 CPUs and a very minimal config:
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.ncpus = 1
set queue batch resources_default.nodes = 1
set queue batch enabled = True
set queue batch started = True
set server acl_hosts = test2
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server next_job_number = 11
set server moab_array_compatible = True
cat /var/spool/torque/server_priv/nodes
test2 np=4 num_node_boards=1
(and maui for scheduling)
Once everything is launched, it works... but hwloc doesn't do its job. If I launch a load generator such as "stress" on 2 CPUs (stress -t 120 -c 2) through a qsub requesting 1 CPU:
- /dev/cpuset/torque/"jobid"/ is created correctly, but its cpus file is empty
- htop shows load on 2 CPUs
- /dev/cpuset/torque/cpus is empty
- "lstopo --ps" doesn't show anything...
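The symptoms above can also be checked from inside the job itself. The following is only a minimal sketch, assuming Python 3 is available on the compute node; the helper names format_cpu_list and allowed_cpus are made up for this illustration and are not part of Torque or hwloc. It prints the CPUs the kernel allows the current process to run on, which should be a single CPU if the job's cpuset were applied as requested.

```python
import os


def format_cpu_list(cpus):
    """Collapse a set of CPU ids into a range string such as '0-1,3'."""
    ordered = sorted(cpus)
    if not ordered:
        return ""
    ranges = []
    start = prev = ordered[0]
    for cpu in ordered[1:]:
        if cpu == prev + 1:
            # Still inside a contiguous run of CPU ids.
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def allowed_cpus():
    """Return the set of CPUs the calling process may run on (Linux only)."""
    # pid 0 means "the calling process".
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("CPUs allowed for this process:", format_cpu_list(allowed_cpus()))
```

Run from the job script submitted with qsub, an output listing all 4 CPUs of the VM would be consistent with the empty cpus file described above.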
Maybe a configuration problem? When I look at the mom logs:
06/12/2013 17:00:31;0002; pbs_mom.4754;Svr;pbs_mom;Torque Mom Version = 4.1.6, loglevel = 0
06/12/2013 17:00:36;0002; pbs_mom.4754;Svr;setup_program_environment;machine topology contains 0 memory nodes, 4 cpus
06/12/2013 17:00:36;0002; pbs_mom.4754;node;read_layout_file;nodeboard 0: 1 NUMA nodes: 0
06/12/2013 17:00:36;0002; pbs_mom.4754;node;read_layout_file;Setting up this mom to function as 1 numa nodes
06/12/2013 17:00:36;0002; pbs_mom.4754;node;setup_nodeboards;nodeboard 0: 0 cpus (), 1 mems (0)
06/12/2013 17:00:36;0002; pbs_mom.4754;Svr;init_torque_cpuset;Init cpuset /dev/cpuset/torque
06/12/2013 17:00:36;0002; pbs_mom.4754;Svr;init_torque_cpuset;setting cpus =
06/12/2013 17:00:36;0002; pbs_mom.4754;Svr;init_torque_cpuset;setting mems = 0
Why is "setting cpus" empty?
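To spot this kind of empty assignment in a longer mom log, something like the following could be used. It is only a sketch: it assumes the semicolon-separated line layout seen in the excerpt above, and the names SAMPLE_LOG, SETTING and empty_settings are invented for this example.

```python
import re

# Three lines copied from the mom log excerpt above.
SAMPLE_LOG = """\
06/12/2013 17:00:36;0002; pbs_mom.4754;node;setup_nodeboards;nodeboard 0: 0 cpus (), 1 mems (0)
06/12/2013 17:00:36;0002; pbs_mom.4754;Svr;init_torque_cpuset;setting cpus =
06/12/2013 17:00:36;0002; pbs_mom.4754;Svr;init_torque_cpuset;setting mems = 0
"""

# Matches messages such as "setting cpus = 0-3" or "setting cpus =".
SETTING = re.compile(r"^setting (cpus|mems) =\s*(.*)$")


def empty_settings(log_text):
    """Return (function, resource) pairs whose 'setting ... =' value is empty."""
    empty = []
    for line in log_text.splitlines():
        # Assumed layout: date;code;daemon;class;function;message
        fields = line.split(";", 5)
        if len(fields) < 6:
            continue
        match = SETTING.match(fields[5].strip())
        if match and not match.group(2).strip():
            empty.append((fields[4], match.group(1)))
    return empty


if __name__ == "__main__":
    for function, resource in empty_settings(SAMPLE_LOG):
        print(f"{function}: empty value for {resource}")
```

On the excerpt above this flags only the cpus assignment made by init_torque_cpuset; the mems assignment has a value and is not reported.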
Tests with "hwloc-bind core:0 -- stress -t 120 -c 2 &" work well.
Does anyone have an idea?
Many thanks in advance :)