[torqueusers] problems with openMP
SCIPIONI Roberto
SCIPIONI.Roberto at nims.go.jp
Sun Nov 16 20:46:44 MST 2008
Dear all,
I recently restored the /home directory on my cluster after it was damaged,
and since then Open MPI jobs submitted through Torque no longer work, while standard LAM/MPI jobs still do.
The error is:
[slavenode2:11511] [NO-NAME] ORTE_ERROR_LOG: Error in file runtime/orte_universe_exists.c at line 299
[slavenode2:11511] orte_init: could not contact the specified universe name default-universe-11511
[slavenode2:11511] [NO-NAME] ORTE_ERROR_LOG: Unreachable in file runtime/orte_init_stage1.c at line 221
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_sds_base_contact_universe failed
--> Returned value -12 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[slavenode2:11511] [NO-NAME] ORTE_ERROR_LOG: Unreachable in file runtime/orte_system_init.c at line 42
[slavenode2:11511] [NO-NAME] ORTE_ERROR_LOG: Unreachable in file runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -12 instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
I read somewhere that this could be caused by stale files left in the /tmp directory.
How do I purge the leftovers of previous Open MPI jobs that did not finish properly?
Thanks
RS
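
For anyone who finds this thread later, here is a minimal sketch of the kind of /tmp cleanup asked about above. It assumes that Open MPI keeps its per-user session data in directories named openmpi-sessions-<user>@<host>_<n> under /tmp, which is what the 1.x series used as far as I know; please check what actually exists on your nodes before deleting anything. The function names are made up for illustration, and the script defaults to a dry run.

```python
import getpass
import shutil
from pathlib import Path


def find_session_dirs(tmp_root="/tmp", user=None):
    """List directories that look like Open MPI session directories.

    Assumption: session directories are named
    openmpi-sessions-<user>@<host>_<n>; adjust the pattern if yours differ.
    """
    user = user or getpass.getuser()
    pattern = f"openmpi-sessions-{user}@*"
    return sorted(p for p in Path(tmp_root).glob(pattern) if p.is_dir())


def purge_session_dirs(tmp_root="/tmp", user=None, dry_run=True):
    """Remove matching session directories and return the paths handled.

    With dry_run=True (the default) nothing is deleted; the paths that
    would be removed are only returned.
    """
    handled = []
    for path in find_session_dirs(tmp_root, user):
        if not dry_run:
            shutil.rmtree(path, ignore_errors=True)
        handled.append(path)
    return handled


if __name__ == "__main__":
    # Dry run: only report what would be removed for the current user.
    for path in purge_session_dirs(dry_run=True):
        print("would remove", path)
```

This has to be run on every compute node (slavenode2 in the log above), and only while none of that user's Open MPI jobs are running there, since a live job uses the same directory. Some Open MPI releases also ship an orte-clean utility for this purpose; whether your installation has it depends on the version.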