[torqueusers] mpiexec errors

Gelonia L. Dent gdent at amnh.org
Mon Apr 27 10:48:25 MDT 2009


Error messages.

We are running Scyld Taskmaster. Since a recent reboot of the head node,
jobs submitted to the scheduler launch and are then rejected with the
following error:

              mpiexec: Error: get_hosts: pbs_connect: no error.

Moab seems to be functioning properly:

[root@enyo ~]# mdiag -S
Moab Server 'Scyld' running on enyo:42559  (Mode: NORMAL)
  Time(ms)  Sched: 0  RMLoad: 1  RMProcess: 0  RMAction: 0
            Triggers: 0  User: 0  Idle: 61077 Total: 61078
  Load(5m)  Sched: 0.00%  RMLoad: 0.00%  RMProcess: 0.00%  RMAction: 0.03%
            Triggers: 0.00%  User: 0.03%  Idle: 99.94%
  Load(24h) Sched: 0.00%  RMLoad: 0.00%  RMProcess: 0.00%  RMAction: 0.00%
            Triggers: 0.00%  User: 0.00%  Idle: 100.00%
  PollInterval: 00:01:00  (Avg Sched Interval: 00:00:57  Iterations: 1375)

  NOTE:  scheduler will restart in 1:57:57

  Message:  profiling enabled (50 of 50 samples/00:30:00 interval)

However, momctl reports an error:

[root@enyo ~]# momctl -d 3
ERROR:    query[0] 'diag3' failed on localhost (errno=0-Success: 5-Input/output error)


Does anyone know how to resolve this problem?






