[torqueusers] job schedule and run problem!

vanilla vanilla0111 at gmail.com
Thu Sep 20 03:17:33 MDT 2007


Hi all,
I am having trouble with PBS job scheduling.
I installed OSCAR 5.0, with torque as pbs_server and maui as the scheduler.
Jobs seem to be scheduled, apart from some warnings:
---------
09/20 18:18:33 WARNING:  cannot receive message within 5.000000 second
timeout (aborting)
09/20 18:18:33 ALERT:    cannot determine packet size
09/20 18:18:33 ALERT:    cannot read client packet
09/20 18:18:33 MSUDisconnect(S)
09/20 18:18:37 ServerProcessRequests()
09/20 18:18:37 INFO:     not rolling logs (8588160 < 10000000)
09/20 18:18:37 MResAdjust(NULL,0,0)
09/20 18:18:37 MStatInitializeActiveSysUsage()
09/20 18:18:37 MStatClearUsage([NONE],Active)
09/20 18:18:37 ServerUpdate()
09/20 18:18:37 MSysUpdateTime()
--------
But the jobs always stay in the Q or R state and never actually run; that is,
they produce no output or error info.
The following is the pbs_mom log:
---------
09/20/2007 17:52:39;0002;   pbs_mom;n/a;mom_main;hello sent to server
oscar_server.oscardomain
09/20/2007 17:52:39;0001;   pbs_mom;Svr;is_request;duplicate connection from
192.168.190.1:1023 - closing original connection
09/20/2007 17:52:39;0001;   pbs_mom;Svr;pbs_mom;is_request, End of File from
192.168.190.1:1023
09/20/2007 17:52:59;0002;   pbs_mom;n/a;mom_main;hello sent to server
oscar_server.oscardomain
09/20/2007 17:58:05;0001;   pbs_mom;Job;TMomFinalizeJob3;job not started,
Failure job exec failure, after files staged, no retry
09/20/2007 17:58:05;0008;   pbs_mom;Req;send_sisters;sending ABORT to
sisters
09/20/2007 17:58:05;0001;   pbs_mom;Job;408.oscar_server.oscardomain;server
rejected job obit - 15008
09/20/2007 18:16:11;0001;   pbs_mom;Job;TMomFinalizeJob3;job not started,
Failure job exec failure, after files staged, no retry
09/20/2007 18:16:11;0008;   pbs_mom;Req;send_sisters;sending ABORT to
sisters
09/20/2007 18:16:11;0001;   pbs_mom;Job;409.oscar_server.oscardomain;server
rejected job obit - 15008
09/20/2007 18:18:45;0001;   pbs_mom;Job;TMomFinalizeJob3;job not started,
Failure job exec failure, after files staged, no retry
09/20/2007 18:18:45;0008;   pbs_mom;Req;send_sisters;sending ABORT to
sisters
09/20/2007 18:18:45;0001;   pbs_mom;Job;410.oscar_server.oscardomain;server
rejected job obit - 15008
------------
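
To pull the rejected job IDs out of a log like the one above, here is a minimal Python sketch. It assumes each record sits on one line in the real log file (the lines above are wrapped by the mail client) and uses the semicolon-separated layout visible in these entries. The function name `rejected_obits` and the field names are my own, not part of torque.

```python
import re

def rejected_obits(lines):
    """Return (job_id, code) pairs for 'server rejected job obit' records."""
    results = []
    for line in lines:
        # Assumed layout: timestamp;event code;daemon;class;name;message
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) != 6:
            continue  # incomplete record, e.g. a wrapped continuation line
        timestamp, event, daemon, obj_class, name, message = fields
        match = re.search(r"server rejected job obit - (\d+)", message)
        if obj_class == "Job" and match:
            results.append((name, int(match.group(1))))
    return results

# Two records copied from the log above, joined back onto single lines.
sample = [
    "09/20/2007 17:58:05;0001;   pbs_mom;Job;TMomFinalizeJob3;"
    "job not started, Failure job exec failure, after files staged, no retry",
    "09/20/2007 17:58:05;0001;   pbs_mom;Job;408.oscar_server.oscardomain;"
    "server rejected job obit - 15008",
]
print(rejected_obits(sample))
```

On the sample this lists job 408 together with the code 15008 that the server returned.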

There are also warnings in the syslog and the pbs_server log:
------
09/20/2007 17:52:41;0040;PBS_Server;Req;ping_nodes;successful ping to node
oscarnode1.oscardomain

09/20/2007 17:52:41;0001;PBS_Server;Svr;PBS_Server;Connection refused (111)
in contact_sched, Could not contact Scheduler - port 15004
09/20/2007 17:53:41;0040;PBS_Server;Svr;oscar_server.oscardomain;Scheduler
sent command scheduler_first
09/20/2007 17:54:41;0040;PBS_Server;Svr;oscar_server.oscardomain;Scheduler
sent command time
09/20/2007 17:55:41;0040;PBS_Server;Svr;oscar_server.oscardomain;Scheduler
sent command time
-------

So I edited maui.cfg, set serverport to 15004, and restarted, but the
jobs still do not run successfully.
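
A quick way to test whether anything accepts TCP connections on port 15004 is sketched below in Python. The function name `port_is_open` and the 3-second timeout are arbitrary choices of mine. The check only shows that some process is listening on the port, not that it is the scheduler.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host name and performs the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

On the server host this could be called as `port_is_open("oscar_server.oscardomain", 15004)`; a result of False would match the "Connection refused (111)" entry in the pbs_server log.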

Can anyone give me a clue?
Thanks very much!