[torqueusers] routing queues to remote cluster
Valery Mitsyn
vvm at mammoth.jinr.ru
Sat Apr 2 13:24:56 MST 2005
Hi *,
(I've learned recently that everyone @ supercluster.org
is busy, so I'd like to post this bug to the list too,
after bugzilla #62)
An attempt to send a job to a remote server via a local queue:
... queue_type = Route;
... route_destinations = <remote_queue>@<remote_server>;
is rejected immediately on the local server with "Job rejected by all possible
destinations".
The real problem was introduced in src/server/svr_connect.c, when a
check for "host in the proper state" was added to "svr_connect" (via a
call to "addr_ok"). In the current implementation this check returns
"success" _only_ if the address of the destination server belongs to a MOM
node and that node is in a good state (not down, deleted, or unknown).
Because of the way "addr_ok" works, the check will always fail for a remote
server, which is not in the local node list.
From my point of view, the logic of "addr_ok" can be changed in the
following way:
return "fail==0" if and only if "addr" is a MOM node and the node is in a bad
state;
else return "success==1".
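
To make the difference concrete, here is a minimal standalone sketch of the two behaviours described above. It is not the server code: the names sketch_addr_t, struct sketch_node, the SKETCH_* flags, addr_ok_before and addr_ok_after are hypothetical stand-ins for pbs_net_t, the pbsndlist entries, the INUSE_* flags and "addr_ok", and the flag values are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pbs_net_t. */
typedef unsigned long sketch_addr_t;

/* Hypothetical stand-ins for INUSE_DOWN, INUSE_DELETED, INUSE_UNKNOWN.
 * The numeric values are illustrative only. */
enum {
    SKETCH_DOWN    = 0x01,
    SKETCH_DELETED = 0x02,
    SKETCH_UNKNOWN = 0x04,
    SKETCH_BAD     = SKETCH_DOWN | SKETCH_DELETED | SKETCH_UNKNOWN
};

/* Hypothetical stand-in for one entry of the server's node list. */
struct sketch_node {
    sketch_addr_t addr;   /* primary address of the node */
    int           state;  /* bitmask of SKETCH_* flags, 0 means usable */
};

/* Behaviour before the change: success only when addr belongs to a
 * known node that is in a good state. An address that is not in the
 * node list (such as a remote server) always fails. */
int addr_ok_before(const struct sketch_node *nodes, size_t count,
                   sketch_addr_t addr)
{
    size_t i;

    assert(nodes != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (!(nodes[i].state & SKETCH_BAD) && nodes[i].addr == addr)
            return 1;
    }
    return 0;
}

/* Proposed behaviour: failure only when addr belongs to a known node
 * that is in a bad state. Any other address is accepted. */
int addr_ok_after(const struct sketch_node *nodes, size_t count,
                  sketch_addr_t addr)
{
    size_t i;

    assert(nodes != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (nodes[i].addr == addr)
            return (nodes[i].state & SKETCH_BAD) ? 0 : 1;
    }
    return 1;
}
```

With a node list that does not contain the remote server's address, addr_ok_before returns 0 for that address and addr_ok_after returns 1, while a known node that is down is rejected by both.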
I've attached a patch for torque 1.2.0p2.
Best regards,
Valery Mitsyn
-------------- next part --------------
--- torque-1.2.0p2/src/server/node_func.c.orig 2005-02-22 23:59:23.000000000 +0300
+++ torque-1.2.0p2/src/server/node_func.c 2005-04-01 00:57:35.000000000 +0400
@@ -249,7 +249,8 @@
/*
- * returns 1 if node is OK, 0 if node is down.
+ * return 0 if addr is a node and node is in bad state,
+ * return 1 otherwise (it is not a MOM node, or its state is OK)
*/
int addr_ok(
@@ -257,18 +258,19 @@
pbs_net_t addr)
{
- int i, status = 0;
+ int i, status = 1;
if (pbsndlist)
{
for (i=0; i<svr_totnodes; i++)
{
- if (!(pbsndlist[i]->nd_state &
- (INUSE_DOWN|INUSE_DELETED|INUSE_UNKNOWN)) &&
- pbsndlist[i]->nd_addrs[0] == addr)
+ if (pbsndlist[i]->nd_addrs[0] == addr)
{
- status = 1;
-
+ if (pbsndlist[i]->nd_state &
+ (INUSE_DOWN|INUSE_DELETED|INUSE_UNKNOWN))
+ {
+ status = 0;
+ }
break;
}
}