[torquedev] mom_priv/config line length limit, OS X struct differences
Neil Hodgson
neil.hodgson at sirca.org.au
Thu Feb 5 17:01:28 MST 2009
I should introduce myself first. I am working for SIRCA, a financial
services research organisation that uses TORQUE 1.2.0p5 (with local
patches) to run financial data queries on a set of nodes. A locally
developed scheduler written in TCL uses job type to schedule each job
onto a node that wants to receive that job type. Further, each node
specifies a relative priority for job types. The server and scheduler
run on a pair of machines using Red Hat clustering with a floating IP
address and name used to communicate with the primary. The local patches
to the scheduler implement a -h option for selecting the hostname
similar to the -H option to pbs_server. I worked for SIRCA for 6 months
last year, mostly on other areas but also made the TCL scheduler more
robust. I am back for around a month to upgrade TORQUE to 2.3.6 and to
rewrite the scheduler in C++ so it can be maintained by any of the
developers here.

The first minor issue is that the code that reads mom_priv/config
uses fgets with a 120 character buffer, so a longer line does not fit
in a single read. This has led to problems here when a property for a
node gets large. It would be better for SIRCA if this buffer were
larger - perhaps 250 to 1000 characters.

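
To illustrate the fgets behaviour described above, here is a minimal
sketch of a line reader that reports when a line did not fit in the
buffer. It is not the TORQUE code: the helper name read_config_line and
its return convention are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from TORQUE.
 * Reads one line from fp into buf (capacity size) and strips the newline.
 * Returns  1 if the whole line fit in buf,
 *          0 if the line was too long (buf holds the first size-1
 *            characters and the rest of the line is discarded),
 *         -1 at end of file or on a read error.
 */
int read_config_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    size_t discarded = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 1);

    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';   /* complete line: drop the newline */
        return 1;
    }

    /* No newline in buf: either this is a final line without one, or
     * the line is longer than the buffer. Consume up to the end of the
     * line so the next call starts on a fresh line. */
    while ((c = fgetc(fp)) != EOF && c != '\n')
        discarded++;

    return discarded == 0 ? 1 : 0;
}
```

A caller could log a warning when 0 is returned instead of silently
treating the remainder of a long line as a separate config line.
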
On OS X, sockaddr_in is different to Linux, notably in starting with
a sin_len field.

    struct sockaddr_in {
        __uint8_t       sin_len;
        sa_family_t     sin_family;
        in_port_t       sin_port;
        struct in_addr  sin_addr;
        char            sin_zero[8]; /* XXX bwg2001-004 */
    };

In TORQUE, most common code fills in the sin_family, sin_port, and
sin_addr fields which leaves the sin_len and sin_zero fields
uninitialized. This appears to be safe in standard TORQUE where
INADDR_ANY is often used but can cause failures when sin_addr is set to
something else. The failures stopped when the structs were fully
initialized to zero with memset(&a, 0, sizeof(a)). I think it would be
good defensive programming to always fully initialize these structures.
While zero-initializing the whole struct with a brace initializer like

    struct sockaddr_in a = {0};

would be prettier in my opinion, recent GCC produces a warning for this
idiom.
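
The memset approach can be sketched as follows. The helper name
init_sockaddr_in and its parameters are hypothetical and not part of
TORQUE; the point is only that every byte of the struct is cleared
before the individual fields are set, so sin_zero (and sin_len, on
platforms that have it) never holds stack garbage.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper, not part of TORQUE.
 * addr and port are given in host byte order.
 */
void init_sockaddr_in(struct sockaddr_in *sa, in_addr_t addr,
                      unsigned short port)
{
    assert(sa != NULL);

    /* Zero the whole struct first, including sin_zero and any
     * platform-specific members such as sin_len on OS X. */
    memset(sa, 0, sizeof(*sa));

    sa->sin_family      = AF_INET;
    sa->sin_port        = htons(port);
    sa->sin_addr.s_addr = htonl(addr);
}
```

For example, init_sockaddr_in(&a, INADDR_LOOPBACK, 15001) would fill a
for 127.0.0.1; the address and port here are arbitrary illustration
values.
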
The manual page for pbs_server still shows the hostname parameter as
-h rather than -H.
Neil