[torqueusers] Torque OS X users

Glen Beane beaneg at umcs.maine.edu
Mon Sep 13 11:36:28 MDT 2004


I have some questions for OS X people out there.

I'm running the August 30th snapshot of Torque on my OS X cluster at the
moment.  While I can start jobs, I'm seeing a few glitches:

1.  I can't qdel jobs started with mpiexec.  If I do, I have to go in
and kill the mpiexec process by hand, or else I get the following error
over and over and over from pbs_mom:

do_tcp: got an internal task manager request
tm_request: job 89.bender.bender.clusters.umaine.edu cookie
697071C92400DF443BABD1FBAC35363C task 1 com 100 event 41
pbs_mom: Unknown error: 0 (0) in tm_request, job
89.bender.bender.clusters.umaine.edu not found 

2. pbs_server / pbs_sched communication issues.  Occasionally (actually
fairly often) pbs_server can't communicate with the scheduler so it
can't let the scheduler know when a new job is queued, or a running job
finishes.  Also, pbs commands are occasionally slow to respond (it may
take 5-10 seconds for a qsub or qstat to respond, but other times it is
instant.) I think I have seen the slow response to commands on a Linux
system as well.

3. CPU time is not reported correctly (it stays at 00:00:00), related
to this pbs_mom error:

mom_get_sample: entered
cput_sum: p_stat 0x2
pbs_mom: Bad address (14) in cput_sum, kvm_read(pstats) (806000, 2,
bffff950, 228)
cput_sum: p_stat 0x2
pbs_mom: Bad address (14) in cput_sum, kvm_read(pstats) (806000, 2,
bffff950, 228)
cput_sum: p_stat 0x2
pbs_mom: Bad address (14) in cput_sum, kvm_read(pstats) (806000, 2,
bffff950, 228)

4. pbs_mom does not report memory correctly (this is from a compute node
with 2GB RAM):

totmem=? 15201,availmem=? 15201,physmem=4292870144kb
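
A quick check of the arithmetic on that physmem value. This is only a guess
at the cause, not something confirmed against the Torque source: it assumes
the 2GB byte count was stored in a signed 32-bit integer, converted to kb
while negative, and then printed as unsigned. The helper names below are
made up for the illustration.

```python
# Hypothesis only: 2 GB in bytes wraps a signed 32-bit int, is shifted
# down to kb while negative, then printed as an unsigned 32-bit value.

PHYS_BYTES = 2 * 1024**3  # 2 GB of RAM, in bytes (0x80000000)


def to_int32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def to_uint32(x):
    """Reinterpret the low 32 bits of x as an unsigned 32-bit integer."""
    return x & 0xFFFFFFFF


signed_bytes = to_int32(PHYS_BYTES)   # wraps to -2147483648
signed_kb = signed_bytes >> 10        # arithmetic shift: -2097152
reported_kb = to_uint32(signed_kb)    # 2**32 - 2**21

print(reported_kb)  # 4292870144, the same number pbs_mom reports
```

The result matches the reported physmem exactly (2^32 - 2^21), so a 32-bit
overflow somewhere in the memory query looks plausible, though where it
actually happens is not established here.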


Any other OS X torque users see the same issues?

Glen


