[torqueusers] Torque OS X users
Glen Beane
beaneg at umcs.maine.edu
Mon Sep 13 11:36:28 MDT 2004
I have some questions for OS X people out there.
I'm running the August 30th snapshot of Torque on my OS X cluster at the
moment. While I can start jobs, I'm seeing a few glitches:
1. I can't qdel jobs started with mpiexec. If I do, I have to go in
and kill the mpiexec process by hand, or else I get the following error
over and over and over from pbs_mom:
do_tcp: got an internal task manager request
tm_request: job 89.bender.bender.clusters.umaine.edu cookie
697071C92400DF443BABD1FBAC35363C task 1 com 100 event 41
pbs_mom: Unknown error: 0 (0) in tm_request, job
89.bender.bender.clusters.umaine.edu not found
2. pbs_server / pbs_sched communication issues. Occasionally (actually
fairly often) pbs_server can't communicate with the scheduler so it
can't let the scheduler know when a new job is queued, or a running job
finishes. Also, pbs commands are occasionally slow to respond (it may
take 5-10 seconds for a qsub or qstat to respond, but other times it is
instant.) I think I have seen the slow response to commands on a Linux
system as well.
3. CPU time is not reported correctly (it stays at 00:00:00), related
to this pbs_mom error:
mom_get_sample: entered
cput_sum: p_stat 0x2
pbs_mom: Bad address (14) in cput_sum, kvm_read(pstats) (806000, 2,
bffff950, 228)
cput_sum: p_stat 0x2
pbs_mom: Bad address (14) in cput_sum, kvm_read(pstats) (806000, 2,
bffff950, 228)
cput_sum: p_stat 0x2
pbs_mom: Bad address (14) in cput_sum, kvm_read(pstats) (806000, 2,
bffff950, 228)
4. pbs_mom does not report memory correctly (this is from a compute node
with 2GB RAM):
totmem=? 15201,availmem=? 15201,physmem=4292870144kb
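One observation on the physmem figure, offered as a guess and not a diagnosis: the reported value is exactly what 2 GB expressed in kb looks like after being negated and read back as an unsigned 32-bit number. That would be consistent with a signed/unsigned 32-bit problem somewhere in the memory reporting, but I have not checked the pbs_mom source to confirm it. The short Python sketch below only does the arithmetic; the helper name as_signed32 is made up for illustration, and it assumes the node has exactly 2 GiB of RAM.

```python
# Assumption: the node has exactly 2 GiB of RAM and physmem is reported in kb.
RAM_BYTES = 2 * 1024**3
expected_kb = RAM_BYTES // 1024      # what physmem should say, in kb
reported_kb = 4292870144             # what pbs_mom actually reported


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value - (1 << 32)
    return value


print(expected_kb)                            # 2097152
print(as_signed32(reported_kb))               # -2097152
print((-expected_kb) % 2**32 == reported_kb)  # True
```

So the bad value is 2^32 minus the correct one, which is why it looks like a wrapped 32-bit integer and not random garbage.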
Any other OS X torque users see the same issues?
Glen