[torqueusers] Help! One Puzzle At a Time...

sam oubari soubari at yahoo.com
Tue Sep 6 08:07:45 MDT 2011


Hello,
 
I am no expert at TORQUE, and one key puzzle for us is why, on occasion, a waiting job moves from H to Q but not to R when its scheduled time comes. When I attempt to force it with qrun, I get:
 
qrun: Resource temporarily unavailable MSG=job allocation request exceeds currently available cluster nodes, 1 requested, 0 available 3030.naboo.linnbenton.edu

Below are the outputs of 'printserverdb' and 'qnodes' during the "freeze".  To fix it, I had to kill pbs_mom, restart it, and then qrun the first queued job.
 
Any hints would be greatly appreciated.  Thx! Sam.
 
PS. I've provided more details on 8/28/11.
 
 
------
Sam Oubari, Manager of Systems & Application Programming
Linn-Benton Community College -- Information Services
6500 Pacific Blvd SW, Room# CC 110E -- Albany OR 97321
Tel: 541-917-4355/Fax: 541-917-4379
 
======
 
# printserverdb
---------------------------------------------------
numjobs:                26
numque:         5
jobidnumber:            3575
savetm:         1314100391
--attributes--
scheduling = True
max_running = 23
total_jobs = 22
state_count = Transit:0 Queued:0 Held:2 Waiting:17 Running:0 Exiting:0
default_queue = sys_tst
log_events = 511
mail_from = adm
query_other_jobs = False
resources_assigned.nodect = 0
scheduler_iteration = 600
node_check_rate = 150
tcp_timeout = 6
mom_job_sync = False
pbs_version = 2.5.6
keep_completed = 600
allow_node_submit = True
next_job_number = 1
net_counter = 7 1 0
 
# qnodes
naboo
     state = down
     np = 40
     ntype = cluster
     status = rectime=1315288785,varattr=,jobs=3448.naboo.linnbenton.edu 3449.naboo.linnbenton.edu 3450.naboo.linnbenton.edu,state=free,netload=1345146873471,gres=,loadave=0.08,ncpus=4,physmem=17040092kb,availmem=23485296kb,totmem=29739432kb,idletime=459327,nusers=5,nsessions=115,sessions=361 363 365 367 369 371 373 375 377 379 381 383 385 387 389 391 393 395 397 399 401 407 409 413 422 424 426 428 430 432 434 436 438 440 442 444 446 448 450 452 454 456 460 462 466 471 474 476 479 481 483 485 487 489 491 493 495 497 499 501 503 505 507 518 520 522 527 529 531 533 535 537 539 546 548 550 552 554 556 558 560 562 564 567 578 585 587 589 660 662 956 960 1637 1648 1657 1863 1891 5763 5839 5875 13067 18926 18986 19028 24492 24541 24588 24639 24684 24740 24787 29226 29631 30517 30521,uname=Linux naboo.linnbenton.edu 2.6.18-238.12.1.0.1.el5 #1 SMP Tue May 31 14:51:07 EDT 2011 x86_64,opsys=linux
     gpus = 0
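
One detail in the qnodes output above: the node's top-level state is "down", while the status string (which, as far as I understand, is what the MOM last reported) contains "state=free". Below is a minimal Python sketch for spotting that kind of disagreement automatically. It is not part of TORQUE. The function names are hypothetical, the sample text is an abbreviated copy of the output above, and it assumes each attribute sits on a single unwrapped line in the "key = value" layout shown.

```python
import re


def parse_qnodes(text):
    """Parse plain qnodes text output into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line is assumed to start a new node record.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            nodes[current][key] = value
    return nodes


def mom_reported_state(status):
    """Return the state=... field embedded in the status string, or None."""
    match = re.search(r"(?:^|,)state=([^,]*)", status)
    return match.group(1) if match else None


def find_mismatches(text):
    """List (node, server_state, mom_state) tuples where the two differ."""
    mismatches = []
    for name, attrs in parse_qnodes(text).items():
        server_state = attrs.get("state")
        mom_state = mom_reported_state(attrs.get("status", ""))
        if mom_state is not None and server_state != mom_state:
            mismatches.append((name, server_state, mom_state))
    return mismatches


# Abbreviated sample based on the qnodes output in this message.
sample = """naboo
     state = down
     np = 40
     ntype = cluster
     status = rectime=1315288785,varattr=,state=free,loadave=0.08,ncpus=4
     gpus = 0
"""

print(find_mismatches(sample))
```

In practice the text would come from capturing the real qnodes output rather than from a hard-coded sample. A mismatch like this only indicates that the server and the MOM disagree; it does not say why.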