[gold-users] Integration with LSF, HPC MPI job

Wei Lin weilin at platform.com
Wed Mar 16 15:12:21 MDT 2011


Hi Scott,

An MPI job can run on multiple hosts. Would the customer like to run
"greserve" once per host, or just once for the job as a whole?
See the example below.

Thanks,
Wei Lin

--------------------------------------------------------
EXAMPLE:
(1) Submit an MPI job:
[weilin@amd64dcore conf]$ bsub -P lsf_p1 -q normal -W 20 -n2 -m "amd64dcore! pprh3" -a lammpi -R"span[ptile=1]" -L /bin/tcsh mpirun.lsf /home/weilin/shell/cpi_mpi
Quote command: /opt/gold/bin/gquote -u weilin -p "lsf_p1" -m amd64dcore -P 2 -t 1200 --verbose --quiet
-----------------------------------------------------------
Quote response: 2400
-----------------------------------------------------------
Balance response:   99727
Balance available response:   99727
Job <2414> is submitted to queue <normal>.

(2) The job runs on 2 hosts, 1 slot per host:

[weilin@amd64dcore conf]$ bjobs 2414
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST                     JOB_NAME    SUBMIT_TIME
2414    weilin  RUN   normal     amd64dcore  amd64dcore                    *l/cpi_mpi  Mar 15 12:42
                                             pprh3.asia.corp.platform.com

(3) The eexec log shows two reservations, one per host:
[weilin@amd64dcore conf]$ vi ../log/eexec.log
    796 LSB_MCPU_HOSTS: amd64dcore 1 pprh3.asia.corp.platform.com 1
    797 Reserve command: /opt/gold/bin/greserve -J 2414 -p lsf_p1 -u weilin -m amd64dcore -P 1 -t 1200 --verbose --quiet
    798 -----------------------------------------------------------
    799 Reserve response: 83 671200     <== first reservation
    800 -----------------------------------------------------------
    801 Reserve command: /opt/gold/bin/greserve -J 2414 -p lsf_p1 -u weilin -m pprh3.asia.corp.platform.com -P 1 -t 1200 --verbose --quiet
    802 -----------------------------------------------------------
    803 Reserve response: 84 681200     <== second reservation
    804 -----------------------------------------------------------
    805 local_machine = amd64dcore
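
The two options in the question can be sketched as follows. This is only an
illustration in Python, not the actual eexec script. The function names are
hypothetical. The per-host form copies the greserve flags from the log above;
the whole-job form is an assumption modeled on the gquote call in step (1):
a single call with the total slot count, attributed to the first host.

```python
GRESERVE = "/opt/gold/bin/greserve"  # path as shown in the eexec log


def parse_mcpu_hosts(value):
    """Turn 'hostA 1 hostB 1' into [('hostA', 1), ('hostB', 1)]."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating host/slot pairs: %r" % value)
    return [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


def reserve_per_host(job_id, project, user, walltime, mcpu_hosts):
    """Build one greserve command per execution host (as in the log)."""
    return [
        [GRESERVE, "-J", str(job_id), "-p", project, "-u", user,
         "-m", host, "-P", str(slots), "-t", str(walltime),
         "--verbose", "--quiet"]
        for host, slots in parse_mcpu_hosts(mcpu_hosts)
    ]


def reserve_whole_job(job_id, project, user, walltime, mcpu_hosts):
    """ASSUMPTION: one greserve command covering all slots of the job,
    attributed to the first host, mirroring the gquote call in step (1)."""
    hosts = parse_mcpu_hosts(mcpu_hosts)
    total_slots = sum(slots for _, slots in hosts)
    return [GRESERVE, "-J", str(job_id), "-p", project, "-u", user,
            "-m", hosts[0][0], "-P", str(total_slots), "-t", str(walltime),
            "--verbose", "--quiet"]


if __name__ == "__main__":
    hosts = "amd64dcore 1 pprh3.asia.corp.platform.com 1"
    for cmd in reserve_per_host(2414, "lsf_p1", "weilin", 1200, hosts):
        print(" ".join(cmd))
    print(" ".join(reserve_whole_job(2414, "lsf_p1", "weilin", 1200, hosts)))
```

With the per-host form, Gold holds one reservation per machine; with the
whole-job form it would hold a single reservation for the full amount.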

(4) Display the reservations in Gold; there are two entries:
[weilin@amd64dcore conf]$ glsres
Id Name Amount StartTime           EndTime             Job User   Project Machine                      Accounts Description
-- ---- ------ ------------------- ------------------- --- ------ ------- ---------------------------- -------- -----------
67 2414   1200 2011-03-15 12:42:32 2011-03-15 13:12:32 83  weilin lsf_p1  amd64dcore                   4
68 2414   1200 2011-03-15 12:42:32 2011-03-15 13:12:32 84  weilin lsf_p1  pprh3.asia.corp.platform.com 4
[weilin@amd64dcore conf]$
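
As a cross-check on the numbers in this example, the two per-host
reservations add up to the amount quoted in step (1). A small Python sketch,
assuming the glsres row layout shown here (Name is the second
whitespace-separated field and Amount the third); the helper name is
hypothetical:

```python
def reserved_total(glsres_rows, job_name):
    """Sum the Amount column for rows whose Name column equals job_name."""
    total = 0
    for row in glsres_rows:
        fields = row.split()
        # fields[0] = Id, fields[1] = Name, fields[2] = Amount
        if len(fields) >= 3 and fields[1] == job_name:
            total += int(fields[2])
    return total


rows = [
    "67 2414 1200 2011-03-15 12:42:32 2011-03-15 13:12:32 83 weilin lsf_p1 amd64dcore 4",
    "68 2414 1200 2011-03-15 12:42:32 2011-03-15 13:12:32 84 weilin lsf_p1 pprh3.asia.corp.platform.com 4",
]
quote = 2400  # "Quote response" from step (1)

print(reserved_total(rows, "2414") == quote)  # prints True
```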


