[gold-users] Integration with LSF, HPC MPI job
Wei Lin
weilin at platform.com
Wed Mar 16 15:12:21 MDT 2011
Hi Scott,
An MPI job can run on multiple hosts. Would the customer like to run
"greserve" once per host, or just once for the job as a whole?
See the example below.
Thanks
Wei Lin
--------------------------------------------------------
EXAMPLE:
(1) Submit an MPI job:
[weilin@amd64dcore conf]$ bsub -P lsf_p1 -q normal -W 20 -n2 -m "amd64dcore! pprh3" -a lammpi -R"span[ptile=1]" -L /bin/tcsh mpirun.lsf /home/weilin/shell/cpi_mpi
Quote command: /opt/gold/bin/gquote -u weilin -p "lsf_p1" -m amd64dcore -P 2 -t 1200 --verbose --quiet
-----------------------------------------------------------
Quote response: 2400
-----------------------------------------------------------
Balance response: 99727
Balance available response: 99727
Job <2414> is submitted to queue <normal>.
(2) The job runs on 2 hosts, 1 slot per host:
[weilin@amd64dcore conf]$ bjobs 2414
JOBID USER   STAT QUEUE  FROM_HOST  EXEC_HOST                    JOB_NAME   SUBMIT_TIME
2414  weilin RUN  normal amd64dcore amd64dcore                   *l/cpi_mpi Mar 15 12:42
                                    pprh3.asia.corp.platform.com
(3) The eexec log shows two reservations, one per host:
[weilin@amd64dcore conf]$ vi ../log/eexec.log
LSB_MCPU_HOSTS: amd64dcore 1 pprh3.asia.corp.platform.com 1
Reserve command: /opt/gold/bin/greserve -J 2414 -p lsf_p1 -u weilin -m amd64dcore -P 1 -t 1200 --verbose --quiet
-----------------------------------------------------------
Reserve response: 83 671200 <========================= first reservation
-----------------------------------------------------------
Reserve command: /opt/gold/bin/greserve -J 2414 -p lsf_p1 -u weilin -m pprh3.asia.corp.platform.com -P 1 -t 1200 --verbose --quiet
-----------------------------------------------------------
Reserve response: 84 681200 <========================= second reservation
-----------------------------------------------------------
local_machine = amd64dcore
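To make the two options in the question concrete, here is a minimal Python sketch of how a wrapper could build the greserve calls from LSB_MCPU_HOSTS. It uses only the flags that appear in the log above. The function names are hypothetical, and charging the whole job to the first host is an assumption modeled on the gquote call at submission time (-m amd64dcore -P 2). It does not confirm how Gold treats a single reservation that spans hosts.

```python
import shlex

GRESERVE = "/opt/gold/bin/greserve"  # path as it appears in the eexec log


def parse_mcpu_hosts(value):
    """Turn 'hostA 1 hostB 1' into [('hostA', 1), ('hostB', 1)]."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating host / slot-count pairs")
    return [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]


def reserve_command(job_id, project, user, machine, procs, wall_secs):
    """Build one greserve argument list using only flags seen in the log."""
    return [GRESERVE, "-J", str(job_id), "-p", project, "-u", user,
            "-m", machine, "-P", str(procs), "-t", str(wall_secs),
            "--verbose", "--quiet"]


def per_host_commands(job_id, project, user, hosts, wall_secs):
    """Option A: one reservation per execution host (what the log shows)."""
    return [reserve_command(job_id, project, user, host, slots, wall_secs)
            for host, slots in hosts]


def whole_job_command(job_id, project, user, hosts, wall_secs):
    """Option B: one reservation for all slots.

    Assumption: the first host stands in as the machine, as in the gquote call.
    """
    total_slots = sum(slots for _, slots in hosts)
    return reserve_command(job_id, project, user, hosts[0][0],
                           total_slots, wall_secs)


hosts = parse_mcpu_hosts("amd64dcore 1 pprh3.asia.corp.platform.com 1")

# Option A prints two commands, option B prints one.
for cmd in per_host_commands(2414, "lsf_p1", "weilin", hosts, 1200):
    print(" ".join(shlex.quote(arg) for arg in cmd))
print(" ".join(shlex.quote(arg)
               for arg in whole_job_command(2414, "lsf_p1", "weilin", hosts, 1200)))
```

Option A reproduces the two commands in the log. Option B requests the same total of processors in one call, at the cost of recording only one machine name on the reservation.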
(4) Display the reservations in Gold; there are two entries:
[weilin@amd64dcore conf]$ glsres
Id Name Amount StartTime           EndTime             Job User   Project Machine                      Accounts Description
-- ---- ------ ------------------- ------------------- --- ------ ------- ---------------------------- -------- -----------
67 2414 1200   2011-03-15 12:42:32 2011-03-15 13:12:32 83  weilin lsf_p1  amd64dcore                   4
68 2414 1200   2011-03-15 12:42:32 2011-03-15 13:12:32 84  weilin lsf_p1  pprh3.asia.corp.platform.com 4
[weilin@amd64dcore conf]$
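As a quick cross-check, the two per-host reservations can be added up and compared with the quote of 2400 returned at submission. The sketch below is only an illustration: the rows are copied from the glsres output above, and the whitespace-based parsing and the function name are assumptions, not part of Gold.

```python
from collections import defaultdict

# Data rows copied from the glsres output above (header lines omitted).
GLSRES_ROWS = """\
67 2414 1200 2011-03-15 12:42:32 2011-03-15 13:12:32 83 weilin lsf_p1 amd64dcore 4
68 2414 1200 2011-03-15 12:42:32 2011-03-15 13:12:32 84 weilin lsf_p1 pprh3.asia.corp.platform.com 4
"""


def reserved_by_name(rows):
    """Sum the Amount column per reservation Name (the LSF job ID here)."""
    totals = defaultdict(int)
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, amount = fields[1], int(fields[2])
        totals[name] += amount
    return dict(totals)


totals = reserved_by_name(GLSRES_ROWS)
print(totals)  # {'2414': 2400}
```

The total for job 2414 equals the quoted 2400, so in this example reserving per host and reserving once for the whole job would hold the same amount.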