[torqueusers] Same job on several nodes
"Mgr. Šimon Tóth"
SimonT at mail.muni.cz
Thu Feb 4 11:24:22 MST 2010
>>> I am building a heterogeneous cluster so as to compare performance of
>>> the same program on various hardware architectures. For this purpose, I
>>> was advised to use torque.
>>>
>>> Thus, I would like to execute the very same job on all nodes of
>>> my cluster. So far, I've considered '-t 1-n' and '-l nodes=n' qsub
>>> options but none appears to fit my need.
>>>
>>> Indeed, on the one hand, '-l nodes=n' reserves n nodes but won't spread
>>> the sequential job, and, on the other hand, '-t 1-n' will spawn n jobs
>>> but won't necessarily attach them to n different nodes. So, what I want
>>> is some kind of mix of both options: n jobs running on n different nodes.
>>>
>>> Do you know of a way to do this? Of course, I could iterate over the
>>> nodes' hostnames and attach that many jobs to each node... But I would
>>> rather not resort to that if there is a more straightforward way.
>>
>> I suppose the best way to do this is to add corresponding properties to
>> nodes (describing the architecture) and simply generate the necessary
>> amount of jobs with -l nodes=1:property.
>
> I'm not sure I understand correctly. By "generate the necessary amount
> of jobs", do you mean making that many individual calls to qsub? And
> using "property" to attach each job to the desired node, I guess.
> Am I correct?
>
> As a first try, I did:
> for n in `cat nodes`; do (echo hostname | qsub -l nodes=$n); done;
> Is that more or less what you are talking about?
Well, I would do:
    # "architectures" holds the node property names, one per entry
    for i in $(cat architectures); do
        echo hostname | qsub -l nodes=1:$i
    done
If you have only one computer per architecture, then there really
isn't much point in using a batch system. What I meant was to tag the
nodes (set their properties attribute) with the features they provide
(OS, architecture, etc.) and then submit jobs that request those features.
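
For illustration, here is a minimal sketch of that workflow as a shell
function. It assumes the nodes have already been tagged (for example by
listing property names after the hostname in the server's nodes file) and
that a file lists one property name per line. The function name
submit_per_property and the DRY_RUN variable are hypothetical, introduced
only for this example; they are not part of TORQUE.

```shell
#!/bin/sh
# Sketch: submit one single-node job per node property.
# Assumption: $1 is a file with one property name per line
# (for example x86_64, ppc64), matching the tags set on the nodes.
# Assumption: DRY_RUN=1 prints each qsub command instead of running it.

submit_per_property() {
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r prop || [ -n "$prop" ]; do
        # Skip blank lines.
        [ -n "$prop" ] || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "qsub -l nodes=1:$prop"
        else
            # The job script ("hostname") is passed to qsub on stdin.
            echo hostname | qsub -l "nodes=1:$prop"
        fi
    done < "$1"
}
```

Calling it as "DRY_RUN=1; submit_per_property architectures" should print
one qsub command per property, which is a cheap way to check the list
before submitting anything for real.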
--
Mgr. Šimon Tóth