[torqueusers] Same job on several nodes

"Mgr. Šimon Tóth" SimonT at mail.muni.cz
Thu Feb 4 11:24:22 MST 2010


>>> I am building a heterogeneous cluster so as to compare performance of
>>> the same program on various hardware architectures. For this purpose, I
>>> was advised to use torque.
>>>
>>> Thus, I am looking to execute the very same job on all nodes of
>>> my cluster. So far, I've considered the '-t 1-n' and '-l nodes=n' qsub
>>> options, but neither appears to fit my need.
>>>
>>> Indeed, on the one hand, '-l nodes=n' reserves n nodes but won't spread
>>> the sequential job across them, and, on the other hand, '-t 1-n' will
>>> spawn n jobs but won't necessarily attach them to n different nodes. So,
>>> what I want is some kind of mix of both options: n jobs run on n
>>> different nodes.
>>>
>>> Do you know of a way to do this? Of course, I could iterate over the
>>> nodes' hostnames and attach a job to each node, but I would rather
>>> not resort to that if there is a more straightforward way.
>>
>> I suppose the best way to do this is to add corresponding properties to
>> nodes (describing the architecture) and simply generate the necessary
>> amount of jobs with -l nodes=1:property.
> 
> I'm not sure I understand. By "generate the necessary amount of
> jobs", do you mean making that many individual calls to qsub? And
> using "property" to attach each one to the desired node, I guess.
> Am I correct?
> 
> As a first try, I ran:
> for n in `cat nodes`; do (echo hostname | qsub -l nodes=$n); done;
> Is that more or less what you have in mind?

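Well, I would do:
# 'architectures' lists one node property per line
for arch in $(cat architectures); do
  # one single-node job per architecture property
  echo hostname | qsub -l "nodes=1:$arch"
done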

If you have just one computer for each architecture, then there really
isn't much point in using a batch system. What I meant was to tag the
nodes (set their properties attribute) with the features they provide
(OS, architecture, etc.) and then submit jobs requesting those features.
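
The tag-then-request idea can be sketched roughly as follows. This is only
an illustration: the node name (node01), the property names and the DRY_RUN
switch are made up for the example, and the qmgr call assumes you have
manager rights on the server. By default the functions only print the
commands they would run.

```shell
#!/bin/sh
# Sketch only. Node names, property names and the DRY_RUN switch are
# illustrative assumptions, not something taken from a real setup.

DRY_RUN=${DRY_RUN:-1}   # 1: print the commands; 0: actually execute them

# Append one property to a node (needs Torque manager privileges).
tag_node() {
    # $1: node name, $2: property to append
    if [ "$DRY_RUN" = 1 ]; then
        echo "qmgr -c 'set node $1 properties += $2'"
    else
        qmgr -c "set node $1 properties += $2"
    fi
}

# Submit one single-node job for each property listed in file $1
# (one property per line; blank lines are skipped).
submit_per_property() {
    while read -r prop || [ -n "$prop" ]; do
        [ -n "$prop" ] || continue
        if [ "$DRY_RUN" = 1 ]; then
            echo "echo hostname | qsub -l nodes=1:$prop"
        else
            echo hostname | qsub -l "nodes=1:$prop"
        fi
    done < "$1"
}
```

For instance, "tag_node node01 x86_64" followed by "submit_per_property
architectures" prints the qmgr and qsub commands; setting DRY_RUN=0 runs
them instead. The scheduler is then free to pick any node carrying the
requested property, which is the point of tagging features rather than
naming hosts.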

-- 
Mgr. Šimon Tóth

