[torqueusers] PBS job issue

Steve Young chemadm at hamilton.edu
Mon Jan 19 03:11:10 MST 2009


Hi,
	I guess if it were me I'd get whatever version of MPI my
software recommends and then follow the build instructions for that
version. If you don't have root access to install it globally, you
might have to build it and run it from your home directory. That's
off the top of my head, though; there might be other problems with
using it without root access. Hope this helps,

-Steve
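
Before rebuilding anything, it may be worth checking whether the
existing install already has TM support. The sketch below assumes the
MPI in question is Open MPI (the thread does not say which one it is),
whose ompi_info tool lists a "tm" component when it was configured
with --with-tm. The function names are made up for illustration, and
the line pattern is an assumption about how ompi_info formats its
component list.

```python
import re
import subprocess

# Matches component lines such as "MCA ras: tm (...)" in ompi_info output.
# The exact layout is an assumption; adjust the pattern if your version differs.
_TM_LINE = re.compile(r"^\s*MCA\s+(\w+):\s+tm\b")


def tm_components(ompi_info_text):
    """Return the MCA frameworks (e.g. 'ras', 'plm') that list a 'tm' component."""
    found = []
    for line in ompi_info_text.splitlines():
        match = _TM_LINE.match(line)
        if match:
            found.append(match.group(1))
    return found


def installed_tm_components(ompi_info="ompi_info"):
    """Run ompi_info (assumed to be on PATH) and report its tm components."""
    result = subprocess.run(
        [ompi_info], capture_output=True, text=True, check=True
    )
    return tm_components(result.stdout)


if __name__ == "__main__":
    try:
        comps = installed_tm_components()
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run ompi_info:", exc)
    else:
        print("TM components:", comps if comps else "none found")
```

An empty result would suggest the build was not configured against
Torque. This check does not apply to MPICH, which is configured
differently.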

On Jan 16, 2009, at 3:33 PM, Abhishek Gupta wrote:

> Steve,
> Do you have any links that explain the exact configuration options
> for compiling it the right way? The person who did it is not here,
> so I cannot contact him. I might have to try setting up the
> desired configuration on some other computer and test it. I tried
> searching the internet but couldn't find a proper link with all
> the details I require to set it up properly.
> Thanks,
> Abhishek.
>
> Steve Young wrote:
>> Hi,
>>    No, I mean building/compiling MPI so that you get those
>> executables: mpiexec, mpicc, etc. Are you using openmpi? mpich?
>> Making it TM aware is a configure option when compiling the version
>> of MPI you have. When you compile MPI and make it TM aware, it
>> builds mpiexec so that it knows how to get the node information
>> from torque and also allows torque to use the pbs_moms to start
>> and stop the mpd process. Otherwise, you'd need to worry about
>> having to start mpd on each host you plan on running on. Someone
>> must have compiled MPI on your system in order for you to have gotten
>> mpiexec, mpicc, and so on. If you didn't build it then you'll need
>> to find the person who did and ask them how it was compiled. Hope
>> this helps =).
>>
>> -Steve
>>
>>
>>
>> On Jan 16, 2009, at 1:38 PM, Abhishek Gupta wrote:
>>
>>> Steve,
>>> Could you tell me how to compile MPI to make it TM aware? If by
>>> compiling you mean using mpicc to compile C/C++ programs
>>> and mpif77 for Fortran, and then using the mpirun command in the PBS
>>> script to submit it as a job, then I did this, but my first job is
>>> still stuck there while the following ones run with no problem.
>>> Am I missing something here?
>>> Thanks,
>>> Abhishek.
>>>
>>> Steve Young wrote:
>>>> Hi,
>>>>   Is MPI compiled to be TM aware? If it is, it would be
>>>> able to use the pbs_moms to start and stop the mpd daemons.
>>>> When you say you checked the nodes which were assigned, what do
>>>> you actually mean? Assigned by PBS or assigned by MPI? If MPI isn't
>>>> compiled to be TM aware then torque will assign nodes to the job
>>>> but MPI won't use them and will assign its own list of nodes to
>>>> the job. So like I mentioned before, even though torque tells you
>>>> that it assigned the job to certain nodes, it might in fact be
>>>> running on different nodes that MPI assigned. What you need to do
>>>> now is make sure your version of MPI is compiled to be TM aware.
>>>> Search the archives of this list and you'll find it's a common
>>>> problem people encounter.
>>>>
>>>> -Steve
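
One way to see which nodes Torque actually assigned, as opposed to
where MPI ends up running, is to read the file named by the
PBS_NODEFILE environment variable from inside the job. Torque lists
one hostname per line there, repeated once per processor slot. The
following is only a rough sketch: the helper names are hypothetical,
and comparing short hostnames is an assumption that may not suit
every cluster's naming.

```python
import os
import socket
from collections import Counter


def read_nodefile(path):
    """Return a Counter mapping hostname -> slot count from a Torque node file."""
    with open(path) as fh:
        hosts = [line.strip() for line in fh if line.strip()]
    return Counter(hosts)


def short_name(host):
    """Strip any domain part so 'n1.example.org' and 'n1' compare equal."""
    return host.split(".", 1)[0]


def host_is_assigned(host, slots):
    """True if host (by short name) appears in the node list Torque assigned."""
    return short_name(host) in {short_name(h) for h in slots}


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if not path:
        print("PBS_NODEFILE is not set; probably not inside a Torque job")
    else:
        slots = read_nodefile(path)
        for host, count in sorted(slots.items()):
            print(host, count)
        me = socket.gethostname()
        status = "assigned" if host_is_assigned(me, slots) else "NOT assigned"
        print(me, "is", status, "by Torque")
```

Running this under mpiexec or mpirun in the job script would print one
report per rank. Any rank that says "NOT assigned" is on a host that
MPI picked on its own rather than one Torque handed out.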
>>>>
>>>>
>>>> On Jan 16, 2009, at 11:37 AM, Abhishek Gupta wrote:
>>>>
>>>>> Hi Steve,
>>>>> You are right, it is an MPI type of job. I checked the nodes which
>>>>> were assigned to the job and there was no job running. Even a
>>>>> job that should run in a few seconds was totally stuck. Could
>>>>> you please tell me what I should do to solve this problem?
>>>>> Thanks,
>>>>> Abhishek.
>>>>>
>>>>> Steve Young wrote:
>>>>>> Hi,
>>>>>>  I'm wondering if this is an MPI type of job? Did you make sure
>>>>>> to compile MPI to be TM aware? How do you know the job is not
>>>>>> actually running somewhere? I've found that if you don't make
>>>>>> MPI aware of torque then the jobs end up on nodes MPI assigns
>>>>>> and don't run on the nodes torque assigns. I ended up using
>>>>>> OSC's version of mpiexec, but using a version of MPI that can be
>>>>>> compiled to be TM aware would do the same thing. This is just a
>>>>>> guess without knowing what kind of job you're running, what
>>>>>> version of torque you have, how you have things configured and
>>>>>> such. Hope this helps,
>>>>>>
>>>>>> -Steve
>>>>>>
>>>>>>
>>>>>> On Jan 16, 2009, at 11:13 AM, Abhishek Gupta wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> I am facing a problem with job submission in which my first
>>>>>>> job gets stuck forever (showing R state), and if I run the
>>>>>>> same job while keeping the first job, the second job runs
>>>>>>> without any problem. I found that this problem arises only
>>>>>>> when I ask for more than 1 node. Even if I say nodes=1:ppn=2,
>>>>>>> it runs without any problem, but nodes=2 does not work the
>>>>>>> first time. There is one more thing I found: if some other
>>>>>>> job that requires more than one node (started by some other
>>>>>>> user) is stuck, my job requiring more than one node runs
>>>>>>> smoothly while the job of that other user stays in that state
>>>>>>> forever.
>>>>>>> Could someone tell me what the issue could be? Is there any
>>>>>>> parameter that needs to be set?
>>>>>>> Thanks,
>>>>>>> Abhishek.
>>>>>>> _______________________________________________
>>>>>>> torqueusers mailing list
>>>>>>> torqueusers at supercluster.org
>>>>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>>>
>>>>
>>


