[torqueusers] #PBS -V in version 2.5.10

Sreedhar Manchu sm4082 at nyu.edu
Thu Mar 22 18:08:07 MDT 2012


Thanks, Joseph. But we are using Rocks 5.1, and in fact we don't have any problem with the -V flag. It is just that sometimes Torque does not define the PBS variables, and it happens at random. PBS_NODEFILE, PBS_JOBID, PBS_JOBNAME and a few more variables are missing. I am looking into the code to try to understand what is happening.
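
For anyone who wants to catch this from inside a job, here is a minimal sketch of a guard that reports which of the three variables named above are unset or empty. The function name check_pbs_env is made up for illustration and is not part of Torque; the variable list is only the three mentioned in this thread and may need extending.

```shell
# check_pbs_env: hypothetical helper, not part of Torque.
# Returns 0 if PBS_JOBID, PBS_JOBNAME and PBS_NODEFILE are all set and
# non-empty; otherwise prints the missing names to stderr and returns 1.
check_pbs_env() {
    missing=""
    for var in PBS_JOBID PBS_JOBNAME PBS_NODEFILE; do
        # Indirect lookup of the variable named by $var (plain POSIX sh).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing PBS variables:$missing" >&2
        return 1
    fi
    return 0
}
```

A job script could call "check_pbs_env || exit 1" near the top. This assumes the job's error file is given as an absolute path, since otherwise the message may have nowhere to go when the variables are missing.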

Thanks
Sreedhar. 

--
Sent from my phone. Please excuse my brevity and any typos.

On Mar 22, 2012, at 18:06, Joseph Farran <jfarran at uci.edu> wrote:

> Sreedhar, are you using Rocks 5.4.3 by any chance?
> 
> The "#PBS -V" was *NOT* an issue with Torque after all but rather a Rocks BUG.
> 
> We are using Rocks 5.4.3 and after applying this fix:
> 
> http://groups.google.com/group/rocks-clusters/browse_thread/thread/d56541d7755438c7/8813bf8ed30a66d1?fwc=2&pli=1
> 
> The "#PBS -V" works just fine and as expected.
> 
> Hope this helps,
> Joseph
> 
> 
> On 03/19/2012 04:37 PM, Sreedhar Manchu wrote:
>> We are also having this problem. The serious problem with this version is that some PBS variables (PBS_JOBNAME, PBS_JOBID) are not being defined. This is why you don't see the err and out files (I am assuming the user has these variables in the PBS -e and -o directives). If you have compiled Torque with --enable-syslog, you can see in the logs on the compute nodes that it can't create them because the variables are undefined.
>> 
>> I asked users to specify absolute paths. For parallel jobs and array jobs I am sourcing a script file through a wrapper. This script file defines PBS_NODEFILE, which is needed for parallel jobs, and the array ID for array jobs.
>> 
>> Strangely, if I restart pbs_mom it works OK for the user whose jobs had failed before. But after a while it happens all over again for a different user. I checked 2.5.11 and there are not that many differences between it and 2.5.10, so I am not sure upgrading to 2.5.11 would solve this problem.
>> 
>> Sreedhar.
>> 
>> --
>> Sent from my phone. Please excuse my brevity and any typos.
>> 
>> On Mar 19, 2012, at 18:42, Joseph Farran <jfarran at uci.edu> wrote:
>> 
>>> Hi Ken.
>>> 
>>> Yes. One of our users has job arrays, and that is the person experiencing this problem. I deleted all jobs prior to upgrading.
>>> 
>>> Is there something I forgot to clean out that needs cleaning?
>>> 
>>> Joseph
>>> 
>>> 
>>> On 03/19/2012 03:32 PM, Ken Nielson wrote:
>>>> On Mon, Mar 19, 2012 at 4:21 PM, Joseph Farran <jfarran at uci.edu> wrote:
>>>> 
>>>>    Hello.
>>>> 
>>>>    We were using Torque 2.5.9 and we were able to use the Torque PBS directive "#PBS -V" just fine.
>>>> 
>>>>    On upgrading to Torque 2.5.10, the same scripts which used to work using "#PBS -V" no longer work.
>>>> 
>>>>    When we submit a job using "#PBS -V", the job starts and nothing happens - no output, no errors, nothing.  The job starts but nothing happens.
>>>> 
>>>>    Looking at the Torque logs in /opt/torque/server_logs shows no errors - just the job starting and ending.
>>>> 
>>>>    If we remove "#PBS -V" then the job runs just fine.
>>>> 
>>>>    Has anyone else run into this, or does anyone know what is going on?
>>>> 
>>>>    Thanks,
>>>>    Joseph
>>>> 
>>>> Did you have any array jobs in your queue when you upgraded?
>>>> 
>>>> Ken
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>> 

