[torqueusers] #PBS -V in version 2.5.10
Sreedhar Manchu
sm4082 at nyu.edu
Mon Mar 19 17:37:14 MDT 2012
We are also having this problem. A serious problem with this version is that some PBS variables (PBS_JOBNAME, PBS_JOBID) are not being defined. This is why you don't see the err and out files (I am assuming the user has these variables in the #PBS -e and -o directives). If you compiled Torque with --enable-syslog, you can see in the logs on the compute nodes that it can't create the files because the variables are undefined.
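One way to confirm from inside a job which variables are missing is a small check at the top of the job script. This is only a sketch: the function name check_pbs_vars is made up for illustration, and it only reports what the job's environment contains; it says nothing about why pbs_mom did not set the variables.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): report which of the named
# environment variables are unset or empty.
# Prints one "MISSING <name>" line per undefined variable and returns
# non-zero if any were missing.
check_pbs_vars() {
    missing=0
    for name in "$@"; do
        # Indirect variable lookup that works in plain POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example call at the top of a job script (writes to the job's stderr):
# check_pbs_vars PBS_JOBID PBS_JOBNAME PBS_NODEFILE >&2
```

Because the err and out files may not be created at all when this problem occurs, it may be safer to redirect the report to a file given by an absolute path.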
I asked users to specify absolute paths. For parallel jobs and array jobs, I source a script file through a wrapper. This script defines PBS_NODEFILE, which is needed for parallel jobs, and the array ID for array jobs.
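A rough sketch of what such a sourced script could look like follows. The file name pbs_defaults.sh, the helper set_default, and the fallback values are all placeholders for illustration, and PBS_ARRAYID is assumed here to be the array index variable; how the real values are obtained depends on the site.

```shell
# pbs_defaults.sh (hypothetical name): meant to be sourced by a wrapper
# before the user's job script runs.

# Set and export a variable to a fallback value, but only if it is
# currently unset or empty. Values already provided by pbs_mom are kept.
set_default() {
    eval "current=\${$1:-}"
    if [ -z "$current" ]; then
        eval "$1=\$2"
        export "$1"
    fi
}

# The fallback values below are placeholders to be supplied by the site:
# set_default PBS_NODEFILE "/path/to/nodefile"
# set_default PBS_ARRAYID "0"
```

Filling in a variable only when it is empty means the script does nothing on jobs where the variables are set correctly.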
Strangely, if I restart pbs_mom, it works OK for the user whose jobs had failed before. But after a while it happens all over again for a different user. I checked 2.5.11, and there are not many differences between it and 2.5.10, so I am not sure upgrading to 2.5.11 would solve this problem.
Sreedhar.
--
Sent from my phone. Please excuse my brevity and any typos.
On Mar 19, 2012, at 18:42, Joseph Farran <jfarran at uci.edu> wrote:
> Hi Ken.
>
> Yes. One of our users has job arrays, and that is the person experiencing this problem. I deleted all jobs prior to upgrading.
>
> Is there something I forgot to clean out that needs cleaning?
>
> Joseph
>
>
> On 03/19/2012 03:32 PM, Ken Nielson wrote:
>> On Mon, Mar 19, 2012 at 4:21 PM, Joseph Farran <jfarran at uci.edu> wrote:
>>
>> Hello.
>>
>> We were using Torque 2.5.9 and we were able to use the Torque PBS directive "#PBS -V" just fine.
>>
>> On upgrading to Torque 2.5.10, the same scripts which used to work using "#PBS -V" no longer work.
>>
>> When we submit a job using "#PBS -V", the job starts and nothing happens - no output, no errors, nothing. The job starts but nothing happens.
>>
>> Looking at Torque logs /opt/torque/server_logs shows no errors - just the job starting and ending.
>>
>> If we remove "#PBS -V" then the job runs just fine.
>>
>> Has anyone else run into this, or does anyone know what is going on?
>>
>> Thanks,
>> Joseph
>>
>> Did you have any array jobs in your queue when you upgraded?
>>
>> Ken
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers