[torqueusers] pbs_nodefile: undefined variable
sm4082 at nyu.edu
Wed Nov 23 10:12:18 MST 2011
Yesterday I manually updated the nodes file in the server_priv directory. Our login nodes are listed at the end of this file, and I wanted to add new nodes. I first used the qmgr create command to add them, but pbstop then showed the new nodes after the login nodes. So I used the delete command to take them off the list, and then added the new nodes with the create command, followed by the login nodes. They were still added in the same order as before. So I manually edited the file to put the login nodes at the end, i.e., right after the new nodes. Could this have broken something, or is the problem caused by something else?
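To make the sequence concrete, here is a rough sketch of the kind of qmgr calls involved. The node names (login-0, compute-new-0) are placeholders, not our real hostnames, and run_qmgr, readd_nodes and DRY_RUN are hypothetical helpers added only so the commands can be printed without touching the server.

```shell
#!/bin/sh
# Sketch of a delete/create sequence with qmgr.
# Node names are placeholders; run_qmgr, readd_nodes and DRY_RUN are
# hypothetical helpers, not part of TORQUE.
DRY_RUN=${DRY_RUN:-1}

run_qmgr() {
    if [ "$DRY_RUN" = "1" ]; then
        # Only print the command that would be run.
        echo "qmgr -c \"$1\""
    else
        qmgr -c "$1"
    fi
}

readd_nodes() {
    # Remove a login node so it can be re-created after the new node.
    run_qmgr "delete node login-0"
    # Add the new compute node, then the login node again.
    run_qmgr "create node compute-new-0"
    run_qmgr "create node login-0"
}
```

With DRY_RUN left at 1, calling readd_nodes only prints the three qmgr commands in order.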
Since then, some users have been getting the error "PBS_NODEFILE: undefined variable". I suspect something broke because of what I did. Does anyone have any idea how to fix this? Can I reinstall the same version on just the master node and login nodes, without having to install it on all the compute nodes? All MPI jobs seem to be affected.
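One way to narrow this down might be a small check at the top of an affected job script. This is only a sketch, written for sh rather than csh; check_nodefile is a hypothetical helper name. It assumes nothing beyond the variable named in the error message, and it reports whether PBS_NODEFILE is set and readable on the host where the script actually runs.

```shell
#!/bin/sh
# Diagnostic sketch: report whether PBS_NODEFILE is available to this job.
# check_nodefile is a hypothetical helper, not part of TORQUE.
check_nodefile() {
    if [ -z "${PBS_NODEFILE:-}" ]; then
        echo "PBS_NODEFILE is not set on $(uname -n)"
        return 1
    fi
    if [ ! -r "$PBS_NODEFILE" ]; then
        echo "PBS_NODEFILE=$PBS_NODEFILE is not readable on $(uname -n)"
        return 2
    fi
    # One line per allocated slot; strip padding that some wc versions add.
    slots=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
    echo "PBS_NODEFILE=$PBS_NODEFILE lists $slots slots"
    return 0
}
```

Calling check_nodefile before the mpirun line would show whether the variable is missing only on certain hosts or only for certain users.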
The strange thing is that it works fine for most users. Yesterday I disabled the qsub wrapper and it was OK. Now I realize it has nothing to do with the qsub wrapper, since the same user got the same error again this morning.
I would really appreciate any help.
HPC Support Specialist
New York University
251 Mercer Street
New York, NY 10012-1110