[torqueusers] torque-2.5.3-1cri.x86_64 hang when a node falls

David Beer dbeer at adaptivecomputing.com
Fri Mar 4 11:15:30 MST 2011



----- Original Message -----
> Hi David,
> 
> First of all, sorry for sending the mail to you directly. I was in
> "panic" (Friday afternoon, pbs_server update :-) ).
> 
> Second, I solved the issue, and it was easy: the client part of the
> server host was pointing to our backup server, so it was referring to
> another machine's node file :-) . After changing the pbs_server
> content, I was able to recover my nodes.
> 

I'm glad you were able to solve this issue. As for sending it to me directly: whether you send it to me and the mailing list or to the mailing list only, I see only one message, and it arrives at the same time either way. So I didn't actually notice that you had sent it to me directly as well as to the mailing list.

> The only thing I had problems with was the jobs. I've lost all of
> them because the first pbs_server start of 2.5.5 destroyed my jobs
> dir.
> 
> I tried to copy all the job files back into the new jobs dir, but then
> the server complained about them. I'm not sure if this is a bug or
> just something I did wrong.
> 
> Anyway, many thanks for your reply and your help.

Did your jobs directory get destroyed, or did you relocate it? (I'm just wondering how you still had the job files if the directory was destroyed.) What error does pbs_server give you when you copy the files back? I would expect this to work: you mentioned going from 2.5.3 to 2.5.5, and there is no difference in the job files between these versions.

-- 
David Beer 
Direct Line: 801-717-3386 | Fax: 801-717-3738
     Adaptive Computing
     1656 S. East Bay Blvd. Suite #300
     Provo, UT 84606


