[torqueusers] unknown reason: the pbs_server daemon was killed and cannot start

Garrick Staples garrick at usc.edu
Thu Nov 13 00:39:30 MST 2008


On Thu, Nov 13, 2008 at 12:50:09PM +0800, Weiguang Chen alleged:
> Hi,
> Our torque version is 2.3.13.
> Today the "qstat" command stopped working, and I found:
> [root@node1 init.d]# qstat
> Cannot connect to default server host 'node1' - check pbs_server daemon.
> qstat: cannot connect to server node1 (errno=111)
> 
> I then checked for the pbs_server daemon and found:
> [root@node1 init.d]# ps -ef|grep pbs
> root      3079     1  0 Sep16 ?        00:01:07 /usr/local/sbin/pbs_sched
> root     16571  5229  0 12:38 pts/21   00:00:00 grep pbs
> 
> The pbs_server daemon was killed for an unknown reason,
> and when I tried to start it again, it failed:
> [root@node1 init.d]# /usr/local/sbin/pbs_server
> pbs_server: svr_func.c:222: set_resc_assigned: Assertion
> `pjob->ji_qhdr->qu_qs.qu_type == 1' failed.
> Aborted
> What is the problem?
> What can I do?

You might also just back up your entire $PBS_SERVER_HOME/server_priv directory
and rebuild torque with the assert() call at src/server/svr_func.c:222 (the one
named in the error above) commented out.
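
A rough sketch of the backup step, in case it helps. The function name
backup_server_priv, the fallback path /var/spool/torque, and the /root
destination are my assumptions, not anything torque provides; substitute
whatever your install actually uses for $PBS_SERVER_HOME.

```shell
# Sketch only: archive server_priv before touching the pbs_server binary.
# Assumptions: the /var/spool/torque fallback and the /root destination
# are illustrative defaults, not values taken from this thread.
backup_server_priv() {
    home="${1:-${PBS_SERVER_HOME:-/var/spool/torque}}"
    dest="${2:-/root}"
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/server_priv-$stamp.tar.gz"

    # Refuse to continue if the directory is not where we expect it.
    if [ ! -d "$home/server_priv" ]; then
        echo "no server_priv directory under $home" >&2
        return 1
    fi

    # -C keeps the paths inside the archive relative to $home.
    tar -czf "$archive" -C "$home" server_priv || return 1

    # Print the archive path so the caller can record or verify it.
    echo "$archive"
}
```

Run it as root with pbs_server stopped (it already is in this case), e.g.
`backup_server_priv` or `backup_server_priv /path/to/torque/home /some/backup/dir`,
and check the listing with `tar -tzf` before rebuilding.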

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Revoke LDS Church 501(c)(3) Status - http://lds501c3.wordpress.com/
