[torqueusers] unknown reason: the pbs_server daemon was killed and cannot start

Garrick Staples garrick at usc.edu
Wed Nov 12 22:26:56 MST 2008


On Thu, Nov 13, 2008 at 12:50:09PM +0800, Weiguang Chen alleged:
> Hi,
> Our torque version is 2.3.13.
> Today the "qstat" command stopped working, and I found:
> [root@node1 init.d]# qstat
> Cannot connect to default server host 'node1' - check pbs_server daemon.
> qstat: cannot connect to server node1 (errno=111)
> 
> I then checked for the pbs_server daemon and found:
> -- [root@node1 init.d]# ps -ef|grep pbs
> root      3079     1  0 Sep16 ?        00:01:07 /usr/local/sbin/pbs_sched
> root     16571  5229  0 12:38 pts/21   00:00:00 grep pbs
> 
> The pbs_server daemon was killed for an unknown reason,
> and when I tried to start it again, it failed:
> [root@node1 init.d]# /usr/local/sbin/pbs_server
> pbs_server: svr_func.c:222: set_resc_assigned: Assertion
> `pjob->ji_qhdr->qu_qs.qu_type == 1' failed.
> Aborted
> What is the problem?
> What can I do?

I've never seen this happen before, but the code makes it pretty clear what
is happening: a routing queue has a resources_assigned property, which is
illegal.

Unless someone has a better idea, you will need to delete the queue definition
for the corrupt queue.  cd into $PBS_SERVER_HOME/server_priv/queues/.  These
are binary files, but you can grep them for the strings "Route" and
"resources_assigned".

'grep -l resources_assigned * | xargs grep -l Route' shouldn't print anything.
If it does, move that file to another directory.  Once pbs_server starts, you
can run 'strings' on that bad queue file and recreate it.  Save that file, gzip
it, and send it to me or the list.  I'd like to see it.
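
The check-and-move step above can be sketched as a small shell helper.  This is
only a sketch: the function names and the quarantine directory are made up for
illustration, and it assumes pbs_server is not running while the files are
moved.

```shell
#!/bin/sh
# Hypothetical helper: print every file in directory $1 that contains both
# the string "resources_assigned" and the string "Route".
find_suspect_queues() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # grep -q only reports whether a match exists; it works on binary files too.
        if grep -q resources_assigned "$f" && grep -q Route "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Hypothetical helper: move the suspect files from $1 into directory $2,
# so they stay available for 'strings' and for sending to the list.
quarantine_suspect_queues() {
    dir=$1
    dest=$2
    mkdir -p "$dest"
    find_suspect_queues "$dir" | while IFS= read -r f; do
        mv -- "$f" "$dest"/
    done
}
```

For example, 'quarantine_suspect_queues "$PBS_SERVER_HOME/server_priv/queues"
/root/bad_queues' (the destination path is an arbitrary choice) should leave
only the clean queue files in place.  Moving rather than deleting keeps the bad
file around for inspection.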

Or if you have a backup copy of your server config, then create a new serverdb
and recreate your config.  Don't forget to set your next job number after
recreating the serverdb.
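
A rough sketch of that rebuild, assuming the backup is a file saved earlier
with "qmgr -c 'print server'".  The helper names, the DRY_RUN switch, and the
job number are illustrative only; check the pbs_server and qmgr man pages for
your version before running any of it, since 'pbs_server -t create' discards
the existing serverdb.

```shell
#!/bin/sh
# Sketch only.  With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: rebuild serverdb from a saved qmgr dump.
#   $1: file saved earlier with "qmgr -c 'print server'" (assumed to exist)
#   $2: the job number the server should hand out next
rebuild_serverdb() {
    backup=$1
    next_job=$2
    # Start the server with a new, empty serverdb.
    run pbs_server -t create
    # Replay the saved queue and server settings.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'qmgr < %s\n' "$backup"
    else
        qmgr < "$backup"
    fi
    # Set the next job number so new job IDs do not collide with old ones.
    run qmgr -c "set server next_job_number = $next_job"
}
```

Running it once with DRY_RUN=1 shows the commands without touching the server.
Pick a next job number higher than any job ID already used.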


-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Revoke LDS Church 501(c)(3) Status - http://lds501c3.wordpress.com/
