[torqueusers] can't execute multi-processors run

lorenzo118 at interfree.it lorenzo118 at interfree.it
Sat Aug 27 15:00:30 MDT 2005


Hi,
I installed Torque 1.2.p04 on my Linux cluster of 16 Pentium 4 processors (running Fedora Core 3), on the master and on some of the nodes. It compiled with no problems, I started all the daemons, and all nodes are marked as "free". The problem is that when I launch a one-processor job, I get only the error file (.e), which contains:


-bash: line 1: /usr/spool/PBS/mom_priv/jobs/34.medusa.d.SC: No such file or directory:


What does it mean?
Moreover, if I try to launch a two-processor job (with qsub -l nodes=2 ./hello), it exits after a few seconds without writing anything (not even empty .o and .e files). While it is in the queue, qstat -f gives this result:



Job Id: 36.medusa.dicea.unifi.it
    Job_Name = hello
    Job_Owner = lcampo at medusa000.dicea.unifi.it
    job_state = Q
    queue = batch
    server = medusa.dicea.unifi.it
    Checkpoint = u
    ctime = Sat Aug 27 22:53:37 2005
    Error_Path = medusa.dicea.unifi.it:/home/lcampo/hello.e36
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Sat Aug 27 22:53:37 2005
    Output_Path = medusa.dicea.unifi.it:/home/lcampo/hello.o36
    Priority = 0
    qtime = Sat Aug 27 22:53:37 2005
    Rerunable = True
    Resource_List.nodect = 2
    Resource_List.nodes = 2
    Resource_List.walltime = 01:00:00
    Variable_List = PBS_O_HOME=/home/lcampo,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=lcampo,
        PBS_O_PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/b
        in:/home/intel_fc_80/lib:/home/intel_fc_80/bin:/opt/kernel_picker/bin:/
        opt/env-switcher/bin:/opt/mpich-1.2.5.10-ch_p4-gcc/bin:/opt/pvm3/lib:/o
        pt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/op
        t/pbs/bin:/opt/pbs/lib/xpbs/bin:/home/lcampo/bin,
        PBS_O_MAIL=/var/spool/mail/lcampo,PBS_O_SHELL=/bin/bash,
        PBS_O_HOST=medusa.dicea.unifi.it,PBS_O_WORKDIR=/home/lcampo,
        PBS_O_QUEUE=batch
    etime = Sat Aug 27 22:53:37 2005
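
The fields that matter in a listing like the one above (job_state, and whether an exec_host has been assigned) can be pulled out with a short script. This is only a sketch under stated assumptions: the function name parse_qstat_full and the trimmed SAMPLE text are illustrative and not part of Torque, and it assumes that wrapped values are continued on lines indented by a tab or eight spaces, as in the output above.

```python
def parse_qstat_full(text):
    """Parse the text of one `qstat -f` job record into a dict of strings."""
    attrs = {}
    key = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw.startswith("Job Id:"):
            attrs["Job Id"] = raw.split(":", 1)[1].strip()
            key = None
        elif raw.startswith(("\t", "        ")):
            # Continuation of a wrapped value (assumed indentation, see lead-in).
            if key is not None:
                attrs[key] += raw.strip()
        elif " = " in raw:
            key, value = raw.strip().split(" = ", 1)
            attrs[key] = value.strip()
    return attrs


# Trimmed copy of the listing above, used only as illustrative input.
SAMPLE = """\
Job Id: 36.medusa.dicea.unifi.it
    Job_Name = hello
    job_state = Q
    queue = batch
    Resource_List.nodect = 2
    Resource_List.nodes = 2
    Variable_List = PBS_O_HOME=/home/lcampo,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=lcampo,
        PBS_O_QUEUE=batch
"""

job = parse_qstat_full(SAMPLE)
print("state:", job["job_state"])
print("exec_host assigned:", "exec_host" in job)
```

In the listing above the job is still in state Q and has no exec_host attribute, so at that moment no node had been assigned to it.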



Everything appears to be in order, but the job simply doesn't do anything. pbs_mom is running normally on the master and on every node, and I'm using pbs_sched as the scheduler (not Maui).
Any ideas?
Thank you
Lorenzo Campo 



