[torqueusers] "cp command failed" - case closed

Zhiliang Hu zhu at iastate.edu
Thu Dec 13 16:47:18 MST 2007


Thanks to all who have helped.

I finally found the cause of the error, and it was totally unrelated: some rare input data was the culprit, while mpiblast was printing misleading errors.

The case should be closed -- maybe this thread should be deleted from the list archive?

Zhiliang


At 01:27 PM 12/13/2007, Garrick Staples wrote:
>On Thu, Dec 13, 2007 at 05:49:00AM -0600, Zhiliang Hu alleged:
>> At 10:56 PM 12/12/2007, you wrote:
>> 
>> >On Wed, Dec 12, 2007 at 10:57:59PM -0600, Zhiliang Hu alleged:
>> >> Let me re-phrase this problem --
>> >> 
>> >> 1- I can "qsub" to run a "hello" program,
>> >> 2- I can run "mpiblast" with a script,
>> >> 3- but when I combine the two, I encounter a weird problem:
>> >> 
>> >> > qsub -l nodes=6:ppn=2 mpiblast.sh
>> >> -- Where "mpiblast.sh" contains:
>> >> ----------------
>> >> #!/bin/bash
>> >> /opt/openmpi.gcc/bin/mpirun /usr/local/bin/mpiblast -p blastn -i /home/zhu/tests/mpiblast/datain4 -d bta.genome.chr
>> >> ----------------
>> >> 
>> >> Now it complains (in the torque output file xxxx.e96):
>> >> 
>> >> ----------------------
>> >> cp command failed!
>> >> command: cp /raid/pub/ncbi/blast/mpidb/bta.genome.chr.007.nhr /scratch/tmp/bta.genome.chr.007.nhr
>> >> source = /raid/pub/ncbi/blast/mpidb/bta.genome.chr.007.nhr
>> >> dest = /scratch/tmp/bta.genome.chr.007.nhr
>> >> ret_value = 32512
>> >> ----------------------
>> >> 
>> >> Any idea what this could be?
>> >
>> >It doesn't look like a Torque error.  Is that coming from mpiblast, or your script?
>> >Do both of those directories exist on the compute node?
>> 
>> and 
>> 
>> At 11:06 PM 12/12/2007, Chris Samuel <csamuel at vpac.org> wrote:
>> 
>> >Looks like an application error message rather than a Torque error.
>> >
>> >It appears that blast is trying (and failing) to stage some files from 
>> >your RAID system to local scratch - but it's not saying why..
>> 
>> 
>> Yes indeed, that's how it appears.  Thanks for the hints on mpiblast -- but 
>> the mpiblast.sh script works fine on its own.  I manually checked 
>> folders, permissions, ssh, etc. on all the suspect directories, and 
>> everything appears fine, as before.  As a matter of fact, more than 
>> 10 similar files in the same location were copied over with no problem,
>> so it appears "weird".
>
>It's too bad it doesn't print the actual error message, because a return value
>of 32512 is meaningless.  You may want to bring this up with an mpiblast list.
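
A note on the value itself: if mpiblast got 32512 back from a system() or wait() call (an assumption; the output above does not say where the number comes from), then it is a raw wait status rather than an exit code. With the usual Linux layout the exit code sits in the second byte, so 32512 decodes to 127, which a POSIX shell returns when it cannot find the command. A minimal sketch of the decoding, using a hypothetical helper name:

```shell
#!/bin/bash
# decode_wait_status is a hypothetical helper, not part of Torque or mpiblast.
# Assumes the usual Linux wait-status layout:
#   low 7 bits = terminating signal, next 8 bits = exit code.
decode_wait_status() {
    local status=$1
    local sig=$(( status & 0x7f ))
    local code=$(( (status >> 8) & 0xff ))
    if [ "$sig" -ne 0 ]; then
        echo "killed by signal $sig"
    else
        echo "exit code $code"
    fi
}

decode_wait_status 32512   # prints "exit code 127"
```

An exit code of 127 would be consistent with the shell not finding cp in the batch environment, but that is only one reading; in this thread the root cause turned out to be the input data.
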
>
> 
>> That's why I ask here -- (my first 3 lines in the post) -- are there any
>> known conflicts when a working script and a working qsub are put together?
>
>Well, obviously the environment can be different.  Put a 'set' or 'setenv' at
>the top of your batch script to compare with your shell.
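
One way to do the comparison suggested above is to dump the sorted environment from both contexts and diff the two files. A sketch for bash; the helper name and the file paths are placeholders, and it assumes $HOME is visible from both the login node and the compute nodes:

```shell
#!/bin/bash
# dump_env is a hypothetical helper: write the sorted environment to the given file.
dump_env() {
    env | sort > "$1"
}

# Usage (paths are placeholders):
#   at the top of the batch script, before mpirun:  dump_env "$HOME/env.batch"
#   in the interactive login shell:                 dump_env "$HOME/env.login"
#   then compare the two:                           diff "$HOME/env.login" "$HOME/env.batch"
```

Differences in PATH or LD_LIBRARY_PATH are usually the first thing worth checking in the diff.
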


