[torqueusers] Empty output/error log file

FyD fyd at q4md-forcefieldtools.org
Fri Mar 25 06:21:37 MDT 2011


Michael,

>> /dev/sdb1     ext3    917G  414G  457G  48% /scratch
>>
>> as you can see the /scratch partition is not full...
>
> still, those 1M files might be the problem.

I can request 256 000 files (smaller grid) instead of 1 000 000, and
the same problem happens...

> Are those files temporary or do they belong to the result set?

Yes

> Will they be copied back to your head node and deleted afterwards?

No

> I guess you use /home with nfs to get your results back?

No, otherwise (because of NFS) it would take forever...

> Please check "df -i", too. A good way to exclude inode problems is to
> run 8 jobs and issue "df -i" during computation.

ok

[xxxx at node2 ~]$ df -i
Filesystem               Inodes    IUsed      IFree IUse% Mounted on
/dev/sda3                767232    93248     673984   13% /
/dev/sda5              59211776       15   59211761    1% /tmp
/dev/sda1                 26104       41      26063    1% /boot
tmpfs                   1537806        1    1537805    1% /dev/shm
/dev/sdb1             122109952 80024695   42085257   66% /scratch
master0:/home        5859342208  1326970 5858015238    1% /home
master0:/usr/local      7285856   166312    7119544    3% /usr/local
master0:/opt            3840192    27564    3812628    1% /opt
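
To repeat this check while the 8 jobs are running, something like the
following could be used. It is only a sketch: the function names
(parse_iuse, sample_inodes) and the 30 s interval are made up for
illustration, and it assumes a df that accepts -P together with -i
(GNU coreutils does) so that each filesystem stays on one line.

```shell
#!/bin/sh
# Extract the IUse% column (without the % sign) from `df -Pi` output
# read on stdin. Line 1 is the header, line 2 the filesystem itself.
parse_iuse() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print one timestamped inode-usage sample for a mount point.
# $1: mount point, e.g. /scratch
sample_inodes() {
    mount_point=$1
    printf '%s %s %s%%\n' "$(date '+%H:%M:%S')" "$mount_point" \
        "$(df -Pi "$mount_point" | parse_iuse)"
}

# Example (hypothetical interval): sample /scratch every 30 s, Ctrl-C to stop.
# while true; do sample_inodes /scratch; sleep 30; done
```

If the percentage stays well below 100% during the whole run, inodes
can probably be ruled out.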

    ---

Here is our guess:

When the first 8 jobs are started, all goes well. Then, among these 8
jobs, one finishes first while the 7 others are still writing to the
shared hard drive (/scratch partition), keeping the hard drive very
busy. A core is then freed and the 9th job can be run. However, for a
reason we do not understand, nothing is done for this 9th job and an
empty error log file is generated.

We suspect that our hard drive (/scratch partition) is busy and the
'system' does not 'answer' when PBS sends the 9th job.
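
A rough way to test this guess would be to measure how busy the disk
is at the moment a core is freed. The sketch below is an assumption on
our side, not something we have run yet: the function names are
hypothetical, it reads the whole-disk entry (sdb, not sdb1) from
/proc/diskstats on Linux, and it uses the 13th field, which is the
cumulative time in ms the device spent doing I/O.

```shell
#!/bin/sh
# Print the cumulative "time spent doing I/O" counter (ms, 13th field)
# for one device, reading /proc/diskstats-format text on stdin.
# $1: device name as it appears in the 3rd field, e.g. sdb
io_ticks() {
    awk -v dev="$1" '$3 == dev { print $13 }'
}

# Print the percentage of the interval during which the device was busy.
# $1: device name, $2: interval in whole seconds
disk_busy_pct() {
    dev=$1
    secs=$2
    t0=$(io_ticks "$dev" < /proc/diskstats)
    sleep "$secs"
    t1=$(io_ticks "$dev" < /proc/diskstats)
    # (t1 - t0) ms busy out of secs * 1000 ms, expressed in percent
    echo $(( (t1 - t0) / (secs * 10) ))
}

# Example: disk_busy_pct sdb 5
```

A value close to 100 while the 7 remaining jobs are writing would be
consistent with the disk being saturated when the 9th job arrives.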

Does it make sense to you?

regards, Francois



