[torqueusers] ha torque

Daniel Bourque dbourque at weatherdata.com
Wed Apr 9 15:58:27 MDT 2008


Thanks,

    I'll try with a 1GB volume over DRBD since I don't have a SAN.

    I'm assuming I can use that same volume for the Maui files too. Since
HA-OSCAR works, Maui must be able to recover from a failure, i.e. its
databases are not trashed by an unclean termination.
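
A minimal sketch of checking that server_priv fits on a volume of a given size. It assumes the default spool path quoted later in this thread. The helper name check_fit and the 1 GB budget (1048576 KB) are illustrative assumptions, not part of Torque or Maui.

```shell
#!/bin/sh
# Sketch: check whether a directory fits within a size budget given in KB.
# Usage: check_fit DIR BUDGET_KB   (check_fit is a hypothetical helper name)
check_fit() {
    dir=$1
    budget_kb=$2
    # du -sk prints the directory's total size in 1 KB units, then the path.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    if [ "$used_kb" -le "$budget_kb" ]; then
        echo "OK: $dir uses $used_kb KB of $budget_kb KB"
    else
        echo "TOO BIG: $dir uses $used_kb KB, budget is $budget_kb KB"
        return 1
    fi
}

# Example call, assuming the default spool path quoted in this thread
# and a 1 GB volume (1048576 KB):
# check_fit /var/spool/torque/server_priv 1048576
```

The same call can be pointed at the Maui directory if it ends up on the shared volume. That path depends on the local install, so it is left out here.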

thanks again

Daniel Bourque
Sr. Systems Engineer
WeatherData Service Inc
An Accuweather Company

Office (316) 266-8013
Office (316) 265-9127 ext. 3013
Mobile (316) 640-1024



Brock Palen wrote:

> Ours is 868MB, but that's all because we don't rotate out our accounting
> logs. Right now we have 6025 jobs, and server_priv/jobs is just 73 MB.
>
> [root@nyx server_priv]# du -h --max-depth=1
> 8.0K    ./acl_svr
> 4.0K    ./disallowed_types
> 796M    ./accounting
> 73M     ./jobs
> 4.0K    ./acl_groups
> 72K     ./queues
> 48K     ./acl_users
> 24K     ./acl_hosts
> 868M    .
>
> Good luck. You really don't need more than a 1 GB LUN to do this.
> Maybe try DRBD. I use it for virtualized OSes all the time, to
> mirror their partitions across hosts.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
>
> On Apr 9, 2008, at 5:30 PM, Steve Snelgrove wrote:
>
>> On my test system, the size of this directory is 13 MB. However,
>> this includes the jobs sub-directory, so the size will vary
>> depending on how many jobs are running.
>>
>> root# cd /var/spool/torque
>> root# du -h server_priv
>> 652K    server_priv/jobs
>> 4.0K    server_priv/arrays
>> 8.5M    server_priv/accounting
>> 4.0K    server_priv/disallowed_types
>> 20K     server_priv/acl_svr
>> 4.0K    server_priv/hostlist
>> 4.0K    server_priv/acl_hosts
>> 4.0K    server_priv/acl_users
>> 4.0K    server_priv/acl_groups
>> 8.0K    server_priv/queues
>> 13M     server_priv
>>
>> I am not qualified to give an opinion on Maui or the scheduler, sorry.
>>
>> Daniel Bourque wrote:
>>
>>> thanks
>>>
>>> How much disk space does /var/spool/torque/server_priv typically use?
>>>
>>> How about the Maui scheduler? Should it be running on both
>>> headnodes, trying to communicate with localhost?
>>>
>>> I'm a little confused by the example, where the scheduler runs on
>>> the same hosts as pbs_mom and not pbs_server... Is the intent to
>>> also fail over the scheduler along with the shared file system?
>>>
>>>
>>> thanks again.
>>>
>>> Daniel Bourque
>>> Sr. Systems Engineer
>>> WeatherData Service Inc
>>> An Accuweather Company
>>>
>>>
>>>
>>>
>>> Steve Snelgrove wrote:
>>>
>>>> The 2.3 release of Torque has support for HA by allowing two head
>>>> node servers to access the server_priv files on a shared file
>>>> system.  See
>>>> http://www.clusterresources.com/torquedocs21/4.3high-availability.shtml
>>>> for more details.
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>>
>
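
In Brock's du output quoted above, accounting takes 796M of the 868M total. A minimal sketch for spotting accounting files old enough to rotate out, assuming they are ordinary files under server_priv/accounting. The helper name list_old_logs and the 90-day threshold are illustrative assumptions. The function only lists files and does not delete anything.

```shell
#!/bin/sh
# Sketch: list regular files under DIR last modified more than DAYS days ago.
# Usage: list_old_logs DIR DAYS   (list_old_logs is a hypothetical helper name)
list_old_logs() {
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$1" -type f -mtime +"$2" -print
}

# Example call, assuming the default spool path quoted in this thread:
# list_old_logs /var/spool/torque/server_priv/accounting 90
```

Whether the listed files are then compressed, moved off the shared volume, or removed is a site policy decision.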

