[torqueusers] ha torque
Daniel Bourque
dbourque at weatherdata.com
Wed Apr 9 15:58:27 MDT 2008
Thanks,
I'll try with a 1GB volume over DRBD since I don't have a SAN.
I'm assuming I can use that same volume for the Maui files too. Since
HA-Oscar works, Maui must be able to recover from a failure, i.e. its
databases are not trashed by an unclean termination.
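
A rough sketch of how the 1GB figure could be checked against an existing
server_priv before committing to that volume size. The helper name
fits_in_volume and its return codes are made up for illustration and are not
part of Torque; the default path is the one mentioned in this thread.

```shell
#!/bin/sh
# fits_in_volume DIR LIMIT_KB (hypothetical helper, not part of Torque)
# Returns 0 if DIR uses less than LIMIT_KB kilobytes, 1 if it does not,
# and 2 if DIR does not exist.
fits_in_volume() {
    [ -d "$1" ] || return 2
    used_kb=$(du -sk "$1" | awk '{print $1}')
    [ "$used_kb" -lt "$2" ]
}

# Assumed default path from this thread; 1 GB = 1048576 KB.
priv_dir=${1:-/var/spool/torque/server_priv}
rc=0
fits_in_volume "$priv_dir" 1048576 || rc=$?
case $rc in
    0) echo "$priv_dir fits in a 1 GB volume" ;;
    1) echo "$priv_dir exceeds 1 GB; rotate accounting logs or use a larger volume" ;;
    *) echo "$priv_dir not found" ;;
esac
```

This only measures current usage; the accounting directory keeps growing unless its logs are rotated.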
thanks again
Daniel Bourque
Sr. Systems Engineer
WeatherData Service Inc
An Accuweather Company
Office (316) 266-8013
Office (316) 265-9127 ext. 3013
Mobile (316) 640-1024
Brock Palen wrote:
> Ours is 868 MB, but that's all because we don't rotate out our accounting
> logs. Right now we have 6025 jobs, and server_priv/jobs is just 73 MB.
>
> [root@nyx server_priv]# du -h --max-depth=1
> 8.0K ./acl_svr
> 4.0K ./disallowed_types
> 796M ./accounting
> 73M ./jobs
> 4.0K ./acl_groups
> 72K ./queues
> 48K ./acl_users
> 24K ./acl_hosts
> 868M .
>
> Good luck. You really don't need more than a 1 GB LUN to do this.
> Maybe try DRBD. I use it for virtualized OSes all the time, to
> mirror their partitions across hosts.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
>
> On Apr 9, 2008, at 5:30 PM, Steve Snelgrove wrote:
>
>> On my test system, the size of this directory is 13 MB. However,
>> this includes the jobs sub-directory, so the size will vary
>> depending on how many jobs are running.
>>
>> root# cd /var/spool/torque
>> root# du -h server_priv
>> 652K server_priv/jobs
>> 4.0K server_priv/arrays
>> 8.5M server_priv/accounting
>> 4.0K server_priv/disallowed_types
>> 20K server_priv/acl_svr
>> 4.0K server_priv/hostlist
>> 4.0K server_priv/acl_hosts
>> 4.0K server_priv/acl_users
>> 4.0K server_priv/acl_groups
>> 8.0K server_priv/queues
>> 13M server_priv
>>
>> I am not qualified to give an opinion on Maui or the scheduler, sorry.
>>
>> Daniel Bourque wrote:
>>
>>> thanks
>>>
>>> How much disk space does /var/spool/torque/server_priv typically use?
>>>
>>> How about the Maui scheduler? Should it be running on both
>>> headnodes, trying to communicate with localhost?
>>>
>>> I'm a little confused by the example, where the scheduler runs on
>>> the same hosts as pbs_mom and not pbs_server... Is the intent to
>>> also fail over the scheduler along with the shared file system?
>>>
>>>
>>> thanks again.
>>>
>>> Daniel Bourque
>>> Sr. Systems Engineer
>>> WeatherData Service Inc
>>> An Accuweather Company
>>>
>>>
>>>
>>>
>>> Steve Snelgrove wrote:
>>>
>>>> The 2.3 release of Torque has support for HA by allowing two head
>>>> node servers to access the server_priv files on a shared file
>>>> system. See
>>>> http://www.clusterresources.com/torquedocs21/4.3high-availability.shtml
>>>> for more details.
>>>>
>>>>
>>>
>>