[torqueusers] Reservations and automated backup
sheen at usc.edu
Wed Jun 16 14:29:52 MDT 2010
I'm running a cluster for a small research group, with an automated hard
drive backup every Sunday night. One of the slave nodes copies the NFS
share directory onto a large local hard drive, with something like:
david at node3 > crontab -e
# every Sunday at 1am
0 1 * * 0 cp -r /usr/home/ /backup
This is done on a slave node because I don't want to lose my backup if some
sort of catastrophic failure should hit the master but not the slaves (e.g.
violent PSU failure). I've found that, when the cluster is under a high
load, this backup command tends to crash the network. The obvious solution
would be to set up a reservation in Maui every Sunday from 1am to 3am.
However, no job could then span that window, so nothing could run for more
than a week, and the quantum chemistry people don't like that.
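
To make the "no more than a week" limit concrete, here is a rough Python
sketch of the arithmetic. It assumes the reservation covers the whole
cluster and that the scheduler will not start a job whose requested
walltime overlaps the reserved window. The function names are made up for
illustration and are not part of Maui or Torque.

```python
from datetime import datetime, timedelta

# Assumed reservation window: every Sunday 01:00-03:00 (from the post).
WINDOW_START_HOUR = 1
WINDOW_END_HOUR = 3
SUNDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6


def next_window_start(t):
    """Return the start of the first Sunday window that ends after t."""
    days_ahead = (SUNDAY - t.weekday()) % 7
    start = (t + timedelta(days=days_ahead)).replace(
        hour=WINDOW_START_HOUR, minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=WINDOW_END_HOUR - WINDOW_START_HOUR)
    if end <= t:  # this week's window is already over
        start += timedelta(days=7)
    return start


def max_walltime(t):
    """Longest job that can start at t without overlapping the window."""
    start = next_window_start(t)
    if start <= t:  # t falls inside the window, so nothing can start
        return timedelta(0)
    return start - t
```

Even in the best case, a job that starts the moment the window closes
(Sunday 3am) gets at most 6 days and 22 hours before the next window
begins, which is where the limit comes from.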
Is there any way to implement a reservation system that will allow really
long jobs to run while preventing the short jobs from swarming the system
during that time?
Crossposted to torqueusers in case there is a Torque solution