SQL server Disaster Recovery Options
What is disaster recovery DR and HA high availability HA?
Both are best option for the business to minimize their data loss and downtime. The SQL server has a number of native options. It is depends upon your recovery time objective RTO and recovery point objective RPO.
That is how much data can you afford to lose? How long can you wait? How much money do you ready to spend?
If you have answers for these questions then, you can choose your best DR plan.
Who knows what will happen tomorrow?
I have seen some of the company don’t have at least simple DR setup, Like keep the backup in safe DR site. The reason is they didn’t face any disaster yet.
High availability is a solution to have a database online all the time 99%. HA and DR is combination to build best HADR you need to understand of the infrastructure. Being a DBA, We must understand of the Infra like, How the data center/site been designed – How many buildings in a data center and how the SQL been hosted in virtual ESXI and physical, storage etc.
Main points to consider to setup HADR:
- Data center/ site overview
Have overview of data center and site that how it is designed.
- Distance of the data center and network bandwidth
Understand the distance of DC and note the network bandwidth between sites
- Number of buildings in a DC
Understand, how many buildings are physical located in each DC and how the ESXI/Physical machine been configured
- How the storage separated and Network connected etc
Understand how the storage separated and vendor name (EMC, VNX, Dell compellent) how many are connected into ESXI/Physical server
- ESXI host level configuration for SQL server
Understand, how the EXSI for SQL server licensed and how it is been placed in a site. Ex: if two esxi for SQL licensed in a DC that can act as HA and DR can be configured to other site. If both are coming from same storage attest ask your VM admin to create a LUN from two different storage. Have seen storage full issue when all comes from single storage (OR) manual cabling disconnect from someone etc.
If needed, create affinity rules in the ESXI with your VM admin.
- What SQL technologies to choose – Always ON, Clustering, Mirroring and Log shipping
Make sure to understand the requirement from customer and plan what technologies to use with the given budget and build server accordingly.
- How many servers needs to host a database in a primary side and DR site (Maybe three sites)
Understand how can servers needed for each site, based on the requirement
- Configuring right quorum that primary site should be up in case of any connectivity issue
Choose right quorum in case of any connectivity loss, at least primary site should be available to serve user request.
- Configure monitoring scripts that checks health check of your HADR and synchronization lag etc
- Try to have everything separated physically and dedicated, this not possible all the time, since DC and other components are already designed in a organization.
Truncating and deleting the ‘critical tables’ also consider a big disaster. So guys keep the backup, backup and backup the database. Before going to set-up your DR plan asks the questions to DR Guru’s via online (Like twitter, SSC and MSDN) and also read the DR Guru Paul Randal’s white papers it will help.
I have recommended to watch the brentozar‘s DR video. Its amazing video and it covers the basics DR plan.
For understandable I will classify the disaster recovery (DR) into three types. I am not going in depth just I will cover the basics of high availability HA and disaster recovery DR.
1. Data center disaster
2. Server(Host)/Drive (Except shared drive) disaster
3. Database/Drive disaster
Backup restore is common method its simplest (less expensive) and good option for disaster recovery but it’s depends upon your backup strategy and recovery model. The backup restores will work for all the above three DR methods. The DBA job is keeping the data as much as safe and application availability (Zero down time). That is test and keep your database backups safely try to automate the backup test and move them automatically to the DR site.
The backup restore does not good choose for big databases (TB) in case of any disaster, because the restoration takes more time to bring the database online. However you can speed up the backup restoration using the following SQL native features.
- Instant file initialization is the best option to bring the database online as quickly as possible.
- SQL server 2008 has backup compression technology. It helps to minimize the down time i.e. minimizes the backup and restoration time.
Clustering is a combination of one or more servers it will automatically allow one physical server to take over the tasks of another physical server that has failed. Its not a real disaster recovery solution because if the shared drive unavailable we cannot bring the database to online.
Clustering is best option it provides a minimum downtime (like 1 minutes), since it is connecting a common storage i.e SAN, the database can be online without much in recovery.
- Clustering needs extra hardware/server and it’s more expensive
- It needs a SAN storage
Always ON clustering
Always on clustering is a combination of windows cluster and database mirroring. By using this we can achieve both HA and DR.
- Windows cluster is must.
- Needs database to be full recovery mode.
- Secondary database cane be read only for reporting.
- Storage can be dedicated
Database mirroring introduced in 2005 onwards. Database Mirroring maintain an exact copy of a database on a different server. It has automatic fail over option and mainly helps to increase the database availability too.
- Database mirroring only works FULL recovery model.
- This needs two instances.
- Mirror database always in restoring state.
For more details read Robert L. Davis’s Pro SQL Server 2008 Mirroring book.
Log shipping is the process of automating the full database backup and transaction log on a production server and then automatically restores them on to the secondary (standby) server.
- Log shipping will work either Full or Bulk logged recovery model.
- You can also configure log shipping in the single SQL instance.
- The Stand by database can be either restoring or read only (standby).
- The manual fail over is required to bring the database online.
- Some data can be lost (15 minutes).
Peer-to-Peer Transactional Replication
Peer-to-peer transactional replication is designed for applications that might read or might modify the data in any database that participates in replication. Additionally, if any servers that host the databases are unavailable, you can modify the application to route traffic to the remaining servers. The remaining servers contain same copies of the data.
- Peer-to-peer replication is available only in SQL Server 2005
For more looks the Microsoft link: http://msdn.microsoft.com/en-us/library/ms151196(v=sql.90).aspx
Note: Database mirroring, logs shipping and replication also treated as DR if it’s located in different data center. We can use both the HA and DR combination to set up the disaster recovery.
Again everything is “It Depends”. Read Paul Randal’s white papers before going to configure the DR.