This is going to be a multipart set of posts about things that someone who is administering Dynamics GP should care about when it comes to their SQL Server databases and server.
The first thing is backup and restore, but actually more about restore.
I am not going to focus on the backups themselves. There is enough documentation about SQL Server backup and the different options available both within SQL Server and third party backup applications that I am not going to go over them here (at least not yet).
Instead I am going to ask some basic questions you should be able to answer about your backup and recovery plan and recovery process.
How much data can you lose in terms of time for it to be an inconvenience, a problem and a situation where someone has to update their resume?
Normally the answer to this question is a function of time such as… Losing an hour is an inconvenience a day is problem and longer than two days is a career limiting move.
Can you answer this question? Can you backup your answers with data and management buy-in. It’s one thing for you to think that losing two hours of everyones work in Dynamics GP is not a problem it is another thing to have management agree with that. Establish those numbers with the relevant stakeholders and have them buy into them.
When is the last time you tested a restore? How long did it take to run?
A backup plan is great, but you need to test restores of the backups, you don’t have to test every backup but you need to test often enough that you feel comfortable about them.
Just testing a restore is not enough, you also want to get an idea of how long a restore is going to take. Why? If something bad happens being able to provide a reasonable estimate of how long it is going to take to restore a database is something that will be quite handy. If you can say “when we tested a restore last week it took 2 hours” then you reduce the odds of someone hovering over your next in 20 minutes asking if it is done.
Also in general practicing a restore is a good idea for the simple reason it keeps you in practice of how to do restores. When you have to restore prod after an issue, you are going to be more nervous, more prone to make mistakes, it is human nature for stress to have an impact on personal performance, that is what makes it, well stress. Heck I have been working with SQL Server for something like 18 or 19 years now and I still get nervous restoring after a prod issue. Practice helps you work through the nerves and the last thing you need is to appear you don’t know what you are doing on a restore with a boss breathing down your neck.
Is the knowledge of how to restore a database a single point of failure?
If it is, fix that, either have more than one person who knows how to do a restore (and has the system rights to do so) or make arrangements with a consultant or someone to provide cover.
Even if Joe never takes a vacation (a bad practice for anyone who deals with financial data btw) he can get the flu.. Make sure more than one person knows how to do a restore.
Do you have a way of being notified when a backup fails?
Regardless of how you take backups, do you have something set up that lets more than one person know (see Joe reference above) that there has been a backup issue? The last thing you want to find out is that you haven’t taken a backup for a week because the backup disk was full, or the tape drive had an issue. The key on this one is active notification, that is someone gets an e-mail or a page, not that someone has to check every day.
Well that is part one…