Originally, I was going to write a post on troubleshooting SQL Server. This is because I’m writing up and reviewing a lot of documentation at my current job. I’ll blog on troubleshooting SQL Server soon because it feels like too good a topic to pass up.
Why documentation or automation? Then you don’t feel like a cat chasing your tail every day.
I’m leaving my current job soon. There are a lot of conversations about how we do things and whether they are documented or not. We have a lot of documentation here. The DBAs here have been really good about documenting. I love documentation. I won’t always remember what I did, so it helps me, too.
But then I got to wondering, what if I had a lot more automation in place? Would this reduce the need for a lot of documentation? The kinds of documentation I’m talking about are:
- Server list – This includes server version, edition, environment, product owner names, and other info like if it’s SSIS or what products are included.
- Installation, configuration, and migration checklists – Lots of step-by-step info here and scripts.
- Auditing – For setting up auditing for DDL and security changes.
- New hire – This is a recent addition because I became the sole DBA. Now, we are hiring people. This became more pressing since I’m leaving and there is no DBA at all. This includes tips and tricks, what a new DBA needs access to, and links to other documents. It’s helping me realize where there may be gaps in documentation.
- Troubleshooting database issues – We have multiple database platforms: SQL Server, PostgreSQL, Azure SQL, Azure PostgreSQL, and Oracle. We are hiring DBAs who don’t know them all. We are also planning to have DevOps do some basic database support at some point.
- Database monitoring – We have a couple of monitoring tools – Idera and Nagios. My manager requested some more info about how these are set up.
- Permissions – How we grant permissions like AD group or role etc.
- Backups/restores – How we backup and where we test restores
- Patching – How we patch and nowadays where you approve patches in the patch manager
- Archiving – How and where we are archiving data
- Partitioning – Not a lot on this quite honestly. There is some partitioning mainly in PostgreSQL to support the archiving.
- User access issues – Mostly for Azure SQL and Azure PostgreSQL because users get confused as to how to connect
I was asked to review all the pages and make sure they are up-to-date. Also, I’m adding rationale and history to the documentation. As I was going through all these pages (we use Confluence), I asked myself, do we need all this documentation? Yes and no is what I determined.
What I realized is that it would be great if we had more automation (or at least scripts in source control) because:
- We create scripts to run things
- They are put in source control
- They have a lot of good comments so other people can understand what is being done – this includes me when I forget what I’ve done
- There may be an automated or manual depending on the script and how often it’s used
- This becomes our documentation!
A big, big, HUGE problem is that we are always short-staffed and don’t have time for automation. We especially don’t have time for large automation projects. Then there is the question of do you want to automate something you do infrequently.
For example, we don’t install/configure/migrate SQL database servers a lot. Do we really want to automate that? We don’t even create many cloud databases, but we do have some automation with Terraform and Octopus. The thought there was that we were going to create more and more of those.
There are a couple of XKCD strips that really help sum up automating for me:
Looking back at my list of documentation, I can decide what can be automated. This way I can eliminate Confluence documentation:
- Server list – I’m a fan of this manual list of information. It also includes decommissioned servers. It comes in very handy when you need to figure out who to contact about a product database. I’ve also used it to figure out where a server went if I think it should still exist.
- Installation, configuration, and migration checklists – It seems like all of this could be automated, but it’s infrequently used, so not sure there’s a case for automation. A lot of the documentation in these pages is in script form, so it could be tidied up more to make it a quick script to run.
- Auditing – This could definitely be scripted out with comments to explain the code and placed into source control. Thereby, automating it.
- New hire – This can’t really be automated. When there are more automated processes, there is less documentation to read through, though.
- Troubleshooting database issues – This also can’t be automated.
- Database monitoring – The monitoring tools themselves are tuned well and running along under their own steam. If you are new and need to know how it all works, it still makes sense to have a page in Confluence for that.
- Permissions – We have automated a lot of this already. Namely, we set up AD groups on SQL Server and created a form with a dropdown of the servers and databases. Then people use the form to request those and they are added. No manual work on the DBA part to add individual users. We do still have some one-offs. Some users require elevated perms.
- Backups/restores – Backups are automated using Ola. There is also a PowerShell that runs daily to ensure they are all working right. Restores are automated and backups are restored weekly to a restore server.
- Patching – Patching is automated already. I like to review the patches manually before they are applied. I go into the patching tool and make sure I want those patches to be applied. This requires some documentation to explain to others how I approve the patches.
- Archiving – To me this could be scripted in source control with good commenting around what you are doing.
- Partitioning – Similar to archiving, it could be scripted into source control with good commenting.
- User access issues – This feels like another case of using a Confluence page. It is provided to random users when they have issues.
Did I miss any documentation areas? Did I miss any areas for scripting/automation? I’m curious to hear other people’s thoughts. Please chime in!