I dodged a very large (metaphorical) bullet yesterday. I’m not in the habit of trying to find bullets to dodge, but this one found me because it turns out I didn’t plan as well as I thought I had.

It all started yesterday morning when a client asked me to make some batch updates to the data in his MySQL database. So, I wrote a script to make the required updates. Before running the script, I made a backup of the database in case something went wrong. Everything was good so far. I ran the script, checked the database to see if anything looked amiss, and called it a night (oh yeah, this was like 11:00pm last night). This morning, I got an urgent text message from the client saying that the data didn’t look quite right. No problem, I thought; I made a backup copy of the database last night, so I can fix this. But nooooo, that database backup wasn’t around any more. Why, you ask? Oh yeah, that. I saved it to the same location that the regular nightly backup saves to, so by the time I needed the backup this morning, it had already been overwritten. Luckily, the eventual fix was very simple once I figured out where my script had gone wrong. It could have been really ugly, though.

So, here’s to lessons learned:

  • Don’t do production database work late at night. The root of the problem was that I misread the client’s original email and wrote my script incorrectly as a result. Part of that failure was the late hour and the long day that preceded it.
  • Plan, plan, plan, and re-plan. Thinking my one backup was sufficient was my second mistake. When working with production data, you absolutely cannot be “too backed up.”
  • Test your scripts on non-production systems first. This client doesn’t have a dedicated development system, but I do have a few-weeks-old copy of the database on my development machine. Another mistake I made was being lazy and not testing these changes on my local copy before applying them to production.
  • Have a plan in place to verify the database changes as soon as they’re made. My cursory inspection of the data didn’t reveal the actual problems. If I had verified the changes while the client was still available to test them, we’d have been able to immediately restore the database and assess what went wrong.
  • Nothing is ever as simple as it seems. These updates seemed like a walk in the park and lulled me into not taking the precautions I knew I should have.
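The “too backed up” lesson above can be sketched as a small pre-change backup helper: it builds a mysqldump command that writes to a timestamped file in its own directory, so the dump can never collide with (or be overwritten by) a nightly backup job. The database name, user, and paths here are illustrative, not from the post.

```python
import datetime
import os
import subprocess

def pre_change_backup(db_name, backup_dir="/var/backups/pre-change",
                      user="backup_user"):
    """Build a mysqldump command that writes to a timestamped file,
    kept separate from the nightly backup location so the nightly
    job can never clobber it. Returns (command, output_path)."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    outfile = os.path.join(backup_dir, f"{db_name}-{stamp}.sql")
    cmd = [
        "mysqldump",
        "--single-transaction",   # consistent snapshot for InnoDB tables
        f"--user={user}",
        f"--result-file={outfile}",
        db_name,
    ]
    return cmd, outfile

# Usage (commented out so the sketch stays side-effect free):
# cmd, outfile = pre_change_backup("client_db")
# subprocess.run(cmd, check=True)
```

The timestamp in the filename is the whole point: even if you take three backups in one evening, none of them overwrites another.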

So, in the end, my client has his data back where it belongs, and I have a few things I’ve learned never to do again. My biggest embarrassment in this whole thing is that I consider myself a senior developer. I’ve been writing web apps for almost 10 years across Oracle, MS SQL, MySQL and PostgreSQL databases, and I should have known better than to take shortcuts. My only excuse is that I’m human and I can only try to learn from my mistakes.

2 thoughts on “Database bullet dodging and lessons learned”

  1. What’s even more fun is when you mod a system and the users don’t catch any problems in QA testing, and then a few months later you get to write SQL code to undo a data-centric problem caused by your mod!

    We’ve all done things like this. The only thing I can tell you is that if it’s only an update on a few tables, then in addition to a backup I will often export the affected tables’ data into a format that I can easily re-import, or even code against if things go bad. I’ve been at this for 12 years and something very similar happened to me recently – don’t beat yourself up!
    Or maybe we both just suck!
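The table-level safety net the commenter describes can be sketched like this: export just the affected tables to CSV before running the update, so there is a small, re-importable snapshot to code against if things go bad. This is a runnable illustration only; sqlite3 stands in for the real database, and the table and column names are made up.

```python
import csv
import sqlite3

def export_table_to_csv(conn, table, csv_path):
    """Dump every row of `table` to a CSV file that can be re-imported
    (or coded against) if the batch update goes wrong."""
    cur = conn.execute(f"SELECT * FROM {table}")
    headers = [col[0] for col in cur.description]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cur.fetchall())

# Illustrative setup: a throwaway table standing in for production data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "open"), (2, "shipped")])

# Export the affected table *before* running the batch update.
export_table_to_csv(conn, "orders", "orders-before-update.csv")
conn.execute("UPDATE orders SET status = 'archived'")
```

Because the export only covers the tables the script touches, it is cheap enough to do every time, even when a full backup also exists.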
