Thursday, March 31, 2005

The day from hell

was today. You'll remember that I had to do an off-cycle software release last Friday. That release contained the work I did during my first three days in Dallas last week.

Late on the Friday evening I received an email detailing some changes another developer had made to our runtime that hadn't made it into our SourceSafe database. Naturally my software release didn't contain those changes, but the CDs had already been created and dispatched. A flurry of emails went between here and Chengdu in China, where my colleague was waiting for his US visa to return. I was able to establish that the missing changes would only affect one instance of our hardware so, making an executive decision, I held off doing anything until Monday. Came the Monday and I was unable to speak with our customer contact. OK, so comes Tuesday, yesterday. I explained the situation and he wanted a second off-cycle software release that would include both my changes from last week and the other changes.

So I did a second off-cycle software release and it was dispatched to the appropriate locations. So far so good; we have a hole in our process that needs to be plugged, but at least we've been honest with our customer and let them know of a problem before it bites 'em.

So today my colleague appeared in the office. I need to be careful how I word this; he reads my blog and I don't think what happened is entirely his fault. Some of the blame attaches to me. I should also state that my colleague is a lot younger and less experienced than I am; things that would automatically raise a red flag for me won't for him, just as they wouldn't have for me when I was at his level of experience.

I had 'the talk' with him about source code control and how important it is to ensure that if one makes changes they make their way into our codebase. Likewise the lecture on communication; if I'm in France and he's in the Philippines and we're both making changes to the same module... yada yada, you know the deal.

And he asked me if I'd made that change. I hadn't, and a sinking feeling came over me. A quick check of the database schema and, sure enough, there were 4 new fields on a particular table that I hadn't noticed.

So now the alarm bells are ringing. I've sent out two off-cycle releases within 4 days and it's still not right. A quick check and I'm convinced that this is a big problem. So I called our primary contact at the customer site. Explained the situation and he said 'is there a knife handy?'. 'No mate, if there was you'd have to fight me for the chance to slash your wrists.'

So I sent out a third off-cycle release today.

Fortunately the error involves a database schema change. A while ago I wrote a very simple script-driven utility that takes a bunch of SQL DDL statements and applies them to the database; that utility has been a part of every CD release we've done, so I know it's out there on each instance of our hardware. I was able to send via email a simple script and instructions on how to use yesterday's CD followed by that script to bring the database up to date.
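For anyone wondering what a script-driven utility of that kind looks like, here is a minimal sketch in Python. It is not the actual utility, which isn't shown here; it assumes SQLite as a stand-in database, a script made of semicolon-terminated statements with optional `--` comment lines, and a made-up `device` table with made-up columns.

```python
import sqlite3


def apply_ddl_script(conn, script_text):
    """Apply each semicolon-terminated statement in script_text, in order.

    Naive by design: splitting on ';' would break on a semicolon inside
    a string literal, which plain DDL scripts rarely contain.
    """
    applied = []
    for chunk in script_text.split(";"):
        # Drop '--' comment lines, then trim surrounding whitespace.
        lines = [ln for ln in chunk.splitlines()
                 if not ln.strip().startswith("--")]
        statement = "\n".join(lines).strip()
        if not statement:
            continue
        conn.execute(statement)
        applied.append(statement)
    conn.commit()
    return applied


# Hypothetical update script; the table and column names are invented.
UPDATE_SCRIPT = """
-- bring the device table up to date
ALTER TABLE device ADD COLUMN serial TEXT;
ALTER TABLE device ADD COLUMN firmware TEXT;
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE device (id INTEGER PRIMARY KEY)")
    for statement in apply_ddl_script(conn, UPDATE_SCRIPT):
        print("applied:", statement)
```

The appeal of this approach is that a schema fix becomes a small text file that can travel by email, as long as the tool that runs it is already installed everywhere.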

So THIS particular mistake is easily corrected, but at what cost? If I were the customer and were bombarded with not one, not two, but three off-cycle CDs, the last two separated by a single day, I'd be very nervous. The first off-cycle CD is OK; that was demanded by the customer. But the second and the third were the result of a lack of communication between developers in our company.

Should I have noticed that database schema change? Yes and no. I'm supposed to be the software manager in addition to being a developer. So maybe I should have compared each table schema in the database with the schema in each MFC class. On the other hand, if someone changes the schema surely they should communicate that change to the rest of the team?
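That kind of comparison can be automated. The sketch below is only an illustration under stated assumptions: it is Python with SQLite rather than MFC, and the `EXPECTED` dictionary is a hypothetical stand-in for the fields each MFC class binds to. The table and column names are invented.

```python
import sqlite3

# Hypothetical: the columns the application code believes each table has.
EXPECTED = {
    "device": ["id", "serial"],
}


def table_columns(conn, table):
    # PRAGMA table_info yields one row per column; index 1 is the name.
    # The table name is interpolated, so only pass trusted names.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def schema_drift(conn, expected):
    """Return {table: (missing_in_db, unknown_to_code)} for tables that differ."""
    drift = {}
    for table, columns in expected.items():
        actual = set(table_columns(conn, table))
        wanted = set(columns)
        missing = sorted(wanted - actual)   # code expects it, database lacks it
        unknown = sorted(actual - wanted)   # database has it, code doesn't know
        if missing or unknown:
            drift[table] = (missing, unknown)
    return drift


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Someone added 'firmware' without telling the rest of the team.
    conn.execute(
        "CREATE TABLE device (id INTEGER PRIMARY KEY, serial TEXT, firmware TEXT)"
    )
    for table, (missing, unknown) in schema_drift(conn, EXPECTED).items():
        print(table, "missing:", missing, "unknown:", unknown)
```

Run before cutting a release, a check like this would flag new fields on a table regardless of who added them or whether anyone mentioned it.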

It comes down to me though; as software manager I need to keep on top of what my team is doing and ensure that communication takes place. That's why I get paid the pittance.
