Wednesday, May 04, 2005

Fun with DCOM

You might remember a couple of months ago I posted about the fun of getting a robot working using DCOM. One of the tasks I have to achieve here in Baguio is the same miracle. Should have been easy. I'd managed to get it working with relatively little pain in Dallas. Here it stubbornly refused to cooperate.

In both cases we're invoking a DCOM object running on an XP Pro box; the client runs NT 4 SP6a. We don't have a choice about that; it's an app from another division of the company, written in VB6 and they adamantly refuse to support it unless it's on NT4 SP6a.

So I went in and configured the boxes appropriately via DCOM. I won't go into the tedious details. Fired it up and... nothing! I could see the COM component start up in task manager on the server machine; it disappeared within a few seconds. The client app reported error 462 - 'The remote server does not exist or is unavailable'. Well, obviously the remote server did exist, so what was the problem?

Quick Google search on that error. Read maybe a hundred different web pages. Nothing in particular leapt out, apart from an offhand comment that such an error indicates a problem in the COM object itself. Yet it worked in Dallas. But OK, I can try a few experiments. Fired the COM object up on the server box under the debugger. Put a breakpoint on the constructor. Then fired up the client app. Sure enough, it broke in the constructor. Single stepped through that and the ATL plumbing upstream. No errors. S_OK all the way!

It was about this time that I made a fundamental mistake. What I should have done was what I eventually did: try the same test XP Pro to XP Pro. What I did was get sidetracked into permissions. A few hours later (and a couple of visits from the customer's IT people to perform tasks requiring network admin permission, which I don't have) and it was still stubbornly refusing to work. It was all getting somewhat frustrating.

So I called it a day and repaired to the hotel for dinner and some wine.

The next day I went over the same set of steps. You'd think I'd know better by now. If something doesn't work, repeating the same set of steps ain't going to suddenly, magically, make it work. Tried an earlier version of the COM component. Didn't work (nor did I really expect it would).

So eventually I tried what I should have tried first off. Moved the server machine next to another system we have here that also runs XP Pro. Wrote a quick and dirty VB6 program (all of 4 lines of code). VB6 shines at these kinds of quick tests. Configured the new box as the client through dcomcnfg and voila! It works!

So what do we know? It works in Dallas using NT4 SP6a client and XP Pro server. It works here using XP Pro client and server. It doesn't work here using NT4 SP6a client and XP Pro server. Gotta be a problem with the NT box.
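For anyone who likes to see the elimination spelled out, here's a minimal sketch in Python of the reasoning above. The three runs are the ones described in this post; the names RUNS, FACTORS and minimal_suspects are made up for illustration and aren't part of any tool I actually used. It looks for the smallest combination of settings from the failing run that never shows up in a working run.

```python
from itertools import combinations

# The factors that varied between the three runs described in the post.
FACTORS = ("site", "client", "server")

# Observations taken from the post (the dict layout is an illustrative choice).
RUNS = [
    {"site": "Dallas", "client": "NT4 SP6a", "server": "XP Pro", "worked": True},
    {"site": "Baguio", "client": "XP Pro", "server": "XP Pro", "worked": True},
    {"site": "Baguio", "client": "NT4 SP6a", "server": "XP Pro", "worked": False},
]


def minimal_suspects(runs, factors=FACTORS):
    """Return the smallest setting combinations from each failing run
    that do not appear together in any working run."""
    failures = [r for r in runs if not r["worked"]]
    successes = [r for r in runs if r["worked"]]
    suspects = []
    for failure in failures:
        # Try single factors first, then pairs, and so on.
        for size in range(1, len(factors) + 1):
            found = []
            for combo in combinations(factors, size):
                setting = {f: failure[f] for f in combo}
                # A setting is cleared if some working run shares all of it.
                cleared = any(
                    all(s[f] == v for f, v in setting.items())
                    for s in successes
                )
                if not cleared:
                    found.append(setting)
            if found:
                suspects.extend(found)
                break  # stop at the smallest size that explains the failure
    return suspects


if __name__ == "__main__":
    print(minimal_suspects(RUNS))
```

No single setting is to blame on its own, since each one also appears in a run that worked. The only pairing left standing is the NT4 SP6a client at this site, which is to say this particular NT box.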

So I rebooted the client machine. And up it came with various warning messages about corrupted files. Aha! Ran chkdsk which reported some hundreds of errors and rebooted. No more warning messages but it still didn't work. So we reverted to the ghosted image of the machine as it was originally delivered and voila! it works.

I could probably have saved myself considerable time had I simply rebooted the NT machine but that's not normally a part of my troubleshooting technique. I expect NT and derivatives to work; for the most part they do.

Now all I have to do is persuade my colleague (who left on Sunday because I couldn't get DCOM working) to return so that we can continue the task...
