Too Much InformationLike many businesses, my current employer uses Microsoft Dynamics GP for our main  accounting/inventory software. From a recent trip to Connections (the big convention for Dynamics), I learned that there were some cool data mining utilities available from Microsoft that we might be able to use. I love free stuff, and the opportunity to give our executives something to chew on for a bit (keeping them out of my hair) was one I couldn't pass up.

The Analysis Services install I managed to dig up (after much searching on Microsoft's Customer Source) went pretty easily. There are a couple of SSIS jobs included that populate an summary database with relevant data pulled from the objects you want to throw into your data warehouse. Not too shabby, even if it's an extra aggregating step. In fact, I kind of appreciate the added encapsulation and I know I'll have to restrain myself from using this database for other reports.

Anyway, the only thing left at that point was to "process" the data warehouse. That's where the data warehouse reads the "factual" information and chunks it up into nice little cubes that should have precise business-person meaning. It can take a while, so I was prepared to leave the process running a bit.

Five hours later...

We're just not so big a company that this thing should have taken five+ hours. Digging into the process that was taking so long revealed that at least one of the dimensions it was trying to process included a table join that seemed a bit fishy. It included a master-detail relationship that should have ended up with something around 500,000 rows. Checking the "Estimated Execution Plan" on the generated select statement showed that it was estimated to return closer to 2 billion rows. That's a messed-up join is what that is.

Fortunately, Visual Studio 2005 gives you some cool tools that work with Analysis Services. One of them is a wizard that you can point at a data warehouse and have it reconstruct it as a project. Checking the Data Source View "GPDataWarehouse.dsv" revealed what the problem was. Roughly half the relations in that view had split the key so that there was one for the actual relation minus the CompanyId and another with the CompanyId all by its lonesome. The effect of this is that the data warehouse would process once for the real relationship and again for the spare. Since CompanyId is the same value for all my records, what I ended up with was a data set getting ready to relate 147,698 records with every one of 11,937 records. I'll spare you digging out a calculator; that's 1,763,071,026 rows total.

Now frankly, I could have simply removed the CompanyId entirely from all relationships—CompanyId is only really relevant if your Dynamics install is trying to manage multiple companies. We aren't doing that. Still, I can be a little, uh, rigid with stuff like that so I went ahead and changed the relevant relationships to include CompanyId and simply deleted the spares. Once I deployed the project back to the data warehouse, processing it completed in a mere 15 minutes.

I suspect that the cause of this disjointed relationship is that the original warehouse was configured for SQL Server 2000 and we're using SQL Server 2005. At any rate, things are a lot happier now. I have an SSIS package that kicks off the summary database update and then processes the cube and the whole thing runs at night in 20 minutes or so.

For those who might be experiencing similar issues, I've uploaded my working Data Source View. You should be able to replace yours using SQL Server Management Studio. I selected all the options on install so it includes the Financials, Inventory, Payables, Purchases, Receivables, and Sales cubes. I'd do a backup before attempting it, but you were probably going to do that anyway...