DataSets and Business Logic

Whoa, that was fast. Udi Dahan responded to my post on DataSets and DbConcurrencyException. Cool. Also cool: he has a good point. Two good points, really.

Doing OLTP Better Out of the Box

I'll take his last point first because it's pure conjecture. Why don't DataSets handle OLTP-type functions better? My first two suggestions would, indeed, be better if they were included in the original code generated by the ADO.NET dataset designer. I wish that they were. Frankly, the statements already generated by the "optimistic" updates option are quite complex as-is. Adding an "OR" condition per field wouldn't make them noticeably more complex or less readable (both are beyond repair anyway), and it would improve reliability and reduce error conditions.
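To make the per-field "OR" condition concrete, here is a minimal sketch. It uses Python and an in-memory SQLite table rather than ADO.NET and a real stored procedure, purely so it runs anywhere; the Customer table, its columns, and the save and demo helpers are all made up for illustration. One detail worth calling out: relaxing the WHERE clause is not enough on its own. The SET clause also has to leave unchanged columns alone, otherwise the second writer quietly overwrites the first writer's change with a stale value.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
# Each column gets two parameters (original and new value), the same pair
# an "optimistic" generated statement would pass.
# Note: NULL values are not handled here; a real statement needs extra
# IS NULL checks per nullable column.
UPDATE_SQL = """
UPDATE Customer
SET MaritalStatus = CASE WHEN :orig_status = :new_status
                         THEN MaritalStatus ELSE :new_status END,
    Address       = CASE WHEN :orig_address = :new_address
                         THEN Address ELSE :new_address END
WHERE Id = :id
  AND (MaritalStatus = :orig_status  OR :orig_status  = :new_status)
  AND (Address       = :orig_address OR :orig_address = :new_address)
"""


def save(conn, row_id, original, current):
    """Return True if the row was updated, False on a concurrency conflict."""
    params = {
        "id": row_id,
        "orig_status": original["MaritalStatus"],
        "new_status": current["MaritalStatus"],
        "orig_address": original["Address"],
        "new_address": current["Address"],
    }
    cur = conn.execute(UPDATE_SQL, params)
    conn.commit()
    return cur.rowcount == 1


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer ("
        "Id INTEGER PRIMARY KEY, MaritalStatus TEXT, Address TEXT)"
    )
    conn.execute("INSERT INTO Customer VALUES (1, 'Single', '1 Old Road')")

    # Three users all start from the same snapshot of the row.
    snapshot = {"MaritalStatus": "Single", "Address": "1 Old Road"}
    user_a = dict(snapshot, MaritalStatus="Married")   # changes status only
    user_b = dict(snapshot, Address="2 New Street")    # changes address only
    user_c = dict(snapshot, Address="3 Other Lane")    # collides with user_b

    results = [save(conn, 1, snapshot, user) for user in (user_a, user_b, user_c)]
    final = conn.execute(
        "SELECT MaritalStatus, Address FROM Customer WHERE Id = 1"
    ).fetchone()
    return results, final


if __name__ == "__main__":
    print(demo())
```

The first two saves touch different columns and both go through; the third tries to change a column someone else already changed and is rejected, which is the only case that is a genuine conflict.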

My guess is that it has to do with my favorite gripe about datasets in general: nobody knows quite what they are for. I suspect that this applies as much to the folks in Redmond as anywhere else. Datasets are obviously a stab at an abstraction layer over the server data, meant to make it easier for a regular (i.e. non-database, non-enterprise guru) developer to do asynchronous database transactions. But that doesn't really answer the question of what they are useful for and when you should use them.

DataSets are, essentially, the red-headed stepchild of the .NET framework. They get enough care and feeding to survive, but hardly the loving care they'd need to thrive. And really, I think that LINQ pretty much guarantees their eventual demise, particularly with some of the coolness that is DLINQ.

Datasets Alone Make Lousy Business Objects

As much as I am a fan of DataSets in general, you have to admit that they aren't a great answer in the whole business layer architecture domain.

I mean, you can (if you are sufficiently clever) implement some rudimentary data validation by setting facets on your table fields (not that most people do this--or even know you can). You can encode things like min/max, field length, and other relatively straightforward data purity limitations. Anything beyond this, however (like, say, requiring that orders in Japan have an accompanying telephone number to be valid), would involve either some nasty derived class structures (if you even can--are strongly-typed DataTables inheritable? I've never tried. It'd be a mess to do so, I think) or wrapping the poor things in real classes.
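As a rough illustration of the "wrap it in a real class" route, here is a small Python sketch. A plain dict stands in for a typed DataRow, and the OrderRow class, the Country and Telephone column names, and the "JP" country code are all assumptions invented for the example.

```python
class OrderRow:
    """Wraps a plain row (a dict standing in for a DataRow) with business rules."""

    def __init__(self, row):
        self.row = row

    def validation_errors(self):
        """Return a list of human-readable rule violations (empty if valid)."""
        errors = []
        country = self.row.get("Country")
        phone = (self.row.get("Telephone") or "").strip()
        # A cross-field rule: something simple per-column limits cannot express.
        if country == "JP" and not phone:
            errors.append("Orders shipped to Japan need a telephone number.")
        return errors

    def is_valid(self):
        return not self.validation_errors()


if __name__ == "__main__":
    print(OrderRow({"Country": "JP", "Telephone": ""}).validation_errors())
```

The point is only that the rule lives next to the data it guards; per-column limits like length or min/max can stay on the table definition.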

One solution to this is to use web services as your business layer and toss DataSets back and forth as the "state" of a broader, mostly-conceptual object. This is something of a natural fit because DataSet objects serialize easily as XML (and do so much better--i.e. less buggy--in .NET 2.0). This de-couples methods from data, so it isn't terribly OO. It can work where complex rules must apply across widely disparate applications (like a call center application and a self-serve web sales application) and development speed is a concern (as in, say, a high-growth environment).

I think this leads to the kind of complexity Udi says he has seen with datasets. The main faultline is that the knowledge of which methods to call (and where to find them) lives only in design documents or a developer's head. This can easily lead to a nasty duplication of methods and chaos--problems that functionally don't exist in a stronger object paradigm.

That Said...

Here is where I stick my neck out and reveal my personal preferences (and let all the "real" developers write me off as obviously deluded): although DataSets make admittedly lousy business objects, most non-enterprise level projects just don't need the overhead that a true object data layer represents. For me, it's a case of serious YAGNI.

Take any number of .NET open source software projects I've played with: not one uses DataSets, yet not one needs all the complexity of their custom created classes, either. They aren't doing complex data validation and their CRUD operations are less robust than those produced automatically from the dataset designer. All at a higher expense of resources to produce.

Or take my current place of gainful employ. We have five ASP.NET applications that all have an extremely complex n-tier architecture--all implemented separately in each web application (and nowhere else--they're not even in a separate library). Each of the business objects has a bunch of properties implemented that are straight get/set from an internal field. And that is all they are. Oh, there's a couple of "get" routines that populate the object for different contexts using a separate Data Access Layer object. And an update routine that does the same. And a create... you get the point. It's three layers of abstraction that don't do anything. I shudder to think how much longer all that complexity took to create when a strongly-typed DataSet would have done a much better job and taken a fraction of the time. It makes me want to call the development police to report ORM abuse.

Which is to Say

Don't let all that detract from Udi's point, though. He's right that for seriously complex enterprise-level operations, you can't really get around the fact that you need good architecture for which datasets will likely be inadequate. Relying wholly on DataSets in that case will get you into trouble.

I personally think that you could get away with datasets being the communication objects between web services in most cases even so, but I also realize that there are serious weaknesses in this approach. It works best if the application is confined to a single enterprise domain (like order processing or warehouse inventory management). Once you cross domains with your objects, you incur some serious side-effects, not least of which is that the meaning of your objects (and the operations you want to perform on them) can change with context (sometimes without you knowing it--want an exercise in what I mean? Ask your head of marketing and your head of finance what the definition of a "sale" is--then go ask your board of directors).

So yeah, DataSets aren't always the answer. I'd just prefer if more developers would make that judgement from a standpoint of knowing what DataSets are and what they can do. Too often, their detractors are operating more from faith than from knowledge.*

*Not that this is the case for Udi. For all he has admitted that he isn't personally terribly familiar with datasets, his examples are pretty good at delineating their pressure points and that tends to indicate that he's speaking from some experience with their use in the wild.


21. November 2006 19:23 by Jacob

4 Solutions to DbConcurrencyException in DataSets

Following links the other day, I ran across this analysis of DataSets vs. OLTP from Udi Dahan. His clincher in favor of coding OLTP over using datasets is this:

The example that clinched OLTP was this. Two users perform a change to the same entity at the same time – one updates the customer’s marital status, the other changes their address. At the business level, there is no concurrency problem here. Both changes should go through. When using datasets, and those changes are bundled up with a bunch of other changes, and the whole snapshot is sent together from each user, you get a DbConcurrencyException. Like I said, I’m sure there’s a solution to it, I just haven’t heard it yet.

I thought about this for a minute and came up with four solutions for DbConcurrencyException in this scenario using DataSets (though the first two are essentially the same and differ only by who actually implements it). I'm sure there are others, but this should do for starters.

  1. Use stored procedures created by a competent DBA that utilize parameters for the original and new column state. This means that you check each field with an "OR (<ds.originalValue> = <ds.updateValue>)". This solution passes the same two parameters per field as an "optimistic" pre-generated update statement, but it makes the update statement larger by adding this new "OR" condition for each field.
  2. You can do the same by altering a raw update generated from the DataSet designer. This means sending a longer statement to the database on each update, though this can be offset by setting your batch size higher if you have lots of updates you're sending (uh, you'd need ADO.NET 2.0 for that). I'd hesitate to use this method, but that's more a matter of personal taste than anything else (because I'd prefer using stored procedures and recognize that internal network traffic generally isn't the bottleneck in these kinds of transactions, though on-the-fly statement execution plan creation could be).
  3. Override OnRowUpdating for the adapter to alter the command sent based on which fields have actually changed. This is probably the closest in effect to the OLTP solution envisioned by Udi. This solution is problematic for me simply because I've never actually tried to do it and I'm not sure you can hook into the base adapter updates each execution. If you can't, an alternative (in ADO.NET 2.0) would be to create a base class for the table adapters and create an alternative Update function in derived partial classes. In this case, you'd have "AcceptFineGrainedChanges" or some such function that you'd call. Once the alternative base class was created, custom programming per table adapter would be a matter of a couple of moments. I've done something similar for using the designer for Sybase table adapters and it worked out pretty well. I'd have to actually try this to make sure it'd work though. Call this two half-solutions if you're feeling stern about it.
  4. This last would be useful if you have a relatively well-defined use case that isn't going to morph much or require stringent concurrency resolution. In this one, you deliberately break the one-for-one relationship between your dataset and database (i.e. one database table can be represented by multiple dataset tables). In Udi's concurrency example, the dataset would have a CustomerAddress table and a CustomerStatus table. Creating the dataset with custom selects would generate the tables pretty painlessly with appropriate paranoia. Now, this only really pushes his concern down a little, making it less likely to be an issue. It doesn't eliminate it. It'd probably handle most of the concurrency problems people are likely to run into. Or at least, push them out beyond where most people will ever experience it (not quite the same thing). It could be taken to a ridiculous extreme where each field was its own datatable (which is just silly, but I've seen sillier things happen), so a little balance and logical separation would be needed.
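The third option in the list boils down to building each UPDATE from only the columns that actually changed, and checking only those columns for conflicts. Here is a hedged sketch of that idea in Python against an in-memory SQLite table. It is not ADO.NET, and whether the adapter lets you swap the command per row is exactly the part left untested above. The build_update, save and demo helpers and the Customer schema are hypothetical.

```python
import sqlite3


def build_update(table, key_column, key_value, original, current):
    """Build an UPDATE that sets, and concurrency-checks, only changed columns.

    Returns (sql, params), or None when nothing changed.
    Table and column names are interpolated into the SQL text, so they must
    come from trusted schema metadata, never from user input.
    """
    changed = [col for col in current if current[col] != original[col]]
    if not changed:
        return None
    set_clause = ", ".join(f"{col} = ?" for col in changed)
    where_parts = [f"{key_column} = ?"] + [f"{col} = ?" for col in changed]
    sql = f"UPDATE {table} SET {set_clause} WHERE {' AND '.join(where_parts)}"
    params = (
        [current[col] for col in changed]
        + [key_value]
        + [original[col] for col in changed]
    )
    return sql, params


def save(conn, table, key_column, key_value, original, current):
    """Return True if saved (or nothing to save), False on a conflict."""
    built = build_update(table, key_column, key_value, original, current)
    if built is None:
        return True
    sql, params = built
    cur = conn.execute(sql, params)
    conn.commit()
    return cur.rowcount == 1


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer ("
        "Id INTEGER PRIMARY KEY, MaritalStatus TEXT, Address TEXT)"
    )
    conn.execute("INSERT INTO Customer VALUES (1, 'Single', '1 Old Road')")

    # Every user edits the same stale snapshot of the row.
    snapshot = {"MaritalStatus": "Single", "Address": "1 Old Road"}
    edits = [
        dict(snapshot, MaritalStatus="Married"),  # status only
        dict(snapshot, Address="2 New Street"),   # address only
        dict(snapshot, Address="3 Other Lane"),   # conflicts with the edit above
    ]
    results = [save(conn, "Customer", "Id", 1, snapshot, e) for e in edits]
    final = conn.execute(
        "SELECT MaritalStatus, Address FROM Customer WHERE Id = 1"
    ).fetchone()
    return results, final


if __name__ == "__main__":
    print(demo())
```

Compared with the per-field "OR" approach, this sends a shorter statement but a different one for each combination of changed columns, which is where the concern about on-the-fly execution plan creation comes back in.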

OLTP may seem more natural as a solution for many, but that's likely an issue of preference and sunk costs (because they've done it before and are comfortable with that solution space). It certainly isn't the only solution, though, nor is it a stumper for datasets.

Finally, I’ll add a caveat that I'm not saying that datasets are necessarily to be preferred over stronger object models. I just know that they get pretty short shrift from "real" developers in these kinds of discussions and want to make sure that the waters remain appropriately muddied. There may be a universal stumper for datasets I don't know about. There are certainly environments where a formal OLTP or ORM tool would be a legitimately preferred solution.


21. November 2006 05:35 by Jacob

