Changing Table Names in an OR/M

by Jacob 27. August 2008 11:33

I spent some quality time googling this and even went and asked the nascent Stack Overflow community and didn't come up with a satisfactory answer. Being the intrepid sort, I opened up a test project and started poking around, compiling information from a number of sources and playing until I got something that worked. For your amusement and/or edification, I'll document what I found.

What I Want to Do

The basic scenario is that many typical “commodity” web applications use databases to store their information. Since most web hosting services come with a single database but charge extra for additional databases, it is common for web-type products to add identifier text to their table and stored procedure names*. The .Net blog software I sometimes contribute to, Subtext, is an example. Take the table I added a while back to hold tags associated with posts, “subtext_Tag”. Using the “subtext_” prefix means that we won't run into naming collisions on our tables if someone has a wiki or forum application that also contains a table for tag entities named “Tag”.

 

* And yes, as one user on Stack Overflow suggested, you could use schema as a differentiator, so we could as easily have used a “subtext” schema and our tables would be “subtext.Tag” instead of “dbo.Tag”. That solution is much trickier to set up than a simple table naming convention, though, so I think I prefer the table prefix for now.

 

The .Net Way

While I was initially open to any .Net OR/M, the fact of the matter is that the only one I know anything about is SubSonic. Now I like SubSonic a lot, but their tools are geared towards code generation and I couldn't find a handle into runtime manipulation of table names (such might exist, but I was unable to find it).

Since I'm even less familiar with the other third party .Net OR/M data tools (like LLBLGen or NHibernate) and nobody on Stack Overflow (who tend to be knowledgeable about these things) spoke up, I decided to check out what it would take to monkey with the table names at runtime using stuff I have actually used. Namely the Entity Framework and LINQ to SQL. It turns out to be possible in either, though I have to admit to being surprised that it is easier in LINQ to SQL than in EF.

My Test Database

To keep things simple, I created a test database. Since I am a highly creative professional, I named it “Test” and created a table there named “TestTable”.

[Image: Test schema]

LINQ to SQL

The DataContext class in LINQ to SQL has a constructor overload that accepts a MappingSource derivative. By default, L2S uses an attribute mapping source (AttributeMappingSource) that pulls the metadata from attributes on your classes, in this case the “Name” property of a TableAttribute.

[Table(Name="dbo.TestTable")]
public partial class TestTable : INotifyPropertyChanging, INotifyPropertyChanged
 

Since attributes are immutable at runtime, using the default isn't an option here. Fortunately there's an XmlMappingSource object that can use an XML file (or fragment) to do what I need. Unfortunately, generating an initial mapping file is a touch cumbersome and requires use of the SQLMetal.exe tool provided with Visual Studio.

Here's how (after adding a TestLINQ.dbml file and dragging the table onto it—pre-name-change of course):

  1. I opened the VS Command Prompt (it's in a Visual Studio Tools folder in the start menu).
  2. Changed the directory to my project.
  3. Entered the command “sqlmetal /map:TestLINQ.map /code TestLINQ.dbml”. This generates a TestLINQ.map file.
  4. Right-clicked on the generated file in my project and selected “Include in Project”.
  5. Set the file's “Copy to Output Directory” property to “Copy Always”.

The mapping file is pretty simple. The relevant bit is the Name attribute of the Table element:

<Table Name="dbo.test_TestTable" Member="TestTables">
  <Type Name="LINQ.TestTable">
    <Column Name="TestId" Member="TestId" Storage="_TestId" DbType="Int NOT NULL IDENTITY" IsPrimaryKey="true" IsDbGenerated="true" AutoSync="OnInsert" />
    <Column Name="TestOne" Member="TestOne" Storage="_TestOne" DbType="VarChar(50)" />
    <Column Name="TestTwo" Member="TestTwo" Storage="_TestTwo" DbType="VarChar(50)" />
  </Type>
</Table>
 

Make sure the name is what you want it to be and you're golden. Here's the code I used to test it out after the name change.

XmlMappingSource source = XmlMappingSource.FromUrl("TestLINQ.map");
using (LINQ.TestLINQDataContext context = new LINQ.TestLINQDataContext(Properties.Settings.Default.TestConnectionString, source))
{
    LINQ.TestTable table = new LINQ.TestTable()
    {
        TestOne = "firstLINQ",
        TestTwo = "secondLINQ"
    };
    context.TestTables.InsertOnSubmit(table);
    context.SubmitChanges();
 
    table.TestOne = "thirdLINQ";
    context.SubmitChanges();
 
    context.TestTables.DeleteOnSubmit(table);
    context.SubmitChanges();
}
 

XmlMappingSource even has a .FromXml() method that will create your map from an Xml fragment string.
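As a rough sketch of that idea, the mapping can be assembled at runtime with whatever prefix you like and handed to XmlMappingSource.FromXml(). The PrefixedMapping class and its method names are made up for illustration. The Database root element and namespace are what sqlmetal normally writes at the top of a .map file, so copy them from your own generated file if yours differ. The prefix is assumed to be a plain identifier with nothing that needs XML escaping.

```csharp
using System.Data.Linq.Mapping;

// Hypothetical helper, not part of LINQ to SQL: builds the TestTable mapping
// with a caller-supplied table prefix instead of reading TestLINQ.map from disk.
static class PrefixedMapping
{
    // Returns the mapping XML. The Table/Type/Column elements mirror the
    // generated mapping file; only the table name varies with the prefix.
    public static string BuildXml(string tablePrefix)
    {
        return
            "<Database Name=\"Test\" xmlns=\"http://schemas.microsoft.com/linqtosql/mapping/2007\">" +
            "<Table Name=\"dbo." + tablePrefix + "TestTable\" Member=\"TestTables\">" +
            "<Type Name=\"LINQ.TestTable\">" +
            "<Column Name=\"TestId\" Member=\"TestId\" Storage=\"_TestId\" DbType=\"Int NOT NULL IDENTITY\" IsPrimaryKey=\"true\" IsDbGenerated=\"true\" AutoSync=\"OnInsert\" />" +
            "<Column Name=\"TestOne\" Member=\"TestOne\" Storage=\"_TestOne\" DbType=\"VarChar(50)\" />" +
            "<Column Name=\"TestTwo\" Member=\"TestTwo\" Storage=\"_TestTwo\" DbType=\"VarChar(50)\" />" +
            "</Type>" +
            "</Table>" +
            "</Database>";
    }

    // Wraps the XML in a MappingSource that a DataContext constructor accepts.
    public static MappingSource Create(string tablePrefix)
    {
        return XmlMappingSource.FromXml(BuildXml(tablePrefix));
    }
}
```

Passing PrefixedMapping.Create("test_") as the second constructor argument of TestLINQDataContext should then behave like the file-based mapping, with the prefix coming from configuration rather than a file on disk.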

Entity Framework

As I mentioned before this one is harder. This is a surprise because EF is ostensibly created to make it easier to keep your object definitions separate from your storage definitions. The reason it isn't easier is understandable once you realize that EF is made to be highly configurable and thus its definition files are much more complex than the L2S mapping.

The first problem with EF, though, is that the documentation is schizophrenic. Also confusing. That's because MS rolled the original three configuration files into the .edmx definition file so references on the web imply that those files are easily seen and edited. Even more confusing is that EF actually still uses the .csdl, .ssdl, and .msl files at runtime—it just generates those files from the .edmx and either packs them in the assembly as a resource (by default) or as files in your output directory.

Well, to monkey with the tables at runtime, you have to have access to the configuration files. To get them, change the “Metadata Artifact Processing” property of your ConceptualEntityModel from its default to "Copy to Output Directory" and rebuild the project. That'll put perfectly good .csdl, .ssdl, and .msl files in your output bin directory. (You don't have to leave it at "Copy to Output Directory" once you have saved these files off for your own use and abuse.)

EF did another funky thing by deviating from the norm in what it pulls from the connection string that you feed it. If you look at your EF connection string in app.config (or web.config) you'll see something like this:

<add name="TestEntities" 
connectionString="metadata=res://*/TestEF.csdl|res://*/TestEF.ssdl|res://*/TestEF.msl;provider=System.Data.SqlClient;provider connection string=&quot;Data Source=localhost;Initial Catalog=Test;Integrated Security=True;MultipleActiveResultSets=True&quot;"
providerName="System.Data.EntityClient" />
 

Notice that there's a “provider connection string” embedded in the connectionString attribute. That's the “normal” connection string that tells EF what database to use for storage. Also notice that the metadata property tells EF where to go for those configuration files (in this case, it's telling EF to look in the assembly for the resources TestEF.csdl, TestEF.ssdl and TestEF.msl).
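If you'd rather not hand-assemble that nested string (and its escaped quotes), the two layers can be built separately. This is a minimal sketch using EntityConnectionStringBuilder from System.Data.EntityClient and SqlConnectionStringBuilder from System.Data.SqlClient. The EfConnectionStrings class and its Build method are names I made up for illustration.

```csharp
using System.Data.EntityClient;
using System.Data.SqlClient;

// Hypothetical helper: composes an EF connection string from its parts.
static class EfConnectionStrings
{
    // metadata is the pipe-separated list of .csdl/.ssdl/.msl locations,
    // either res://*/... resources or plain file paths.
    public static string Build(string metadata, string server, string database)
    {
        // The inner, "normal" connection string for the storage database.
        SqlConnectionStringBuilder sql = new SqlConnectionStringBuilder();
        sql.DataSource = server;
        sql.InitialCatalog = database;
        sql.IntegratedSecurity = true;
        sql.MultipleActiveResultSets = true;

        // The outer EF connection string that wraps it.
        EntityConnectionStringBuilder entity = new EntityConnectionStringBuilder();
        entity.Metadata = metadata;
        entity.Provider = "System.Data.SqlClient";
        entity.ProviderConnectionString = sql.ConnectionString;
        return entity.ConnectionString;
    }
}
```

Calling EfConnectionStrings.Build("res://*/TestEF.csdl|res://*/TestEF.ssdl|res://*/TestEF.msl", "localhost", "Test") should give a string equivalent to the config entry above. Swapping the metadata argument is all it takes to point EF at a different set of files.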

Armed with this information, I was able to add a set of generated config files to the project stolen from those generated in the output directory. Once there, you have to edit the storage file and the mapping file to use the altered names. The table name used by EF is taken from the Name attribute of the EntitySet element in the .ssdl file. This is unfortunate because the Name attribute is a reference used by things like associations. Which means that you have to make sure you alter all the references to that EntitySet as well (fortunately, these are generally referenced using an EntitySet attribute on the relevant association and thus are relatively easy to find).

<EntitySet Name="test_TestTable" EntityType="TestModel.Store.TestTable" store:Type="Tables" Schema="dbo" />
 

It's not necessary to alter the EntityType, though I found that it still works if you do, as long as you alter it consistently across the file.

The mapping configuration in the .msl file just has to be updated so that the objects use the correct EntitySet for storage.

<EntitySetMapping Name="TestTable">
  <EntityTypeMapping TypeName="IsTypeOf(TestEF.TestTable)">
    <MappingFragment StoreEntitySet="test_TestTable">
      <ScalarProperty Name="TestId" ColumnName="TestId" />
      <ScalarProperty Name="TestOne" ColumnName="TestOne" />
      <ScalarProperty Name="TestTwo" ColumnName="TestTwo" />
    </MappingFragment>
  </EntityTypeMapping>
</EntitySetMapping>
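The edits to the .ssdl and .msl files can also be scripted instead of done by hand. Here's a minimal sketch using LINQ to XML. The StoreModelRenamer class is a made-up name, and the code matches elements by local name so it doesn't have to hard-code the schema namespaces, which differ between EF versions. It assumes every reference to the storage EntitySet is either an EntitySet attribute on an End element or a StoreEntitySet attribute in the mapping. Check your own files for anything else.

```csharp
using System.Xml.Linq;

// Hypothetical helper: renames a storage EntitySet in copies of the EF metadata files.
static class StoreModelRenamer
{
    // In the .ssdl: rename the EntitySet itself plus any AssociationSet ends that point at it.
    public static void RenameInSsdl(XDocument ssdl, string oldName, string newName)
    {
        foreach (XElement element in ssdl.Descendants())
        {
            string local = element.Name.LocalName;
            if (local == "EntitySet")
                ReplaceAttribute(element, "Name", oldName, newName);
            else if (local == "End")
                ReplaceAttribute(element, "EntitySet", oldName, newName);
        }
    }

    // In the .msl: repoint every StoreEntitySet reference at the new name.
    public static void RenameInMsl(XDocument msl, string oldName, string newName)
    {
        foreach (XElement element in msl.Descendants())
            ReplaceAttribute(element, "StoreEntitySet", oldName, newName);
    }

    // Changes the attribute only when it exists and holds exactly the old value.
    static void ReplaceAttribute(XElement element, string attributeName, string oldValue, string newValue)
    {
        XAttribute attribute = element.Attribute(attributeName);
        if (attribute != null && attribute.Value == oldValue)
            attribute.Value = newValue;
    }
}
```

Usage would be along the lines of loading TestEF.ssdl with XDocument.Load, calling RenameInSsdl(doc, "TestTable", "test_TestTable"), and saving the result, then doing the same for TestEF.msl with RenameInMsl. The conceptual EntitySetMapping Name is left alone because it refers to the object side, not the storage side.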
 

Once those changes are made your code will work with the altered table name, though you need to alter the connection string to look for the correct configuration files. Here's the final code I used to test it out.

string connection = "metadata=TestEF.csdl|TestEF.ssdl|TestEF.msl;provider=System.Data.SqlClient;provider connection string=\"Data Source=localhost;Initial Catalog=Test;Integrated Security=True;MultipleActiveResultSets=True\"";
using (TestEntities entities = new TestEntities(connection))
{
    TestTable table = new TestTable()
    {
        TestOne = "firstEF",
        TestTwo = "secondEF"
    };
    entities.AddToTestTable(table);
    entities.SaveChanges();
 
    table.TestOne = "thirdEF";
    entities.SaveChanges();
 
    entities.DeleteObject(table);
    entities.SaveChanges();
}

So Which is Better?

Well, that depends on what you want to accomplish but then, doesn't it always? For me, the LINQ to SQL solution is much cleaner because I'm simply not going to use all the other goo that the Entity Framework includes. Plus, the LINQ to SQL solution can use an XML fragment so I can bury that mapping piece wherever I want to, including in inline code. EF requires a file reference so those files have to be either in the assembly resources or on the file system. EF also allows you to leverage ADO.NET Data Services but that's a topic for another post entirely...

Programming

Arguing Data

by Jacob 27. February 2007 00:33

People have a lot of different reasons for posting blog entries. These reasons vary from financial, to personal, to professional, to I'm afraid to know more. For me, one reason I take the time when I could be doing something else is that I like to put my ideas out there to be tested. I don't really care if a majority of people agree with me so much as I want to see what other people have to say for or against certain things. The downside to this is that I'll sometimes find that an idea isn't as good as I had originally thought it was. The upside is the opportunity to refine something to be better or to discard an idea that turns out simply to be bad.

Which is why I'm glad to see Karl Seguin's response to a post I had made about DataSets. Karl's a bright guy and he has a good background in the problem domain associated with DataSet objects. He displays class, too, even when he feels I've been a bit rough in a point or two.

The School of Hard Knocks

I empathize with his experience where DataSet misuse caused much pain and suffering. I've been in similar situations and it's no fun. In a full-blown business transaction environment, DataSets have some liabilities that make them ill-suited for business-layer usage. The thing is, the opposite problem exists as well, and it's one that is more serious than people want to give it credit for: a layer of specialized, hand-crafted business objects that don't actually do anything.

I'm currently working at a place that has an extreme case of this problem. We have four entirely separate ASP.Net applications for our internal invoice processing. All four of these applications have their own set of substantially similar custom objects that are completely unique for that application. Each object doesn't do anything more than contain a group of properties that are populated from a database and write changes back to it.

I shudder to think how many hours were wasted on this travesty. It's over-complex, can't leverage any type of automated binding, doesn't track row state, and testing and debugging changes is an unmitigated pain. It's like someone attended an n-tier lecture somewhere and never bothered understanding what the point of having one actually was. Frankly, I'd prefer if the previous developers had simply put all the data access right in each individual page--at least that'd be easier to fix when something blew up.

Learning Your Craft

The thing is, my experience no more proves custom business objects wrong than Karl's experience proves DataSets wrong. That's the trouble with anecdotal experience: it feels more important than it is (it doesn't help that pain is such an efficient teacher).

The trick of learning a craft is in gaining experience that is both specific and broad. This can be tricky in a field that is as immense as software development. You really have no choice but to specialize at some point. Even narrowing it down to ".Net Framework" isn't nearly enough to constitute adequate focus for competence.

Unfortunately, Karl's point that there are a lot of lazy programmers out there is true. Anyone who has had to hire or manage programmers will confirm this. Too many developers don't bother learning enough of their craft to be considered actually competent. Faced with the need to specialize carefully, many simply give up and learn only enough to get by (and sometimes not even that much). They're content to learn the bare minimum needed to get hired. They'll learn enough of the "how" to create a program without ever bothering to learn any of the "why".

Teaching Others

I have a minor problem with Karl's explanation, though. He says, "I advocate against the use of DataSets as a counterbalance to people who blindly use them." While I understand this position, I'm not sure I can be said to appreciate it. It smacks a little of the "for your own good" school of learning, which works well enough in a parent-child or even teacher-student relationship. I'm not sure it works so well in public or general discourse.

It is hard to correct bad habits, particularly habits as widespread as DataSet misuse seems to be. As one who often has the bad habits to be corrected, though, I think that I'd prefer to have the problem explained and to be given the context so I can understand the trade-offs being made. That would give me the opportunity to know why something is wrong, not just that something is wrong.

That'd require discussing DataSets in specific instead of general terms. I'm not sure if Karl would really want to do that, though. I mean, his specialty at CodeBetter is really ASP.Net. Expecting him to tackle ADO.Net is not just unrealistic, it could have the effect of diluting his blog posts and alienating his regular readers or getting him embroiled in things he's less interested in.

I would like to see someone respectable and wider-read than I am take on Strongly-typed DataSets in a more complete fashion, though.

Professor Microsoft

Which is why I have to agree with Karl that the blame for DataSet misuse lies squarely in Microsoft's court. I stopped counting how many official articles and examples from Microsoft included egregious misuse or abuse of DataSets. And I have yet to see any that describe how to do it right or what kinds of things to look for in determining the trade-offs between a Strongly-typed DataSet and a more formal OR/M solution, let alone ameliorating factors for each. The only articles about DataSets that I can remember that don't actually teach bad habits are articles about how bad they are. Which isn't helpful. It'd be nice to have something, somewhere that talks about using them wisely and what their strengths actually are. Maybe that should be a future blog post here...

Programming

DataSets and Business Logic

by Jacob 23. November 2006 08:23

Whoa, that was fast. Udi Dahan responded to my post on DataSets and DbConcurrencyException. Cool. Also cool: he has a good point. Two good points, really.

Doing OLTP Better Out of the Box

I'll take his last point first because it's pure conjecture. Why don't DataSets handle OLTP-type functions better? My first two suggestions would, indeed, be better if they were included in the original code generated by the ADO.NET dataset designer. I wish that they were. Frankly, the statements already generated by the "optimistic" updates option are quite complex as-is and adding an additional "OR" condition per field wouldn't really be adding that much in either complexity or readability (which are both beyond repair anyway) and would add to reliability and reduce error conditions.

My guess is that it has to do with my favorite gripe about datasets in general: nobody knows quite what they are for. I suspect that this applies as much to the folks in Redmond as anywhere else. Datasets are obviously a stab at an abstraction layer from the server data and make it easier to do asynchronous database transactions as a regular (i.e. non-database, non-enterprise guru) developer. But that doesn't really answer the question of what they are useful for and when you should use them.

DataSets are, essentially, the red-headed stepchild of the .NET framework. They get enough care and feeding to survive, but hardly the loving care they'd need to thrive. And really, I think that LINQ pretty much guarantees their eventual demise. Particularly with some of the coolness that is DLINQ.

Datasets Alone Make Lousy Business Objects

As much as I am a fan of DataSets in general, you have to admit that they aren't a great answer in the whole business layer architecture domain.

I mean, you can (if you are sufficiently clever) implement some rudimentary data validation by setting facets on your table fields (not that most people do this--or even know you can). You can encode things like min/max, field length, and other relatively straightforward data purity limitations. Anything beyond this, however (like, say, when orders in Japan have to have an accompanying telephone number to be valid), would involve either some nasty derived class structures (if you even can--are strongly-typed DataTables inheritable? I've never tried. It'd be a mess to do so, I think), or wrapping the poor things in real classes.

One solution to this is to use web services as your business layer and toss DataSets back and forth as the "state" of a broader, mostly-conceptual object. This is something of a natural fit because DataSet objects serialize easily as XML (and do so much better--i.e. less buggy--in .NET 2.0). This de-couples methods from data, so isn't terribly OO. It can work in an environment where complex rules must work in widely disparate environments (like a call center application and a self-serve web sales application) when development speed is a concern (as in, say, a high-growth environment).

I think this leads to the kind of complexity Udi says he has seen with datasets. The main faultline is that what methods to call (and where to find them) are in design documents or a developer's head. This can easily lead to a nasty duplication of methods and chaos--problems that functionally don't exist in a stronger object paradigm.

That Said...

Here is where I stick my neck out and reveal my personal preferences (and let all the "real" developers write me off as obviously deluded): although DataSets make admittedly lousy business objects, most non-enterprise level projects just don't need the overhead that a true object data layer represents. For me, it's a case of serious YAGNI.

Take any number of .NET open source software projects I've played with: not one uses DataSets, yet not one needs all the complexity of their custom created classes, either. They aren't doing complex data validation and their CRUD operations are less robust than those produced automatically from the dataset designer. All at a higher expense of resources to produce.

Or take my current place of gainful employ. We have five ASP.NET applications that all have an extremely complex n-tier architecture--all implemented separately in each web application (and nowhere else--they're not even in a separate library). Each of the business objects has a bunch of properties implemented that are straight get/set from an internal field. And that is all they are. Oh, there's a couple of "get" routines that populate the object for different contexts using a separate Data Access Layer object. And an update routine that does the same. And a create... you get the point. It's three layers of abstraction that don't do anything. I shudder to think how much longer all that complexity took to create when a strongly-typed DataSet would have done a much better job and taken a fraction of the time. It makes me want to call the development police to report ORM abuse.

Which is to Say

Don't let all that detract from Udi's point, though. He's right that for seriously complex enterprise-level operations, you can't really get around the fact that you need good architecture for which datasets will likely be inadequate. Relying wholly on DataSets in that case will get you into trouble.

I personally think that you could get away with datasets being the communication objects between web services in most cases even so, but I also realize that there are serious weaknesses in this approach. It works best if the application is confined to a single enterprise domain (like order processing or warehouse inventory management). Once you cross domains with your objects, you incur some serious side-effects, not least of which is that the meaning of your objects (and the operations you want to perform on them) can change with context (sometimes without you knowing it--want an exercise in what I mean? Ask your head of marketing and your head of finance what the definition of a "sale" is--then go ask your board of directors).

So yeah, DataSets aren't always the answer. I'd just prefer if more developers would make that judgement from a standpoint of knowing what DataSets are and what they can do. Too often, their detractors are operating more from faith than from knowledge.*

*Not that this is the case for Udi. For all he has admitted that he isn't personally terribly familiar with datasets, his examples are pretty good at delineating their pressure points and that tends to indicate that he's speaking from some experience with their use in the wild.

 

Programming

4 Solutions to DbConcurrencyException in DataSets

by Jacob 21. November 2006 11:35

Following links the other day, I ran across this analysis of DataSets vs. OLTP from Udi Dahan. His clincher in favor of coding OLTP over using datasets is this:

The example that clinched OLTP was this. Two users perform a change to the same entity at the same time – one updates the customer’s marital status, the other changes their address. At the business level, there is no concurrency problem here. Both changes should go through. When using datasets, and those changes are bundled up with a bunch of other changes, and the whole snapshot is sent together from each user, you get a DbConcurrencyException. Like I said, I’m sure there’s a solution to it, I just haven’t heard it yet.

I thought about this for a minute and came up with four solutions for DbConcurrencyException in this scenario using DataSets (though the first two are essentially the same and differ only by who actually implements it). I'm sure there are others, but this should do for starters.

  1. Use stored procedures, created by a competent DBA, that take parameters for the original and new column state. This means that you check each field with an "OR (<ds.originalValue> = <ds.updateValue>)". This solution passes the same two parameters per field as an "optimistic" pre-generated update statement but it makes the update statement larger by adding this new "OR" condition for each field.
  2. You can do the same by altering a raw update generated from the DataSet designer. This means sending a longer statement to the database on each update, though this can be offset by setting your batch size higher if you have lots of updates you're sending (uh, you'd need ADO.NET 2.0 for that). I'd hesitate to use this method, but that's more a matter of personal taste than anything else (because I'd prefer using stored procedures and recognize that internal network traffic generally isn't the bottleneck in these kinds of transactions, though on-the-fly statement execution plan creation could be).
  3. Override the OnUpdating for the adapter to alter the command sent based on which fields have actually changed. This is probably the closest in effect to the OLTP solution envisioned by Udi. This solution is problematic for me simply because I've never actually tried to do it and I'm not sure you can hook into the base adapter updates on each execution. If you can't, an alternative (in ADO.NET 2.0) would be to create a base class for the table adapters and create an alternative Update function in derived partial classes. In this case, you'd have "AcceptFineGrainedChanges" or some such function that you'd call. Once the alternative base class was created, custom programming per table adapter would be a matter of a couple of moments. I've done something similar for using the designer for Sybase table adapters and it worked out pretty well. I'd have to actually try this to make sure it'd work though. Call this two half-solutions if you're feeling stern about it.
  4. This last would be useful if I have a relatively well-defined use case that isn't going to morph much or require stringent concurrency resolution. In this one, you deliberately break the one-for-one relationship between your dataset and database (i.e. one database table can be represented by multiple dataset tables). In Udi's concurrency example, the dataset would have a CustomerAddress table and a CustomerStatus table. Creating the dataset with custom selects would generate the tables pretty painlessly with appropriate paranoia. Now, this only really pushes his concern down a little, making it less likely to be an issue. It doesn't eliminate it. It'd probably handle most of the concurrency problems people are likely to run into. Or at least, push them out beyond where most people will ever experience it (not quite the same thing). It could be taken to a ridiculous extreme where each field was its own datatable (which is just silly, but I've seen sillier things happen) so a little balance and logical separation would be needed.
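To make the first two options a bit more concrete, here's a rough sketch of what the per-field "OR" condition might look like when the statement is assembled in code. FineGrainedUpdate and its parameter naming (@Original_X for the value the dataset read, @X for the value being written) are my own illustration, loosely following the designer's style. The CASE expression in the SET clause is an assumption I've added beyond the list above: without something like it, a column this user never touched would be written back with its stale original value and clobber the other user's change. NULL handling is left out entirely, so treat this as a starting point rather than a finished statement.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: builds an UPDATE that only enforces concurrency
// on the columns this client actually changed.
static class FineGrainedUpdate
{
    public static string BuildCommandText(string table, string keyColumn, IList<string> columns)
    {
        StringBuilder set = new StringBuilder();
        StringBuilder where = new StringBuilder();

        // The row is always located by its original key value.
        where.AppendFormat("[{0}] = @Original_{0}", keyColumn);

        foreach (string column in columns)
        {
            if (set.Length > 0)
                set.Append(", ");

            // Assumption: keep the database value when this client left the column alone.
            set.AppendFormat("[{0}] = CASE WHEN @Original_{0} = @{0} THEN [{0}] ELSE @{0} END", column);

            // The usual optimistic check, OR'd with "this client didn't change it".
            where.AppendFormat(" AND ([{0}] = @Original_{0} OR @Original_{0} = @{0})", column);
        }

        return string.Format("UPDATE [{0}] SET {1} WHERE {2}", table, set, where);
    }
}
```

For Udi's example you'd call BuildCommandText("Customer", "CustomerId", new[] { "MaritalStatus", "Address" }) and bind both sets of parameters from the row's original and current versions. The table and column names here are invented to match his scenario.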

OLTP may seem more natural as a solution for many, but that's likely an issue of preference and sunk costs (because they've done it before and are comfortable with that solution space). It certainly isn't the only solution, though, nor is it a stumper for datasets.

Finally, I’ll add a caveat that I'm not saying that datasets are necessarily to be preferred over stronger object models. I just know that they get pretty short shrift from "real" developers in these kinds of discussions and want to make sure that the waters remain appropriately muddied. There may be a universal stumper for datasets I don't know about. There are certainly environments where a formal OLTP or ORM tool would be a legitimately preferred solution.

 

Programming

scruffylookingcatherder.com

    Disclaimer
    The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

    © Copyright 2010 Scruffy-looking Cat Herder