Data layer, the right way.

The goal with this post is to give you a better understand about how you can design your data layer and why it’s important to create a complete abstraction layer.

Whats wrong?

From the very beginning the applications has been data centric. That is, the database has been the very heart of the applications. Everything that was built was (and in some means still is) just a glorified administration interface for the database (especially CRUD applications). To make things worse, some still believe that stored procedures are essential and that parts (if not all) business rules should be stored in them.

To date most tables in the database are modeled according to 3NF (the third normal form). We get a clean database with very few repeated records.

The only thing that really has happened is that the database has been hidden behind an OR/M and we say that our applications is using classes modeled after the business (since the OR/M generate classes & mappings for us).

Building an application

Let’s start by examining the typical creation of an application (reminds me of a quote that I read a long time ago: The process of building application is like the making of sausages: It should never be watched).

Typical problems


Here are some typical problems which can surface when you are using an OR/M without proper abstractions.

  • Have you ever struggled with a LINQ query which just doesn’t want to work?
  • Poor performance due to lazy loading
  • The generated SQL turns out to be slow
  • Breaking changes in new version of the OR/M
  • Complex and/or highly coupled models thanks to all relations in the db

What is an abstraction?

Wikipedia defines “abstraction” like this:

Abstraction is the process by which data and programs are defined with a representation similar in form to its meaning (semantics), while hiding away the implementation details. Abstraction tries to reduce and factor out details so that the programmer can focus on a few concepts at a time

So unlike what many believes, you do not create abstractions just to be able to switch OR/M or database engine. You create it to reduce complexity.

Before using an abstraction you have to remember all details for both layers when coding. The amount of information that you have to remember depends on the kind of tool/library you use in your data layer. For OR/Ms you have to remember how the data layer handles lazy loading, how to specifically eager load items and how each entity has been configured. For ADO.NET you have to remember the table structure and all data types. If you use your OR/M you have probably auto generated all entities. And by doing so you have forced yourself to use the data model as your business model. If you have used an OR/M for a while you have probably noticed that it’s not the best thing for anything but trivial applications.

So before using an abstraction you have to look at your code like this:


By using an abstraction you don’t have to care about how your OR/M or database engine works. All you need to know is that the data layer has a set of classes and methods which you invoke to fetch and store information.


Your OR/M is not an abstraction

If you use your OR/M as the data layer, your probably have a lot of code like:

public void Summarize(int userId)
    // loading from db
    var carts = _dbSession.Query<ShoppingCart>().Where(x => x.UserId == userId);

    // some business logic
    var summary = 0d;
    foreach (var cart in carts)
        summary += cart.Total;

    return summary;

It looks simple enough.

But how do we test it?

We have to make sure that it calls the DB with the correct set of parameters, right? Mocking LINQ providers is not fun, so that’s out of the question. We can create a fake memory DB by using SQLite or similar. But that doesn’t guarantee that everything works with the production db (SQL Server or whatever).

How about the mappings? To guarantee that everything works we have to do integration tests. That is to use real queries against a real DB.

So it might look like the method is only responsible of doing the business logic. But it’s in fact also responsible of reading the data in a correct way too. The OR/M is just a convenience layer which loads the information from the DB for us. It doesn’t abstract away the data access. Any change to the OR/M or the database WILL affect every piece of code that uses the OR/M and the database.

Our requirements will always change. Those changes will always affect your code. By using abstraction it’s easier to handle those (since you do it in one place at a time).

Using a correct abstraction

Let’s instead change the code to:

public void Summarize(int userId)
    // loading from db
    var carts = _shoppingCartRepos.FindForUser(userId);

    // some business logic
    var summary = 0d;
    foreach (var cart in carts)
        summary += cart.Total;

    return summary;

Now we are using an abstraction. The actual data access is hidden. We do not have to know OR/M implementation details like mappings, eager/lazy loading or OR/M specific LINQ statements. We just know that we should call the FindForUser method to get the data that we need.

We can now test the method easily. The only tests we need for the repositories is integration tests to make sure that all statements are working correctly.


Complete abstractions do take a longer time to write. But ask yourself. What will you spend most time on? Maintaining the application or writing it from scratch? Let’s say that the application got an expected lifetime of four years and that it takes one year to write it. Then you got three years in which you should maintain it. Do you think that those three years will be easier or harder with abtractions? It might be worth to spend some time extra in the beginning…

The right way

Hopefully you have now understood why we need to create some kind of abstraction from the database and that the OR/M isn’t it. But that is only part of the solution. Ask yourself: Why are you creating the application? Is it to get a nicely modeled database that your application has to follow? Or is it to solve the problem that the client has?


Brad tells us: [haiku url=”″]

The database

The crazy thing is that we have let the database (as in RDBMS) dictate our applications way to long. And we have modeled our databases in the same way (using the normal forms) since 1971. That’s 42 years you lazy coders (I used a calculator =)). That’s INSANE!

It’s time to end the madness. Dare to say no. The database IS just storage.


If that’s not helping. Listen to Queen Elisabeth (no disrespect meant).

[haiku url=”″]

Is the database all bad then? No. But we have to mentally switch from letting it control our application to just be a utility in which we store our information. How can we do that?

Model the domain first

One thing has been mentioned before, and that is to model the domain / business first and then the database. We create a set of classes like Person, ChatMessage etc in the first iteration. We then test them and demonstrate the application to the client.

After that we add the representations of those classes in the database. That is, we model the DB iterative too.


Normalization is not all good. Storage is cheap today. Think twice before you normalize if normalization would increase the complexity in your application.

If you for instance only store the shipping address in the DB (i.e. you do not search or index it) there is really no need for a separate table / relation. Store the address in a ShippingAddress column of the Orders table. (The user can even have a ShippingAdress column in the User table, which you copy it from when the Order is placed).

SRP applies to the DB too

Your application may have different target audiences who do different things in the application. For instance accounting takes care of all invoices while shipping takes care of packaging and sending of the packages. And finally you have the users who places orders.

In a situation like that it would be tempting to create a Users table which contains all fields for the accounting employees, the users and the shipping employees. A more subtle approach would be to let the user table represent the application users and then create a table AccountingUsers with a FK to the Users table.

Those approaches increases complexity and indirectly makes all persons in the system dependent of changes to the Users table.

It’s far better to create three different kind of person tables with no relation between them. You might even go so far as creating different databases. One for shipping, one for accounting and one for the core application. We do use code to make the different parts communicate with each other, right? No need for a single database.

Replace it

There are several schemaless (NoSQL) databases out there today. Many of them have matured enough to be used in production. If I can choose I would in most cases pick a NoSQL db. It lowers the friction between storage and code a lot.


Nothing says that a single database solution (be RDBMS or NoSQL) are the right tool for the job. NoSQL might be a good fit for shipping while accounting should use RDBMS so that they can do their analytics.


Since the data source(s) are abstracted away we get a lot more freedom in our applications. We don’t have to try to make our applications fit the database but can instead fit database after our application.

Building applications

So if the database is not the thing that we should start with, what should we start with? How about something like this:

You have hopefully got something to think about now. You don’t have to go about and change how you build your applications, but hopefully you think twice before choosing to go down a specific path.

The data access alternatives

When working with the database in the data layer we have in fact several approaches to choose from. The most basic approach is ADO.NET and the OR/Ms are by far the most complex solution. So why is it that the OR/Ms have gained so much momentum? It’s because they generate a lot of stuff for you. You are lazy. We all are. The thing is that what we gain in the beginning may be lost during the road. That’s why you need to know about the alternatives.

This blog post became a bit larger than I intended. I’m therefore moving the alternatives to seperate posts. They are however:

  • Vanilla ADO.NET
  • Data mappers
  • OR/M
  • NoSQL

Data layer patterns

There are also some patterns which can be used for the data layer (or involves the data layer). Those that I recommend are:

  • Repository pattern
  • Query object
  • CQS (Command/query seperation)
  • CQRS (Command/Query Segregation Principle)

Those will also be discussed in a separate post.

  • Frisian

    The ShippingAddress example could be taken further to explain the virtues of a dedicated data layer:

    Applications come and go, but data stay. Normalization reflects that notion by structuring domain data in way that makes it easier to replace the accessing application. A normalized and well-named schema should ideally provide most information about the entities, properties, constraints and relations of the underlying domain data.

    It would be the responsibility of a dedicated data layer to be the bridge between a 3NF schema and an object-oriented domain model.

    So, to go back to the ShippingAddress example, it would be an entity in the database, but a simple value object in the application’s domain model. The data layer does all the necessary plumbing to hide the entity nature of the ShippingAddress from the application.

    On the query side, where reading data in a de-normalized way is the norm rather than the exception, the data layer allows for optimized access without affecting the other layers.

  • S Krupa Shankar

    Nice and insightful article (not only just this one, many other posts in your blog, too). I was not aware about most of the things you have mentioned, hence my query: Are not denormalization and SRP (for database tables) mutually exclusive?

    • Can you elaborate on what you mean with mutually exclusive in this case?

  • Adam

    Very nice article.
    I have a comment though. I’m not very good in architecture so please just take this as a question.
    I like the idea that we don’t have to adapt the application to the DB but having the DB to be adapted to the application seems not a good idea. What happens if your database will be used by another application? we can’t adapt it to the 2nd one. The ideal is that the application and the DB are designed dependently and creating a layer which plays the adaptation role.

    • In my experience it’s seldom a good idea to let more than one application modify a database. The reason to that is that you can never know which application modified which row.

      That is never a problem in the beginning, but as applications grow the likelihood of bugs increases. What do you do if you find data in your DB which do not follow your business requirements? How can you find out how the data got there?

      If you only have one application per database it’s much easier to find and fix those problems. If you need to share data use it through some form of inter-process communication like WCF or messaging.

  • Ricardo Gomez

    Hello, is there any chance I can work for you for free so I can learn from you?
    Greetings from Yucatan, Mexico. The Mayan Land. 🙂

  • Jax

    I agree with a lot of what you say, however, saying that a data technology is simply a bucket in which we store something brings up the typical expression of “I have a hammer, therefore everything is a nail”.

    If I have set based operations that need to be performed and my data is in a RDBMS, why would I not use the database to express those operations since that is what they do best. In order to express those in a tech like SQL server, generally speaking, a stored procedure should be used to encapsulate the logic.

    I can STILL abstract away the operation without trying to “hit the nail” with a “.NET screwdriver”.

    • I said it to provoke. I’ve been programming for over 20 years and most of the time devs tend to build the database first. It’s a lot faster then building the application and it gets them a sort of a visual model of the application. By saying that I I want my readers to reflect before they do the same thing again.

      There are of course applications where it’s important that the database scale. When it is, those requirements tend to be identified quite early on in the project. In many cases it’s not a requirement.

      By just seeing the database as just a data store you don’t let the data model affect how you design the business domain. In my experience it’s much harder to solve problems in the business logic then in the data layer. And with great patterns like CQRS and Repository pattern you can still design a good DB model when it’s time for that. But then again, sometimes the DB design is more important. The important thing is to reflect before choosing the same thing as one always do.

      • Jax

        In that context, I’m not sure I could agree more with your sentiments. Well put.