Tired of looking for errors in log files? Use OneTrueError - Automated exception management in .NET.

Griffin.Decoupled: The queries

This article will show you how you can build your queries with the help of Griffin.Decoupled. Griffin.Decoupled is a small framework which helps you build applications with loose coupling which also can be scaled easily.

Introduction

Queries vs Repositories.

Before we get into the queries I’ll try to show you why queries are better than repositories (or using an OR/M directly). Be warned: You will have to write more code, but in turn get more structured code which is easier to work with. You’ll also be able to do some pretty cool stuff that I’ll demonstrate in the next blog post.

With repositories I do mean the repository pattern which intended purpose is to abstract away the persistence layer. Do note that anything that exposes LINQ/IQueryable is not an abstraction. Why? Go ahead and try to implement IQueryable and try to cover all test cases for it. It’s not very fun. In other words, a repository looking like this..

public interface IRepository<T>
{
	Save(T entity);
	Delete(T Entity);
	T Get(int id);
	
	IQueryable<T> Query { get; }
}

.. is poorly architectured for two reasons:

  1. The query isn’t executed until the query is being enumerated (and have therefore left the repository class) = leaky abstraction
  2. IQueryable is a GOD interface (i.e. violating Interface Segregation Principle) = hard to implement properly

A must better abstraction:

public interface IRepository<T>
{
	Save(T entity);
	Delete(T Entity);
	T Get(int id);
	T GetByUserName(string userName);
	
	IEnumerable<T> GetAllByLastName(string partialName);
}

It’s now clear what we can query by and a lot easier to mock/fake that repository.

If you were observant I’ve split the repository into two parts. That’s since we got two responsibilities for our repositories. The first one is to CRUD (Create Update Delete) our entities.

Save(T entity);
Delete(T Entity);
T Get(int id);

The other responsibility is to find the entity/entities that we want to modify (“hello QUERIES”):

IEnumerable<T> GetAllByLastName(string partialName);

We could also say that the difference is that the former responsibility is used when taking an action (i.e. doing something inside a command) while the other one is when the user want to identify entities. That observation also means that we can cache cache and cache the queries, while we always want to work with the most recent copy in the commands.

Since we got two distinct responsibilities we should really structure our code after that. The repository pattern is great at allowing us to abstract away the data source, but it really doesn’t fit when writing decoupled applications since it mixes those two responsibilities (and therefore break the Single Responsibility Principle). For searching/browsing we’ve already come up with something and that’s the queries. We’ll take a deep dive into them in this article. The CRUD operations should also get an abstraction so that we can unit test our commands without integration tests.

I like to call that abstraction IDataStore<T>.

IDataStore

It’s nothing fancy. It’s not complex. It isn’t fat. It’s IDataStore<T>!

public interface IDataStore<TEntity, TKey>
{
	T Get(TKey key);
	void Save(TEntity entity);
	void Delete(TEntity entity);
}

I do not recommend that you use that interface directly, but instead define a new one for each root aggregate, for instance like this:

public interface IUserStorage : IDataStore<User,int>
{
	User GetByUserName(string userName);
}

Observe that there is not a single LINQ or Lambda expression in that interface. It’s not likely that you’ll need one either, since the interface is used to find a previously identified entity (remember, the interface it typically only used within the commands).

Here is an example:

[Component] // <-- means that the class should automatically be registered in Griffin.Container
public class RelocateUserHandler : IHandleCommand<RelocateUser>
{
	IUserStorage _userStorage;
	
	public RelocateUser(IUserStorage userStorage)
	{
		_userStorage = userStorage;
	}
	
	public void Handle(RelocateUser command)
	{
		// see? We already know which user to modify
		var user = _userStorage.GetByUserName(command.Username);
		
		user.Move(command.NewAddress);
		
		_userStorage.Save(user);
	}
}

The ravendb nuget package contains a base class for IDataStore implementations (feel free to contribute with packages for nhibernate, EF etc) that allows us to just write this:

public class UserStore : RavenDataStore<User>, IUserStore
{
	public User GetByUserName(string userName)
	{
		return Session.Query<User>().FirstOrDefault(x => x.UserName == userName);
	}
}

The data store interface gives us a simple but yet efficient way of abstracting away the data access from our commands.

If your data layer already got a good abstraction there is really no need to use this interface.

The queries

Queries are only used for viewing and searching. Hence it doesn’t really matter (in most systems) if the viewed entity isn’t a fully updated copy of your object (read: cached).

Be careful when reusing queries, don’t try to make them more generic or adapt them to fit several use cases. It will make future changes hard (since it doesn’t got a single reason for change anymore).

As with the commands the queries is divided into two classes: One that defines the query (IQuery) and one that executes the query IExecuteQuery). The separation is made to be able to decouple the query request from the actual implementation.

I struggled a lot to make this possible without letting the generic type definitions spread throughout the code. Big thanks to dotNetJunkie for his article about queries. My solution is heavily inspired by his code.

Choosing the right tool

I recommend that you stop and think about which approach you should use when writing the queries.

imho using an OR/M for all queries is probably not very bright (YES I’M TALKING TO YOU! ;)). Remember, queries are used for viewing only.

Here are some of the solutions that you can use in the queries:

Views

Views isn’t faster than a regular SQL query. They are however still great since they simplify the queries (in your code) and therefore make your code less error prone (since the joins etc are made directly in your DB = “compiled”).

Instead of having something like:

public class GetUserExecutor : IHandleQuery<GetUser, User>
{
	IDbConnection _connection;
	
	public GetUser(IDbConnection connection)
	{
		_connection = connection
	}
	
	public User Execute(GetUser query)
	{
		using (var command = _connection.CreateCommand())
		{
			commmand.CommandText = @"SELECT Users.*, Departments.Title, Addresses.City
										FROM Users, Departments, Addresses
										WHERE Users.DepartmentId = Departments.Id
											AND Users.PostalId = Addresses.Id
											AND Users.Id = @userId";
			
			// etc
		}
	}
}

You could just have:

public class GetUserExecutor : IHandleQuery<GetUser, User>
{
	IDbConnection _connection;
	
	public GetUser(IDbConnection connection)
	{
		_connection = connection
	}
	
	public User Execute(GetUser query)
	{
		using (var command = _connection.CreateCommand())
		{
			commmand.CommandText = @"SELECT * FROM GetUser WHERE User.Id = @userId";
			// etc
		}
	}
}

You can create indexed/materialized views to gain speed. They are however database engine specific (= not specified in the SQL92 standard).

Stored procedures

imho stored procedures isn’t an option for queries. The performance gain over parameterized queries isn’t that big anymore.

disclaimer: I’m not very good at SPs but read Efficency of stored procedures vs raw queries or To SP or not to SP in SQL Server or Speed Test: Dynamic SQL vs Stored Procedures. Feel free to educate me / the readers by leaving a comment if I’m wrong.

Lightweight OR/M (i.e. a DataMapper)

This is indeed my favorite option together with views.

Let’s take PetaPoco as an example. It uses T4 to generate all POCOs for you. You’ll probably not want to use them together but map them to your domain models (with something like AutoMapper or manually). PetaPoco doesn’t try to be a complete abstraction, but tries to make you things easier for you in the data layer. You still write SQL queries although you get POCOs back.

Example:

var db=new PetaPoco.Database("connectionStringName");
foreach (var a in db.Query<Article>("SELECT * FROM articles"))
{
    Console.WriteLine("{0} - {1}", a.Id, a.Title);
}

Feel free leaving a comment with your favorite data mapper and the reason to why you use it.

OR/M

I’ve been using OR/Ms a lot. Mostly nhibernate but also Entity Framework. I’ve also written two OR/Ms myself (which worked great).

But imho OR/Ms fool you. They let you get started quickly, but as soon as you try to go off the main stream path and do something more complex they’ll bite you. I bet that you could have written a SQL query + the mapping in less time than it took you to get that complex LINQ statement working with your OR/M.

Vanilla ADO.NET

Don’t frown. Remember. Queries do only contain pretty simple select statements (unless you are using paging in SQL Server). It should be quite easy to write the SQL statements and the mapping from a IDataReader to your pocos.

Small example:

public class GetRegisteredButNotLoggedInExecuter : IExecuteQuery<GetRegisteredButNotLoggedIn, NotLoggedInUser[]>
{
	IDbConnection _connection;
	
	public GetRegisteredButNotLoggedInExecuter(IDbConnection connection)
	{
		_connection = connection;
	}
	
	public NotLoggedInUser[] Execute(GetRegisteredButNotLoggedIn query)
	{
		using (var command = _connection.CreateCommand())
		{
			command.CommandText = @"SELECT Id, FirstName, LastName, RegisteredAt
									FROM Users 
									WHERE RegisteredAt = LoggedInAt";
			
			var reader = command.ExecuteReader();
			List<NotLoggedInUser> users = new List<NotLoggedInUser>();
			while (reader.Read())
			{
				users.Add(Map(reader));
			}
			
			return users.ToArray();
		}
	}
	
	//IDataReader implements the interface IDataRecord.
	private IEnumerable<NotLoggedInUser> Map(IDataRecord record)
	{
		var item = new NotLoggedInUser();
		item.DisplayName = reader["FirstName"] + " " + reader["LastName"];
		item.Id = reader["Id"].ToString();
		item.RegisteredAt = (DateTime)reader["RegisteredAt"];
		yield return item;
	}
}

Important! You should always use parameterized queries when using ADO.NET since they are not vulnerable to SQL injection. Instead of:

using (var command = _connection.CreateCommand())
{
	command.CommandText = @"SELECT Id, FirstName, LastName, RegisteredAt
							FROM Users 
							WHERE RegisteredAt = " + myVariable; //<-- SQL injection is possible
	
	var reader = command.ExecuteReader();
	List<NotLoggedInUser> users = new List<NotLoggedInUser>();
	while (reader.Read())
	{
		users.Add(Map(reader));
	}
	
	return users.ToArray();
}

you should be using:

using (var command = _connection.CreateCommand())
{
	command.CommandText = @"SELECT Id, FirstName, LastName, RegisteredAt
							FROM Users 
							WHERE RegisteredAt = @theVar"; //<-- SQL injection safe
	
	command.AddParameter("theVar", myVariable); //<-- parameter
	
	var reader = command.ExecuteReader();
	List<NotLoggedInUser> users = new List<NotLoggedInUser>();
	while (reader.Read())
	{
		users.Add(Map(reader));
	}
	
	return users.ToArray();
}

Schemaless DB for the read model

You can use the domain events to generate a denormalized read model which you can then store in a NoSQL db. It will make the reads blazingly fast.

Summary

I’m not saying that using a OR/M is a bad choice. I’m just saying that it isn’t always the best choice. You could for instance switch to vanilla ADO.NET + a view for a query which you know will problematic with your OR/M.

Creating the queries

Let’s create a query. We want to fetch all users which has registered but not logged in a second time. We start by defining the class that represents the query (but isn’t going to execute the query). These classes can later be used as cache keys (if you implement GetHashCode()).

public class GetRegisteredButNotLoggedIn : IQuery<NotLoggedInUser[]>
{
}

Notice that it got no parameters since we don’t really need any. If you do use parameters, specify all required ones as constructor arguments and optional as get/set properties. It makes it easier to tell what we have to specify. The query will return a custom class (as specified in the interface definition):

public class NotLoggedInUser
{
	public string Id {get;set;}
	public string DisplayName {get;set;}
	public DateTime RegisteredAt {get;set;}
}

We also need to create a class that will be executing the query. In this example we’ll use the awesome NoSQL database RavenDB.

public class GetRegisteredButNotLoggedInExecuter : IExecuteQuery<GetRegisteredButNotLoggedIn, NotLoggedInUser[]>
{
	IDocumentSession _session;
	
	public GetRegisteredButNotLoggedInExecuter(IDocumentSession session)
	{
		_session = session;
	}
	
	public NotLoggedInUser[] Execute(GetRegisteredButNotLoggedIn query)
	{
		return _session.Query<User>()
			.Where(x => x.LoggedInAt == x.RegisteredAt)
			.Select(x => new NotLoggedIn{
				Id = x.Id,
				DisplayName = x.FirstName + " " + x.LastName,
				RegisteredAt = x.RegisteredAt
			})
			.ToArray();
	}
}

Paging

With a query like the one above we usually need paging. Let’s add support for that by letting the query class inherit a base class in Griffin.Decoupled:

public class GetRegisteredButNotLoggedIn : BasicConditions<User, IQueryResult<NotLoggedInUser>>
{
}

The first type definition tells which root aggregate we’ll use when querying the data source. It allows us to get type safety when using the Sort methods (see below).

Also notice that we changed the return type to IQueryResult. It’s a small interface that also defines the total amount of rows that matches the query (so that we know how many pages we’ll get).

var query = new GetRegisteredButNotLoggedIn()
	.Page(2, 50); // second page, 50 items per page

Sorting

The base class do also support sorting. It can either be typed using an expression:

var query = new GetRegisteredButNotLoggedIn()
	.Page(2, 50)
	.SortBy(x => x.LastName); // sorted

But when we use the queries through a web or similar it’s probably just a string. No problem:

public ActionResult GetUsers(string orderBy = "LastName")
{
	var query = new GetRegisteredButNotLoggedIn()
		.Page(2, 50) // second page, 50 items per page
		.SortBy(orderBy);
		
	return Json(_queryDispatcher.Execute(query));
}

The SortBy() method will throw an exception if the property name can’t be found in the User class.

Applying Paging/Sorting

There is also a helper which will apply the conditions to an IQueryable:

public class GetRegisteredButNotLoggedInExecuter : IExecuteQuery<GetRegisteredButNotLoggedIn, IQueryResult<NotLoggedInUser>>
{
	IDocumentSession _session;
	
	public GetRegisteredButNotLoggedInExecuter(IDocumentSession session)
	{
		_session = session;
	}
	
	public NotLoggedInUser[] Execute(GetRegisteredButNotLoggedIn query)
	{
		// define the query
		var myQuery = _session.Query<User>()
			.Where(x => x.LoggedInAt == x.RegisteredAt);

		// count all items
		var count = myQuery.Count();
		
		// make it sorted and paged = using the helper
		myQuery = query.ApplyTo(myQuery);
		
		// and transform to the query model
		var items = myQuery.Select(x => new NotLoggedIn{
				Id = x.Id,
				DisplayName = x.FirstName + " " + x.LastName,
				RegisteredAt = x.RegisteredAt
			}).ToArray();
		
	
		return new QueryResult<NotLoggedInUser[]>(count, items);
	}
}

Executing the query

We have to be able to execute the queries too. Queries is located with the help of the IQueryDispatcher class. Example:

public class MyApiController : ApiController
{
	IQueryDispatcher _queryDispatcher;
	
	public MyApiController(IQueryDispatcher queryDispatcher) 
	{
		_queryDispatcher = queryDispatcher;
	}
	
	public IQueryResult<NotLoggedInUser> NotLoggedIn(int pageNumber = 0, int pageSize = 50)
	{
		var query = new GetRegisteredButNotLoggedIn();
		query.Page(pageNumber, pageSize);
		
		return _queryDispatcher.Execute(query);
	}
}

The typical approach is to use the inversion of control container to locate all query executors. There is no facade (like CommandDispatcher.Execute()) available for the queries, since they should be invoked synchronously and in an existing IoC scope.

So you’ll have to register the dispatcher in your container, something like this:

    yourContainer.RegisterService<IQueryDispatcher>(
        x => new IocQueryDispatcher(x), 
        Lifetime.Scoped
    );

Summary

This was a brief introduction to the query handling. I intentionally tried to provoke you by saying that the OR/M isn’t the best choice. My point is that the OR/M might not be the fastest or best way to implement the queries. At least think before you choose it.

Next up will be a complete demo application (a small quick note web application). In it I’ll demonstrate all the features of the library. These abstractions also have a nice side effect: It’s very easy to find everything that can be done in the application (just search for all classes that implement the query and command interfaces). Hence I’m going to show you how you with very little code can execute all queries and commands directly from the client side.

For now you can find a query example at github.

This entry was posted in Architecture and tagged , , . Bookmark the permalink.
  • http://twitter.com/IsIt_David IsIt David

    Hi there, i’ve been reading through your articles today. thanks! they’re very good. Any news on the small note taking web app (hopefully mvc!)? i’d love to see how all this comes together.

    • JonasGauffin

      I decided to do it in WinForms first. There’s a lot less code involved (so it’s easier to follow the framework). I’m currently writing the socket client/server which will be used to invoke messages/queries at server side.

      • http://twitter.com/IsIt_David IsIt David

        OK. Thanks!

  • Khh

    Wonderfull articles and a nice compact framework.

    How would the registration of the IocQueryDispatcher should be done with the Griffin.Container framework elements?