Friday, February 13, 2009

An Irrational Love of the Relational


Is a relational database the best data-store for serializing object-graphs?


Caveats
  1. This applies to coders who employ AgileDomain Driven Design
  2. The application you are building is not a data mining app
Imagine building an object-oriented application. You find that you need to save and subsequently retrieve an object's state (serialize and de-serialize your object graph). Is a relational database (RDB) the best data store for you?

The answer seems pretty clear to me - Relational databases are just about the worst kind of data store for storing an hierarchical object-graph

Of course an RDB can store object state. Mapping-tables, foreign keys and normalisation can all be used to project your natural object hierarchy into tables, rows and columns in the way that a globe can be projected onto the flat page of an atlas. 

But the map is not the terrain, the RDB projection is nothing less than a distortion of the object graph. Translating between the object-graph and its distorted, relational representation requires work - and not just computer cycles, although it needs plenty of those, but also the hand-rolled  code / bugs required to manage the transformation of data to and from the RDB object-store.

Traditionally this code was encapsulated into a bespoke Data Access Layer (DAL). It makes me shudder to remember just how much of my life I have wasted writing and debugging DAL code. But all that wasted time is not the critical problem, the real kicker is that a DAL severely limits how Agile you can be. 

Take a typical agile process, the quick refactoring of a class definition, say the addition of a new public property.
  • Add new property to class
  • Add equivalent mapping to DAL class
  • Add equivalent parameter to SPROC
  • Add equivalent column to RDB table
Notice how that list feels back to front?  Surely life would easier if I did things the other way around?
  • Add new column to RDB table
  • Add equivalent parameter to SPROC
  • Add equivalent mapping to DAL class
  • Add equivalent property to class
This shows up another problem with using RDBs - It promotes data-driven design
To me the natural way to design a domain is to play with the objects but an RDB + DAL approach flips this natural flow on its head and makes you design backwards. It makes you design the data-store before the domain, from tables -> objects.

Data driven design means that you do all the work up-front (table, sproc, DAL then finally object) which greatly increases the cost of experimentation. This ossifies the design process. Data-driven design strongly inhibits a successful design evolving out a series of cheap experiments. This is why agile coders tend to use domain driven design.

So why suffer all this RDB pain? Why not use a data store whose intrinsic architecture fits the structure of an object graph and does not require piles of buggy DAL code just to satisfy the basics of object serialization? DBAs often cite two main reasons:
  1. You can run reports that cut across the object graph
  2. You can keep you data application agnostic allowing future applications to use the data
Reason 1 - Because I am not writing a data-mining app I know my report designs in advance. Therefore I don't need ad-hoc, dynamic reports. Since my reports are pre-defined they can be represented in my object-graph as a collection of serializable report objects. Reports are just filtered collections of report objects. 

Reason 2 - I adhere to strict YAGNI (you aren't going to need it) principles therefore I only write code to do the job. No matter how tempting it is, I do not write code just in case it might be needed in the future

So what can the modern object-oriented coder do to make life a bit easier?

Use an Object Database
You can use a database that has been specifically designed to store object state data. They are called object databases.  I have played around with the open source object database called DB4Objects which I found to be very fast and easy to use (as easy as NHibernate) but when I was playing its development was in a state of rapid flux. If you have used an object database then please leave a comment.

Scrap the DAL
Object Relational Mappers (ORM) e.g NHibernate allow you to scrap the DAL.  ORMs automate, as far as possible, the cruddy DAL code and get you closer to the agile ideal of fast and cheap refactoring. You can create new classes, add and remove and interface elements and the ORM will take care of adding new tables and columns. You need never write another object persistence save / update / delete SPROC again (almost). 

Whilst much better than writing DAL code, ORMs are not perfect and are not pain free. Every once in a while you have to go to the DB and mess around. This might seem trivial but you will be amazed how quickly all that DB experience evaporates. 

Also, to ORM-enable your objects you need to somehow provide the mapping information that links the objects to their equivalent tables. These days this can be handled by marking up your classes with fairly simple [attribute] metadata but it is still a surprising amount of work to keep the attributes up to date.

So ORMs are not the final answer because they are really just papering over the the underlying problem: A relational database is not the best data store for serializing object-graphs.