Business Logic != Database Logic

Databases are good for one thing: storing data. But so are spreadsheets, PDF files, and the like. What separates a good database from other data storage facilities is the ability to store and retrieve records efficiently.

There are a lot of table models that seem more worthy of a spreadsheet than a database. For example, consider this scaled-down employees table:

+----+------+------------+
| id | name | department |
+----+------+------------+
| 1  | Bob  | Sales      |
| 2  | Mark | Shipping   |
+----+------+------------+

So, once this table is filled in with all sorts of grand information about the employees, we have a database to work with. We can insert employees, delete them, and update their information. Each of these operations mirrors a database operation of the same name.

But when we do that, we lose historical data. If Bob quits all of a sudden and he’s deleted from the database, then from that point on it’s like he never existed!

So we work around that. We add some kind of currently_employed boolean column to denote which employees are active and which are not. We probably also add a date of hire and a date of termination.

Thus, we went from database CRUD to CRU.

What happens if, instead of quitting, Bob gets transferred over to join Mark in the shipping department?

Someone will have to go in and update the table to reflect that Bob’s department is now Shipping. Easily done. However, after that point there’s no history to show that Bob was ever in Sales. Maybe his hire_date column gets changed instead, which then makes it look like he’s only worked for the company a few days instead of all of those years he put in at the sales department.

Or, we could model the data so that it flows. Get rid of the updates and deletes, and stick with inserts and selects only.

Consider this new data model:

+----+-----------+------+------------+------------+
| id | person_id | name | department | hire_date  |
+----+-----------+------+------------+------------+
| 1  | 1         | Bob  | Sales      | 2002.01.01 |
| 2  | 2         | Mark | Shipping   | 2002.02.02 |
+----+-----------+------+------------+------------+

and see what happens now when Bob moves to Shipping:

+----+-----------+------+------------+------------+
| id | person_id | name | department | hire_date  |
+----+-----------+------+------------+------------+
| 1  | 1         | Bob  | Sales      | 2002.01.01 |
| 2  | 2         | Mark | Shipping   | 2002.02.02 |
| 3  | 1         | NULL | Shipping   | 2005.11.11 |
+----+-----------+------+------------+------------+

Every time something new happens we do an insert instead of an update. This keeps all of the historical data intact.
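To make the write side concrete, here is a minimal sketch in plain Ruby rather than SQL. Rows are hashes standing in for database records; the column names follow the example tables above, but the record_change helper is my own illustrative name, not anything from the post's actual code.

```ruby
employees = [
  { id: 1, person_id: 1, name: "Bob",  department: "Sales",    hire_date: "2002.01.01" },
  { id: 2, person_id: 2, name: "Mark", department: "Shipping", hire_date: "2002.02.02" },
]

# Record a change as an append: columns that didn't change are left nil,
# and nothing in the existing rows is ever touched.
def record_change(rows, person_id, changes)
  blank = { id: rows.map { |r| r[:id] }.max + 1, person_id: person_id,
            name: nil, department: nil, hire_date: nil }
  rows << blank.merge(changes)
end

# Bob's transfer becomes an INSERT instead of an UPDATE.
record_change(employees, 1, department: "Shipping", hire_date: "2005.11.11")
# employees now holds three rows; the rows with person_id 1 are Bob's history.
```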

Now, all we have to do is update our software to understand this logic. When we query employee information, we want to merge all of a person’s rows together, keyed on person_id and ordered by insertion, ignoring NULL values, so that the latest non-NULL value for each column wins.
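The read side can be sketched the same way, under the same assumptions as above (hashes for rows, my own current_record name for the helper): collapse all of a person's rows into one current record, letting later rows override earlier ones while nil (NULL) values fall through.

```ruby
ROWS = [
  { id: 1, person_id: 1, name: "Bob",  department: "Sales",    hire_date: "2002.01.01" },
  { id: 2, person_id: 2, name: "Mark", department: "Shipping", hire_date: "2002.02.02" },
  { id: 3, person_id: 1, name: nil,    department: "Shipping", hire_date: "2005.11.11" },
]

# Merge one person's rows in insertion order; Hash#compact drops the nil
# columns from each row, so a NULL never clobbers an earlier real value.
def current_record(rows, person_id)
  rows.select { |r| r[:person_id] == person_id }
      .sort_by { |r| r[:id] }
      .reduce({}) { |acc, row| acc.merge(row.compact) }
end

bob = current_record(ROWS, 1)
# bob[:department] is "Shipping" (the latest value); bob[:name] is still "Bob",
# because the NULL in row 3 was ignored.
```

In SQL, roughly the same read could be expressed as a view that orders a person's rows by id and takes the last non-NULL value per column (e.g. via COALESCE), which I take to be the kind of database view mentioned later in the post.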

I have implemented this scheme in a number of non-Rails applications here. The most visible place where this is handy for us is device calibrations. Instead of just loading new values into the database for new device calibrations, we insert new rows and let the logic pick out the latest and greatest calibration. It also allows us to go back after the fact and see the calibration history for each item: when it was calibrated, where, and by whom.
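The calibration case follows the same pattern, sketched here in plain Ruby. All of the field names below (device_id, calibrated_at, offset, calibrated_by) are illustrative stand-ins, not the real schema; the point is only that the newest row per device wins while the full history stays queryable.

```ruby
CALIBRATIONS = [
  { device_id: "A", calibrated_at: "2004-01-10", offset: 0.12, calibrated_by: "jane" },
  { device_id: "A", calibrated_at: "2005-06-01", offset: 0.08, calibrated_by: "raj"  },
  { device_id: "B", calibrated_at: "2004-03-15", offset: 0.30, calibrated_by: "jane" },
]

# Latest calibration per device: group the audit trail, take the max by date.
# (ISO-style date strings compare correctly as plain strings.)
def latest_calibrations(rows)
  rows.group_by { |r| r[:device_id] }
      .transform_values { |rs| rs.max_by { |r| r[:calibrated_at] } }
end

latest = latest_calibrations(CALIBRATIONS)
# latest["A"] is the 2005 recalibration; CALIBRATIONS itself remains intact
# as the record of when, where, and by whom each device was calibrated.
```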

I use a Rails site to access some of this information, but making it work isn’t seamless. I had to add some views to the database, and put some custom code in both the models and controllers. My thought here is that I could achieve this seamlessly by overriding some of the ActiveRecord::Base methods. That is my next step, time permitting.

Surely, though, I didn’t invent a new model or pattern did I? I can’t be the only one who wants to store their data in this fashion.

Well put!

Although the subject/title had me expecting something else. The link to Martin Fowler that the other comment provided is a good one; it mentions the option of letting programs glean historical information and relationships from the audit log rather than promoting the temporal data to the most frequented tables. But anyway, you have described this in a nice and original way. Thanks.