Things we forget about the database

Like many of you out there, I’ve been involved in distributed, multi-layer software development for some time. Starting with the first ASP-based web site I developed in ’96 or ’97, I’ve built a lot of multi-layer applications.

In the process, I’ve gotten used to switching to new tools and approaches, as long as they were tested and approved by the industry and by the developers I trust and respect. As a result, I’ve developed habits: I reach for the same trusted tools in most of my projects. I’ve made some friends like LLBLGen and Hibernate for db access, and they’ve been with me for a long time now. However, a group of friends of mine did not switch to these widely accepted solutions, and I’ve given them a lot of criticism for not doing so. In particular Ufuk, an old friend of mine who has been developing his product for over 10 years now, has always been a proponent of db-level optimization. Where I defended the portability of databases, he defended performance: choosing a db platform and sticking with it.

In the past years, he has proven that my argument did not hold up so well. In all the projects I’ve taken part in over the last 8 years or so, how many times do you think the db layer changed? I can think of only one case, and even that one did not go beyond a proof of concept. I can’t help remembering being forced to resort to stored procedures and db-level code when my team was challenged with sorting and printing the financial records of almost 10 companies from a central db, spanning a year’s transactions. Would you like to do that using Hibernate? Good luck with that.

While I was investing in future configuration and product changes, Ufuk kept optimizing his product, investing in stored procedures on his chosen db platform: MS SQL Server. I’ve seen his software perform exceptionally fast on quite moderate hardware. So for the last two years I’ve been paying attention to the things I had overlooked in the past: the things done by experienced DBAs working on Oracle or MS SQL Server, the things done by developers who avoid ORM and focus on squeezing every bit of performance out of their db layer. Looking back at the last five years or so, I have to admit that I’ve been wrong in some of my choices, and I intend not to make the same mistakes again, especially in the persistence layer I’m testing for the openEHR reference implementation extensions I’m developing.

I’ve chosen an open source db, PostgreSQL, and I’m still using Hibernate, but this time I’ll put some effort into stored procedures (functions, in PostgreSQL speak) and make sure that I use temporary tables, result sets built on custom types, and so on. After all, there is no doubt that healthcare data is going to be large, so it is time to give the db layer the respect it deserves: it has been the most mature layer in today’s solutions. DB products have been evolving for 20 years or more, and we have been neglecting them, focusing on throwing more hardware at the middle layer(s). This is the classic hammer-and-nail syndrome, where a man with only a hammer sees every problem as a nail.

I’ll put some serious effort into exploiting PostgreSQL, and this time I’ll even drop the assumption that people would like to use MySQL. If they do want to use MySQL, the whole thing is open source, so they can port my db layer code. If they can’t do that just because MySQL lacks some feature, then I won’t sacrifice performance by settling on a lowest common denominator just to make an uncertain future porting operation easier. As long as I’m using a trusted open source DB solution, this should be acceptable: all the code is open source, and that’s an opportunity for others to adapt it to their favorite architecture or configuration.
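
To make the hybrid approach concrete, here is a minimal sketch of what mixing Hibernate with db-level code can look like. Everything in it is illustrative rather than my actual schema: I’m assuming a PostgreSQL function named find_recent_compositions already exists on the server, and Hibernate’s native SQL query support simply hands the call through to it.

```java
import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class CompositionDao {

    private final SessionFactory sessionFactory;

    public CompositionDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Calls a server-side PostgreSQL function through Hibernate's native
    // SQL support, bypassing the ORM's query generation entirely. The
    // function name and parameter are invented for this example.
    @SuppressWarnings("unchecked")
    public List<Object[]> findRecentCompositions(long patientId) {
        Session session = sessionFactory.getCurrentSession();
        return session.createSQLQuery(
                    "select * from find_recent_compositions(:patientId)")
                .setParameter("patientId", patientId)
                .list();
    }
}
```

The point is that the ORM does not have to go away: it keeps handling the routine persistence work, while the performance-critical paths drop down to functions that PostgreSQL can plan and execute entirely on the server.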

Let’s see how it goes. I have a couple of plans for generating stored procedures automatically from the archetype wrappers I’ve written, and if it works, it may be really fast. I’ll write down the results.
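
For what it’s worth, here is a rough, purely hypothetical sketch of the generation idea. The real archetype wrapper API isn’t shown, and the table and column names are invented; the point is only that a wrapper’s known fields are enough to emit a matching PL/pgSQL insert function per archetype.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ArchetypeFunctionGenerator {

    // Given a target table and its columns (name -> PostgreSQL type),
    // emit a PL/pgSQL function that inserts one row. Parameters get a
    // p_ prefix so they cannot collide with column names in the body.
    public String generateInsertFunction(String tableName,
                                         Map<String, String> columns) {
        StringBuilder params = new StringBuilder();
        StringBuilder names = new StringBuilder();
        StringBuilder values = new StringBuilder();
        for (Map.Entry<String, String> col : columns.entrySet()) {
            if (params.length() > 0) {
                params.append(", ");
                names.append(", ");
                values.append(", ");
            }
            params.append("p_").append(col.getKey()).append(' ')
                  .append(col.getValue());
            names.append(col.getKey());
            values.append("p_").append(col.getKey());
        }
        return "create or replace function insert_" + tableName
                + "(" + params + ") returns void as $$\n"
                + "begin\n"
                + "  insert into " + tableName + " (" + names
                + ") values (" + values + ");\n"
                + "end;\n"
                + "$$ language plpgsql;";
    }

    public static void main(String[] args) {
        // Fields that an archetype wrapper might expose (invented names).
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("ehr_id", "uuid");
        cols.put("archetype_id", "text");
        cols.put("created_at", "timestamptz");
        System.out.println(new ArchetypeFunctionGenerator()
                .generateInsertFunction("composition", cols));
    }
}
```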