I’ve been working on implementing a data warehouse based on open source tools. The actual information system that produces the data runs on Microsoft SQL Server and other Microsoft technologies. I have been asked to implement a minimum-cost solution, zero cost if possible. It appears that some parts of the open source business intelligence domain are more mature than others, and zero cost might not be possible after all. The Kettle project from Pentaho looks good enough for ETL. I’m using Mondrian (again from Pentaho) as an OLAP server, and even though it does not implement MDX completely, it looks promising for the moment. The front end, however, is not promising. JRubik is no longer developed, and JPivot is in need of a facelift. Recent trends in web development have raised the bar for UI requirements, and without even talking to the clients I can see that the current solutions won’t cut it.
The overall process has been a very good experience, and it is still a work in progress. I have high hopes for machine learning applied to the output of the system, but that’ll be another post. The thing is: this project is not in the healthcare domain, and I’ve been thinking about how things would go if it were actually in healthcare.
Looking at the work I’ve done so far, and the trends in healthcare, I can see a big problem emerging. Traditional persistence methods based on relational databases have a set of mature tools for interfacing them with other domains such as business intelligence and analysis. Emerging trends in healthcare seem to be moving towards new ways of persisting data, and these usually do not employ traditional relational database design, even when they use a database for persistence. So we might be heading towards a situation in which we have unusual persistence implementations, and connecting these implementations to analysis tools will be hard. ETL tools focus on well-known practices, and assume that you’ll be dealing with tables, foreign keys, relations and so on. A recent discussion on the OpenEHR mailing list gave me the impression that the reference implementations of OpenEHR look very different from that. So we’d better start thinking about custom ETL tools or interfacing mechanisms to connect the upcoming implementations to information processing tools. I suspect that EHR-related standards might have a hard time proving their value as sources of useful information if they cannot be connected to existing knowledge engineering tools.
This gives me the idea of a healthcare ETL (extract, transform and load) tool or initiative. If managers can’t see the same knowledge engineering opportunities for EHR implementations that exist for traditional database applications, current initiatives might face a lot of resistance and struggle.
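To make the gap a bit more concrete, here is a minimal sketch in Python of what the extract and transform steps of such a tool might have to do: take a nested, non-relational record and flatten it into rows that a table-oriented ETL tool can load. The record layout, the field names and the `flatten` and `to_rows` helpers are all hypothetical and made up for illustration; this is not the OpenEHR reference model or any real implementation's storage format.

```python
from typing import Any, Dict, Iterator, List, Tuple


def flatten(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield a (path, value) pair for every leaf of a nested dict/list structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            # Extend the path with the key, using "/" as a separator.
            yield from flatten(child, f"{path}/{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            # List positions become part of the path, e.g. observations[0].
            yield from flatten(child, f"{path}[{index}]")
    else:
        # A leaf value: emit it together with the path that leads to it.
        yield path, node


def to_rows(record_id: str, record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one nested record into flat rows for a (record_id, path, value) table."""
    return [
        {"record_id": record_id, "path": path, "value": value}
        for path, value in flatten(record)
    ]


# Hypothetical nested record, standing in for a non-relational EHR entry.
record = {
    "patient": {"id": "p-001"},
    "observations": [
        {"name": "systolic", "value": 120, "unit": "mmHg"},
        {"name": "diastolic", "value": 80, "unit": "mmHg"},
    ],
}

rows = to_rows("r-1", record)
for row in rows:
    print(row)
```

A generic path and value table like this is easy to load but awkward to query, so a real tool would probably also need mappings from paths to properly typed fact and dimension tables. That mapping is exactly the part that existing ETL tools do not provide for this kind of data.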