What is the definition of “user friendly” for a doctor?

Ok, this is really an interesting one. In medical informatics, one challenge that never seems to be conquered is providing a user interface that will not make a doctor grumble.
No matter how hard you try, you almost always hear the comment: “this is not so easy to use…”. Medical professionals seem to be very picky when it comes to user interfaces and interaction with information systems. This link here mentions the same thing again. I’ve previously written about Microsoft CUI, and for all of our sakes, it had better be successful. This is a field that is sucking up a huge amount of effort, and it is a large setback to the adoption of many systems. Being terrible at user interfaces, I am not the one to take things further in this domain, but this is one field which should benefit enormously from some form of standardization.

Can you build a product for data mining?

I am not sure that the answer to this question is yes. Sure, you can find a lot of vendors who’d claim that they already have a product that can miraculously convert raw data into precious information, but is this really the case? I doubt it.

My own experience with real life data has shown me that generating information from data is a complex process. It almost always requires an understanding of the domain in which the data is generated, and moreover, the issues with the data make the whole thing an ad-hoc process. I’ve asked the Weka mailing list about commercial tools that can be considered as alternatives to Weka, and the responses so far confirm my feelings about the domain: you can’t simply throw in a product and expect people to get knowledge out of data. You’d need a substantial amount of knowledge about machine learning and data mining, and about how these tools and, more importantly, the algorithms work. Most of the time, you’ll need to adjust many parameters, transform data, and iterate continuously until you have a pipeline that connects raw data to some form of decision making process. Sure, there are tons of ETL tools, business intelligence tools etc. These are either infrastructure tools for transforming and/or moving data, or glorified reporting tools.
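To make the “adjust parameters, transform data, iterate” loop a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy rows, the choice of a k-nearest-neighbour classifier, and the helper names (`standardize`, `knn_predict`, `tune`). It is not how Weka or any commercial tool works internally; it only shows the shape of the iteration.

```python
from statistics import mean, pstdev

# Hypothetical toy data, invented for illustration: (age, systolic BP) -> label.
ROWS = [
    ((25, 118), "low"), ((31, 121), "low"), ((38, 125), "low"), ((42, 119), "low"),
    ((55, 142), "high"), ((61, 150), "high"), ((67, 147), "high"), ((72, 155), "high"),
]

def standardize(rows):
    """Transform step: rescale each feature to zero mean and unit variance."""
    columns = list(zip(*(features for features, _ in rows)))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [
        (tuple((x - m) / s for x, (m, s) in zip(features, stats)), label)
        for features, label in rows
    ]

def knn_predict(train, point, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point))
    )[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def leave_one_out_accuracy(rows, k):
    """Hold out each row in turn, predict it from the rest, and report the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        hits += knn_predict(train, features, k) == label
    return hits / len(rows)

def tune(rows, candidate_ks, transforms):
    """Iterate over every (transform, k) combination and keep the best scoring one."""
    results = []
    for name, transform in transforms.items():
        prepared = transform(rows)
        for k in candidate_ks:
            results.append((leave_one_out_accuracy(prepared, k), name, k))
    return max(results), results

best, all_results = tune(
    ROWS, [1, 3, 5], {"raw": lambda r: r, "standardized": standardize}
)
print(best)  # (accuracy, transform name, k)
```

Even in this tiny case the person running the loop has to decide which transforms to try, which parameter values are worth testing, and how to score them. With real data, those decisions are where the domain knowledge goes.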

The machine learning domain is a moving target, with new methods introduced continuously, and I cannot think of a product that can integrate these methods in such a way that someone with no knowledge of machine learning or data mining can use it to create meaningful information.

I’ve always considered data mining and the related decision support domain as a service based industry. Weka seems to be the most suitable tool for this domain, but I have my doubts about its scalability when using SVMs, or when learning Bayesian networks from data. I’d like to see how GPU based solutions like CUDA from Nvidia would perform in these cases, and more importantly, whether it would be possible to parallelize these algorithms for multicore CPUs or GPUs (maybe even for Cell, the horsepower of the PS3).

Microsoft Amalga, why didn’t anyone notice this?

I was thinking that Microsoft was going to follow a more infrastructure oriented approach like other vendors. Even when I read that they have acquired a healthcare informatics company that has its own hospital information system, I was expecting that system to be transformed into some form of Microsoft Healthcare Framework, but it appears MS is taking a bold step, and entering the hospital information system market directly.

People, this is big news! It is quite interesting that I do not see much news about this, but Microsoft is the only large vendor that I know of that is providing an HIS directly to the market. So far, MS has been a technology supplier for many healthcare solutions, and they have invested heavily in healthcare. It appears they are using a different strategy here: instead of waiting for the ISVs to build a solution based on their tech, just give them a head start by giving them a complete product, and let them work on it. Not that the ISVs have not been able to do it, but apparently MS has decided to take things under control, and maybe increase the pace of adoption of MS technologies in healthcare.

The thing is, this is going to make a large set of functionality a “common commodity”, which means it will be quite expensive and unnecessary for ISVs to build a significant portion of HIS and related products, unless they have a really good reason. I really wonder what is going to happen here. If MS keeps investing in this direction, this might give a lot of competitors a hard time, because efficient integration between front end products and backend products has been the primary advantage of MS in many scenarios. If the same thing happens in healthcare, this might change some existing dynamics between technology vendors, ISVs and users in healthcare informatics.

I should add that I still believe that the future of healthcare informatics will be in domain specific language oriented tools, highly specialized and supported by vendors. OHF, being supported by IBM, seems to be a step in this direction, and MS can shift gear when they introduce some kind of healthcare framework. I truly believe that at some point, HL7 or OpenEHR oriented development tools with an emphasis on domain specific languages will emerge, and they will have a significant effect on healthcare IT. I guess this is something that I should write about in a post of its own.

Open source business intelligence and OLAP for Healthcare

I’ve been working on implementing a data warehouse based on open source tools. The actual information system that produces the data runs on Microsoft SQL Server and other MS technologies. I have been asked to implement a minimum cost solution, zero if possible. It appears that some parts of the open source business intelligence domain are more mature than others, and zero cost might not be possible after all. The Kettle project from Pentaho looks good enough for ETL. I’m using Mondrian (again from Pentaho) as an OLAP server, and even though it does not provide a complete MDX implementation, it looks promising for the moment. However, the front end is not promising. JRubik is no longer developed, and JPivot is in need of a face lift. Recent trends in web development have raised the bar for UI requirements, and without even talking to the clients I can see that current solutions won’t cut it.

The overall process has been a very good experience, and it is still a work in progress. I have high hopes for machine learning applied to the output of the system, but that’ll be another post. The thing is: this project is not in the healthcare domain, and I’ve been thinking about how things would go if it were actually in healthcare.

Looking at the work I’ve done so far, and the trends in healthcare, I can see a big problem emerging. Traditional persistence methods based on relational databases have a set of mature tools for interfacing them with other domains like business intelligence, analysis etc. Emerging trends in healthcare seem to go towards new implementations for persistence of data, and these trends do not usually employ traditional relational database design, even if they use databases for persistence. So we might be going towards a situation in which we have unusual persistence implementations, and connecting these implementations to analysis tools will be hard. ETL tools etc. focus on well known practices, and assume that you’ll be dealing with some tables with foreign keys, relations etc. A recent discussion on the OpenEHR mailing list gave me a very different impression of the reference implementations of OpenEHR. So we’d better start thinking about custom ETL tools or interfacing mechanisms to connect the upcoming implementations to information processing tools. I suspect that EHR related standards might have a hard time proving their value in providing valuable information if they cannot be connected to existing knowledge engineering tools.

This gives me the idea of a healthcare ETL (extract, transform, and load) tool / initiative. If managers can’t see, for EHR implementations, the same set of knowledge engineering opportunities that exist for traditional DB apps, current initiatives might face a lot of resistance and struggle.
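As a rough sketch of what the smallest possible version of such a tool might do, the Python below flattens a nested, EHR-like record into plain (record, path, value) rows and loads them into a relational table that ordinary SQL based tools can query. The record layout, the slash separated path convention, and the `ehr_leaf` table are all hypothetical, invented for this example; none of this is taken from OpenEHR or from any reference implementation.

```python
import sqlite3

# Hypothetical nested record, invented for illustration only.
RECORD = {
    "patient": {"id": "p-001", "birth_year": 1950},
    "observations": [
        {"code": "systolic_bp", "value": 142, "unit": "mmHg"},
        {"code": "heart_rate", "value": 71, "unit": "bpm"},
    ],
}

def flatten(node, path=()):
    """Extract: yield a (path, value) pair for every leaf of a nested dict/list."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, path + (str(index),))
    else:
        yield "/".join(path), node

def to_rows(record_id, record):
    """Transform: one (record_id, path, value) row per leaf."""
    return [(record_id, path, value) for path, value in flatten(record)]

def load(rows):
    """Load: put the rows into an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ehr_leaf (record_id TEXT, path TEXT, value)")
    conn.executemany("INSERT INTO ehr_leaf VALUES (?, ?, ?)", rows)
    return conn

rows = to_rows("rec-1", RECORD)
conn = load(rows)
systolic = conn.execute(
    "SELECT value FROM ehr_leaf WHERE path = ?", ("observations/0/value",)
).fetchall()
print(systolic)
```

A generic path/value table like this is easy to produce but awkward to analyse, so a real tool would probably need a second step that pivots selected paths into proper columns. That mapping is exactly the domain specific part that generic ETL tools do not know about.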

Looking for trouble? Try Bayesian Artificial Intelligence…

Ok, I’ll be honest, I’ve always been into probabilistic methods, for they somehow “fit” into my way of thinking. There is something about probabilistic methods, and probability theory; you are either suitable to work with it or not; you either love the field, or hate it.

I’m the kind of guy who has a love-hate relationship with it. I certainly like the field, but the overall concept is so deep and abstract that I can get lost very easily. Something that makes perfect sense seems like Chinese the next day, but I still can’t let go.

On top of that, I’ve been working on integrating probability based methods into my work in data mining and decision support, and finally I found myself working on Bayesian AI. Trust me; it “is” hard. It requires you to cover a vast number of subjects, and even then there is always something missing. Still, I have not given up, and I’m about to reach a point where I can build simple but practical applications for medical informatics. Bayesian AI is basically probabilistic modeling for building (semi)autonomous systems. After Judea Pearl wrote the book Probabilistic Reasoning in Intelligent Systems, an army of researchers rushed to the field, but still the field seems much less crowded compared to well known AI, neural networks etc. If you’d like to have an idea of what I’m talking about, MIT has a very good web page in OpenCourseWare which you can find here. I have been looking around to find some frameworks which I can use, and the work in the field has few complete, well polished outcomes. Most of the projects seem to be dead or incomplete. There are a few worth mentioning, but I’ll do that later.

Bayesian AI provides a set of very strong tools when you have a heap of raw data and chaos, which is an acceptable definition of health informatics. I’m very clear about one thing; I’m tired of building things that somehow collect and save data. Most of the time what we call information is nothing more than a set of fancy reports, and we are quite far away from using existing data for decision making. I really believe that there is a need for a new generation of tools based on modeling of the healthcare domain, so that we can forecast the outcomes of our choices, at least at a primitive level. Even the simplest of such tools would make a huge difference. I should say that it is very, very hard to build them, but it seems like a more justified effort than building another version of an already existing EHR system or HIS. There are a lot of bright people working in these fields; why very few of them choose to work on modeling and forecasting is a mystery to me.
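For anyone wondering what the most primitive building block of such a model looks like, here is a small Python sketch of Bayes’ rule applied to a two node network (a hidden state and a test result). The probabilities are made up for illustration and are not clinical estimates, and the names `PRIOR`, `LIKELIHOOD` and `posterior` are mine, not from any framework.

```python
# All numbers are invented for illustration; they are not clinical estimates.
PRIOR = {"disease": 0.01, "healthy": 0.99}

# P(test result | state): assumed sensitivity 0.95, assumed false positive rate 0.05.
LIKELIHOOD = {
    "disease": {"positive": 0.95, "negative": 0.05},
    "healthy": {"positive": 0.05, "negative": 0.95},
}

def posterior(evidence):
    """Bayes' rule by enumeration: P(state | evidence) ∝ P(evidence | state) * P(state)."""
    joint = {state: PRIOR[state] * LIKELIHOOD[state][evidence] for state in PRIOR}
    total = sum(joint.values())  # P(evidence), the normalizing constant
    return {state: p / total for state, p in joint.items()}

print(posterior("positive"))
```

With these assumed numbers, a positive result raises the probability of disease from 1% to only about 16%, because the healthy group is so much larger than the sick one. A real Bayesian network chains many such conditional tables together, which is where the hard part begins.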