Archive for October, 2013

I attended another Data Science London meeting last week. As usual, it was a good one. Speakers talked about their experiences with Twitter feeds that include Foursquare check-ins, and with scraping data from web sites. Scraping is, in essence, extracting information from web pages: a program simulates a human's use of a web site in order to get at the information that site provides.
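To make that concrete, here is a toy scraper, using nothing but Python's standard-library HTML parser. This has nothing to do with Lyst's actual code; the markup and the `product-name` class are invented for the example:

```python
from html.parser import HTMLParser

class ProductScraper(HTMLParser):
    """Collect the text of elements marked with class="product-name".

    A toy example: real scrapers also have to cope with messy markup,
    pagination, rate limits and frequent site-layout changes.
    """
    def __init__(self):
        super().__init__()
        self._capture = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if ("class", "product-name") in attrs:
            self._capture = True

    def handle_endtag(self, tag):
        self._capture = False

    def handle_data(self, data):
        if self._capture and data.strip():
            self.products.append(data.strip())

# A stand-in for a fetched page; a real crawler would download this HTML.
page = (
    '<div><span class="product-name">Wool Coat</span>'
    '<span class="price">£120</span>'
    '<span class="product-name">Silk Scarf</span></div>'
)

scraper = ProductScraper()
scraper.feed(page)
print(scraper.products)  # ['Wool Coat', 'Silk Scarf']
```

The fragility is obvious: rename one CSS class on the retailer's side and the scraper silently breaks, which is exactly the cost a programmatic interface would avoid.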

Both talks were interesting, and both had something in common: the people trying to access data had no programmatic, well-defined method of doing so, so they resorted to other means. The case of Lyst was especially interesting. They have gone to a lot of trouble to set up a system that can collect data from a great many online fashion retailers. They have an infrastructure that extracts information about tens (hundreds?) of thousands of products from lots of web sites, and, as surprising as it may be, they are actually keeping things under control, presenting a single site that lets people access the data as if it came from a single source. A question from the audience was: "do you have any programmatic access to these sites?" As in, do they give you web services? The answer was something along the lines of "very few". It is usually a crawler extracting information from the web site that does the job (though they work with the consent of the sites they parse). I think it was also someone from Lyst (or maybe from the audience, I am not sure) who said that this is pretty much the reality of the web we have today, despite all the hype about the semantic web.

Let me be honest: I never thought the semantic web would take over. Whenever I saw someone giving a talk about the future of the web, about how web sites would talk to each other, about how RDF and OWL would let the web become a gigantic computable knowledge base, I thought: "sorry, you'll never get there". That is because I have spent a serious amount of time developing web applications. I started around '97 and did it seriously until 2004 or so. I got to learn the way things work and to see where trends were going, and after 2004 I still did a lot of development using web technologies, though I did not have to deliver anything that needed to be production quality. When you see both the business and the technology sides of the web, you begin to develop a sense of what will take off in this domain and what will not.
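For contrast, the semantic-web promise was that data would already be published in machine-readable form, so consumers could query it rather than scrape it. A back-of-the-envelope sketch of that idea in plain Python (not real RDF tooling, and the triples are made up for illustration):

```python
# The semantic-web idea in miniature: data as subject-predicate-object
# triples that any program can query directly, no scraping required.
# Real deployments would use RDF serialisations, SPARQL endpoints, etc.

triples = [
    ("coat42", "type", "Product"),
    ("coat42", "name", "Wool Coat"),
    ("coat42", "priceGBP", "120"),
    ("scarf7", "type", "Product"),
    ("scarf7", "name", "Silk Scarf"),
]

def query(s=None, p=None, o=None):
    """Return every triple matching the pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which subjects are products?
products = [s for s, _, _ in query(p="type", o="Product")]
print(products)  # ['coat42', 'scarf7']

# What does coat42 cost?
print(query(s="coat42", p="priceGBP"))  # [('coat42', 'priceGBP', '120')]
```

Technically elegant, but note what the sketch hides: someone has to publish and maintain those triples, and that is precisely the effort nobody on the web has a business reason to fund.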

I was sure the semantic web would never get to where it was expected to go, because (drumroll) it simply offered no immediate, tangible business value to the owners of web sites. It is the good old semantic interoperability problem we have in healthcare. Someone must justify the extra effort of making data computable by others, or that effort will not be funded. It is an investment in a scenario that would become reality only if everybody committed to it. In healthcare, the need to integrate different sources of data has always been significant, so we got HL7, DICOM, openEHR, 13606, etc. For the web, the need is not really there. In fact, if you are not selling an actual product but making money off information, it is better not to let anyone else process your information programmatically. It is your clicks and your ad revenue that people would be taking away from you, so why let them?

On top of the business-model issues, add the technology problems. The UI on the web is becoming more and more interactive, and JavaScript is ruling everything. Relational DBs and other well-established back-end technologies and architectures are being phased out, and you expect people to keep up with all of that and support OWL or some other semantic web technology on top of it? Not going to happen.

RSS and web services may help islands of information connect to each other, but they are mostly standards for the wire: they help communicate information, but they do not offer content models or support semantic interoperability. Just look at all those companies building businesses on top of extracting information from Twitter.

I have put enormous effort into building a framework for better decision support in healthcare, and looking at where we are today, I think there is a future for computable health, so there is no need (yet) for me to cry over my lost years. The practice of medicine needs this. As hard as it may be, as long as the need is there, there will be effort to deliver. We should not get too relaxed though, because if the reward is good enough, someone will always build solutions on the interim approach, and if those are perceived as good enough, better may actually kill best.

There are a massive number of tools, technologies and people out there making the web work smarter, even if the components of the web are not necessarily helping. I think healthcare will do better than the web in terms of becoming computable, but it will get there much faster if we can offer an economic benefit to system developers and builders: a reason to build computable health systems.



I am a big admirer of Eclipse. It is an incredibly ambitious piece of work. It tackles the problem of creating a platform for software tooling: a platform that can generalize the features of most IDEs, reporting tools, scientific software and even regular desktop applications.

Not everybody agrees with me, of course, when it comes to calling Eclipse an impressive piece of work. I won't waste pages trying to convince those who disagree. Due to its generic infrastructure, Eclipse may not feel like a tool specific to Java development, Python development, etc. Even if you get over the slightly unintuitive feeling it gives you, it is hard to ignore the effort required to make Eclipse your home.

Home in the sense that your particular Eclipse installation supports Java, XML, Python, R, EMF or whatever else you are interested in using (Haskell? Sure, why not?). You configure it, you find the links to update sites and add them to your Eclipse config, you change workbench settings to match your preferences. Then someone else wants to work on your code, or you move to another computer. Or you find yourself in front of a computer that is not yours, but which you need to use for an hour or so to demonstrate something to someone.

Being able to manage your Eclipse installation through the cloud helps in these cases. Imagine being able to share your IDE with your friends, colleagues, or simply with people who want to use your code through the exact same Eclipse setup you have, one that is known to work. Pulse was a product that enabled this for free. Was, because even though it is still available for a few more weeks, it is now being replaced by SDC Cloud Connect from Genuitec.

Genuitec is a company that understands how people use Eclipse, what kinds of problems they have and, more importantly, how those problems can be solved. Pulse was my favourite tool because of this. I have a new computer at UCL? No problem. I install Pulse, pull the installations I want from my profile, and get to work. SDC Cloud Connect replaces Pulse with an Eclipse plugin coupled with a clever web-based interface that does the same job. It is still free until you hit a certain limit on the number of Eclipse instances you host in the cloud. If you pay and go private, you get a lot more: a custom server behind your firewall that lets you deploy your company's version of Eclipse, and the other nice things that people who pay for software get.

For me, Cloud Connect is a way of pushing my well-polished configurations to colleagues and friends who keep saying "I can't spend ages configuring Eclipse". Well, I have spent all the time required, and here is a link for you: go get my statistics benchmark installation. In the future, we may seriously consider this mechanism for distributing openEHR-based tools; it certainly beats explaining the plugin mechanism and so on to first-time Eclipse users.

So if you’re curious about the experience, go visit http://www.genuitec.com/sdc/cloud/ and play with the technology.
