The semantic web that never was. Will it be the same for smart healthcare IT?

I attended another Data Science London meeting last week. As usual, it was a good one. Speakers talked about their experience with Twitter feeds that include Foursquare check-ins, and with scraping data from web sites. Scraping is basically extracting information from web pages: a program simulates a human’s use of the web site in order to use the information that site provides.
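To make the idea of scraping a bit more concrete, here is a minimal sketch using only Python's standard library. It is not how any of the speakers do it; the sample page, the `product-name` class and the `ProductScraper` and `scrape_products` names are all made up for illustration.

```python
from html.parser import HTMLParser


class ProductScraper(HTMLParser):
    """Collects the text of every element whose class is 'product-name'.

    The class name is a hypothetical convention of the sample page below.
    """

    def __init__(self):
        super().__init__()
        self._in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if dict(attrs).get("class") == "product-name":
            self._in_product = True

    def handle_endtag(self, tag):
        # Any closing tag ends the current capture (fine for flat markup).
        self._in_product = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_product and text:
            self.products.append(text)


def scrape_products(html):
    """Return the product names found in an HTML string."""
    parser = ProductScraper()
    parser.feed(html)
    return parser.products


# A made-up page standing in for a retailer's product listing.
SAMPLE_PAGE = """
<html><body>
  <div class="item"><span class="product-name">Blue scarf</span><span class="price">12.00</span></div>
  <div class="item"><span class="product-name">Red hat</span><span class="price">8.50</span></div>
</body></html>
"""

print(scrape_products(SAMPLE_PAGE))
```

The fragile part is obvious even in a toy like this: the code depends on the page's markup, so if the site renames that class the scraper silently returns nothing.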

Both talks were interesting, and both had something in common: the people trying to access data had no programmatic, well-defined method of doing so, so they resorted to other methods. The case of Lyst was especially interesting. They’ve gone to a lot of trouble to set up a system that can collect data from lots and lots of online fashion retailers. They have an infrastructure that extracts information about tens (hundreds?) of thousands of products from lots of web sites, and as surprising as it may be, they are actually keeping things under control, presenting a single site that lets people access the data as if it came from a single source. Someone in the audience asked: “do you have any programmatic access to these sites?” As in, do they give you web services? The answer was something along the lines of “very few”. It is usually a crawler that extracts the information from the web site (though they are working with the consent of the web sites they’re parsing). I think it was also someone from Lyst (or maybe the audience, I’m not so sure) who said this is pretty much the reality of the web we have today, despite all the hype about the semantic web.

Pulse evolves into SDC Cloud Connect, and becomes even cooler

I am a big admirer of Eclipse. It is an incredibly ambitious piece of work. It tackles the problem of creating a platform for software tooling, a platform that can generalize features of most IDEs, report tools, scientific software and even regular desktop applications.

Not everybody agrees with me, of course, when it comes to calling Eclipse an impressive piece of work. I won’t waste pages trying to convince those who disagree. Due to its generic infrastructure, Eclipse may not feel like a tool specific to Java development, Python development etc. Even if you get over the slightly unintuitive feeling it gives you, it is hard to ignore the effort required to make Eclipse your home.

Home in the sense that your particular Eclipse installation supports Java, XML, Python, R, EMF or whatever you’re interested in using (Haskell? Sure, why not?). You configure it, you find the links to update sites, and you add them to your Eclipse config. You change workbench settings based on your preferences. Then someone else wants to work on your code, or you move to another computer. Or you find yourself in front of a computer that is not yours, but that you need to use for an hour or so to demonstrate something to someone.

Being able to manage your Eclipse installation using the cloud helps you in these cases. Imagine being able to share your IDE with your friends, colleagues, or simply with people who want to use your code through the exact same Eclipse setup you have, which is known to work. Pulse was a product that enabled this for free. Was, because even though it is still available for a few more weeks, it is now being replaced with SDC Cloud Connect from Genuitec.

Genuitec is a company that understands how people use Eclipse, what kind of problems they have, and more importantly, how those problems can be solved. Pulse was my favourite tool because of this. I have a new computer at UCL? No problem. I install Pulse, pull the installations I want from my profile, and get to work. SDC Cloud Connect does away with Pulse and uses an Eclipse plugin coupled with a clever web-based interface to do the same. It is still free until you hit a certain limit on the number of Eclipse instances you’re hosting on the cloud. If you pay for it and go private, you can have a lot more: a custom server behind your firewall that lets you deploy your company’s version of Eclipse, and other nice things that people who pay for software get.

For me, Cloud Connect is a way of pushing my well-polished configurations to colleagues and friends who keep saying “I can’t spend ages configuring Eclipse”. Well, I’ve spent all the time required, and here is a link for you, go get my statistics benchmark installation. In the future, we may seriously consider this mechanism for distributing openEHR-based tools; it certainly beats explaining the plugin mechanism etc. to first-time Eclipse users.

So if you’re curious about the experience, go visit http://www.genuitec.com/sdc/cloud/ and play with the technology.