Small cloud vs big clouds

I’ve been thinking about a particular future business and its infrastructure for almost 3 years now. Cloud technologies are quite relevant to what I have in mind, and recently I’ve started to think about working on a set of open source cloud implementations.

However, there is a problem. It is not a technical problem, it is an economic one. Amazon, Google and probably MS will be in the cloud business in quite a strong way in a couple of years. Amazon and Google are there already, and the cost efficiency of renting their infrastructure vs building my own is very, very relevant. Cloud loves hardware; in fact, its advantage lies in joining rather simple models of storage and processing with very efficient scaling. So the advantage is about scalability, and that depends on hardware. Now, knowing how to design and build a particular solution to a processing-intensive problem is a valuable asset, but what if your customers do not have the money to buy the hardware that can give the performance you’d like to provide? In these kinds of situations, renting cloud capacity from these giants and putting your know-how on top of it is more efficient in terms of cost, and in many cases will make your services more affordable.

I can see a rough segmentation of the market, where in some segments clients prefer rented cloud infrastructure simply for the cost benefits, and in others they choose to buy their own farm, either because their data is very sensitive or simply because they have the money.

The other problem is that serious open source cloud technology comes from the giants, like Google, who gave us Bigtable, and Yahoo, who has improved and tested Hadoop. These companies have the ability to develop these kinds of solutions since they need them, and they have the infrastructure and use cases to test them. Without these real life connections, how can a disconnected open source initiative develop alternatives to commercial offerings?

In short, cloud belongs to the big names for the moment, and it is alive as open source because they want to keep it alive. What if this changes in the future? I guess anyone in the field will have to know these services quite well, just to make sure they can offer them as a lower cost alternative, at least for some cases.

Why on earth don’t we have proper open source terminology servers?

The competition among different information models in healthcare will never end. Yes, I know that there are many out there who think that a particular piece of work is so much better than the rest, and that it is the future of healthcare informatics. Sorry, I don’t agree. There are many other reasons, which I’d like to outline in another post, but in general, I can’t see this competition going away in the future.

What is interesting is, use of terminologies is common in many information model standards, whether it be HL7, EN 13606 or openEHR. There are many open source tools for many aspects of healthcare informatics, but when it comes to terminology management, the choices are surprisingly few! Other than NCI’s LexGrid initiative and Apelon, I can’t see any serious terminology server work in the open source domain. These two have their own pros and cons, but in general, this sub domain is surprisingly deserted. Please know that I’m not considering projects which were updated 3 years ago for the last time as candidates for my work in Opereffa.

There is a huge amount of work around the concepts which will eventually get linked to terminologies, but there is not much effort in the terminology server area. Yes, there are many browsers out there, but whatever you do in the modelling phase, you’ll have to have access to a proper terminology server during use of that model (be it a Snomed CT subset or an HL7 message with Snomed CT codes in it). So why can’t I see any interest in this? Is it because people are so focused on well known problems that they do not bother to think about what lies beyond them? Did open source healthcare attack the problem of information model based solutions first, omitting terminology based solutions? Terminology based approaches are old, and they are well established, so I can’t explain the lack of decent open source projects in this field. If you know one, drop me a line, and I’ll buy you a beer/wine/{insert your favorite drink here}.

Eclipse vs Intellij Idea

I used to love Intellij Idea. At the time it was the most insightful Java development environment. It was clearly created by people who knew about Java development. Then I started working with Eclipse, and at first, the switch was hard. At the time, Idea had so many nice features that I was used to, and I was quite frustrated to see that many of them did not exist in Eclipse.

In time I got used to Eclipse, and it gained a huge number of features. Yesterday, I had to download Idea to take a look at Apelon’s source code, and I found it to be a really nice environment after all this time (we are talking about 3 or 4 years). However, Eclipse has come a long way, and I found that I was now looking for things that I am used to in Eclipse. Idea provides a GUI designer, and this is one major area where Eclipse does not provide any free solution, but then again, Idea is not a free IDE either. The community and effort around Eclipse have grown so large that it is now able to compete with high quality commercial solutions, but seeing Idea still reminds me of the comfort of a polished IDE created only for software development. Eclipse is a platform, though most people do not need to go beyond its use as an IDE.

To be honest, if Eclipse can somehow deal with the SWT/Swing problems, it will probably be the king of the tooling and development arena. Of course till that time, we’ll all have to suffer.

A big fat thank you to Yavuz!!

After pulling my hair out for hours, I was rescued. Yavuz found a way to start Tomcat automatically on Jaunty. The problem turned out to be the default use of dash, instead of bash, as /bin/sh. Following the well known setup, one is supposed to run

sudo ln -sf /bin/bash /bin/sh

to get it working. Spent hours on this!!!

Reject dirty data! Don’t let it in, no matter what happens

More of a note to myself. I’m just working on the JSF bindings of the soon to be announced openEHR framework, and due to the nature of my persistence model, once bad data finds its way into the db, it messes up the whole form entry. It is possible to modify the persistence mechanism for immunity to bad data, but is that something I should do? I guess not.

When bad data appears in the system, it should stop the execution. You (I) should find out why that happens, and fix it. I’ll keep the persistence a little bit fragile intentionally. Better to discover potential problems now than to try to find them in a production system.
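
As a rough illustration of what failing fast could look like, here is a minimal sketch. It is not the actual persistence code of the framework; `DirtyDataGuard`, `DirtyDataException` and the field names are made up for the example, and a loaded row is assumed to be a simple map of field names to string values.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical fail-fast check; class and field names are illustrative only. */
final class DirtyDataGuard {

    /** Thrown as soon as a persisted value does not match what the form expects. */
    static final class DirtyDataException extends RuntimeException {
        DirtyDataException(String message) {
            super(message);
        }
    }

    /**
     * Returns the stored value for a required field, or stops execution
     * if the value is missing or blank.
     */
    static String requireClean(Map<String, String> row, String field) {
        String value = row.get(field);
        if (value == null || value.trim().isEmpty()) {
            throw new DirtyDataException("Bad data in field '" + field + "': " + value);
        }
        return value;
    }

    /** Validates every required field before the form is populated. */
    static void validate(Map<String, String> row, List<String> requiredFields) {
        for (String field : requiredFields) {
            requireClean(row, field);
        }
    }
}
```

The point of the unchecked exception is that nothing downstream tries to recover: the form never renders with half-valid data, and the stack trace points at the row that needs fixing.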

How much can you scale incremental approaches?

I can still remember a seminar I went to during my master’s studies. It was about project management, and the speaker was a very experienced project manager from the defence industry. When you are building a submarine, or a fighter jet, you don’t get to do many iterations! We are quite used to engineering projects in the defence industry that take billions of dollars, and years, to complete.

When you speak to engineers, project managers and other people in these projects, you see that they all have the same approach: “of course this is the way to do it!”. How else can you run something as big as this? In theory, as time and billions of dollars go by, these engineering practices should have become perfect, with all the lessons learned.

There are some signs, however, that I can’t ignore. Consider the circumstances and technology when NASA managed to send a man to the moon. Almost 50 years after that, NASA has just announced that another launch to the moon would be delayed to 2020. The innovation in space shuttles now comes from small private companies, which do not have the billions to spend. Look at the NHS, look at other gigantic engineering projects. There is something wrong here, but people are perfect at ignoring the pink elephant in the middle of the room.

So I’m thinking we should find a better way of ensuring agility, to the max we can do. We need to change a couple of things, even for larger projects. I’m not a core project manager, though I’ve done my share of it, but I’d like to see some people with experience of both formal projects and less formal, agile ones try to join the two approaches.

Things we forget about the database

Like many of you out there, I’ve been involved in distributed, multi layer software development for some time. Starting with the first ever asp based web site I’ve developed in 96 or 97, I’ve built a lot of multilayer applications.

In the process, I’ve gotten used to switching to new tools and approaches as long as they were tested and approved by the industry, and by the developers I trust and respect. As a result, I’ve developed habits; I’ve become used to reaching for my trusted tools in many of the projects. I’ve made some friends like llblgen and hibernate for db access, and they’ve been with me for a long time now. However, a group of friends of mine did not switch to these widely accepted solutions, and I’ve been giving them a lot of criticism for not doing so. Especially Ufuk, an old friend of mine, who has been developing his product for over 10 years now, has been a proponent of db level optimization. Where I defended the idea of portability of databases, he defended performance, and choosing a platform for the db and sticking with it.

In the past years, he has proven that my argument did not hold so well. In all the projects I’ve taken part in over the last 8 years or so, how many times do you think the db layer changed? I can only think of one case, and even that one did not go beyond a proof of concept. I can’t help remembering being forced to resort to stored procedures and db level code when my team was challenged with sorting and printing out a year’s worth of financial records for almost 10 companies from a central db. Would you like to do that using hibernate? Good luck with that.

While I was investing in possible future configuration and product changes, Ufuk kept optimizing his product, and investing in stored procedures in his chosen db product: MS Sql Server. I’ve seen his software perform exceptionally fast on quite moderate hardware. So for the last two years I’ve started to pay attention to the things I’ve seen in the past. The things done by experienced DBAs working on Oracle or MS Sql Server, the things done by developers who avoid ORM, and focus on squeezing out every bit of performance from their DB layer. Looking back at the last five years or so, I have to admit that I’ve been wrong in some of my choices, and I intend not to make the same mistakes again. Especially for the persistence layer I’m testing for the openEHR reference implementation extensions I’m developing.

I’ve chosen an open source db, postgresql, and I’m still using hibernate, but this time, I’ll put some effort into stored procedures (functions in postgresql speak), and make sure that I am using temporary tables, sets covering custom types etc. After all, there is no doubt that healthcare data is going to be large in size, so it is time to give some respect to the db layer, which has been the most mature layer in today’s solutions. DB products have been evolving for the last 20 years or more, and we have been neglecting them, focusing on throwing more hardware at the middle layer(s). This seems to be the classic hammer and nail symptom, where a man with only a hammer sees all problems as nails. I’ll put some serious effort into exploiting postgresql, and this time, I’ll even drop the assumption that people would like to use mysql. If they do want to use MySql indeed, the whole thing is open source, so they can port my db layer code. If they can’t do it just because MySql does not have some feature, then I won’t sacrifice performance by settling for a lowest common denominator, just to make an uncertain future porting operation easier. As long as I’m using a trusted open source DB solution, this should be acceptable, since all the code is open source, and that’s an opportunity for others to change it into their favorite architecture or configuration.

Let’s see how it goes. I have a couple of plans for generating stored procedures automatically from the archetype wrappers I’ve written, and if it works, it may be really fast. I’ll write down the results.
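
To give an idea of what generating a function could mean in its simplest form, here is a hedged sketch. It is not the generator I have in mind for the archetype wrappers; `InsertFunctionGenerator`, the `insert_` and `p_` naming and the flat table layout are all assumptions made for the example. It takes a table name and an ordered map of column names to postgresql types, and produces the text of a plain SQL function that inserts one row using positional parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical generator; names and table layout are illustrative only. */
final class InsertFunctionGenerator {

    /**
     * Builds a CREATE FUNCTION statement for inserting one row.
     * The map should preserve insertion order (e.g. a LinkedHashMap) so that
     * parameters and columns line up. Identifiers are assumed to be trusted;
     * no quoting or escaping is done here.
     */
    static String generate(String table, Map<String, String> columns) {
        List<String> params = new ArrayList<>();
        List<String> placeholders = new ArrayList<>();
        int position = 1;
        for (Map.Entry<String, String> column : columns.entrySet()) {
            params.add("p_" + column.getKey() + " " + column.getValue());
            placeholders.add("$" + position++);
        }
        return "CREATE OR REPLACE FUNCTION insert_" + table
                + "(" + String.join(", ", params) + ") RETURNS void AS $$\n"
                + "  INSERT INTO " + table
                + " (" + String.join(", ", columns.keySet()) + ")"
                + " VALUES (" + String.join(", ", placeholders) + ");\n"
                + "$$ LANGUAGE sql;";
    }
}
```

For a made-up `blood_pressure` table with integer `systolic` and `diastolic` columns, this yields a function `insert_blood_pressure(p_systolic integer, p_diastolic integer)` whose body inserts `($1, $2)`. A real generator would have to walk the archetype structure rather than a flat column list.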

Extending markup mechanisms in web tier for better archetype bindings

Ok,

This is probably a weird title, but when you face the same situation that I am facing at the moment, it will make sense. That situation is when you are working on a web based application with a decent web tier technology, and you are trying to bind the UI layer to a back end layer. Almost all of the recent web stacks give you a declarative UI layer with some sort of markup language that allows you to refer to backing objects for bindings. In my case these objects are AOM instances (what a big surprise …).

What I am not happy with is the usual focus on backend functionality, building most of the code at the layer that is just behind the UI layer, and using the markup in the UI layer just to do simple binding. Of course there is a reason that the markup in the UI layer is quite simple in most recent technologies: if it was not like that, you’d go back to the days of ASP and JSP, where code was very intimate with markup. That’s a scary mess, but I still feel that we are not making use of the bindings in the UI layer as much as we can. With some tweaking of these markup features, we may have a better distribution of functionality, with some responsibilities shifted to the markup side of things. With some tooling support (cough: eclipse) this may even lead to a small domain specific language in the UI layer for archetype bindings. In the case of JSF, there is the Unified Expression Language extension mechanism, and for WPF, there are markup extensions for XAML. So in case you are using one of these technologies and doing something like what I’m doing, you may consider taking a look at these.
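
To make the idea a bit more concrete, here is a small plain Java sketch of the kind of logic such an extension could delegate to. It does not use the JSF or EL APIs at all; `ArchetypePathBinding`, the slash separated path syntax and the nested map standing in for an AOM instance are all assumptions made for illustration. In JSF, something like this would presumably sit behind a custom resolver plugged into the expression language.

```java
import java.util.Map;

/** Hypothetical path based binding; the path syntax is illustrative only. */
final class ArchetypePathBinding {

    /**
     * Walks a tree of nested maps following a path such as
     * "/blood_pressure/systolic" and returns the node found there.
     * Throws if any segment of the path cannot be resolved.
     */
    static Object resolve(Map<String, Object> root, String path) {
        Object current = root;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty segment produced by the leading slash
            }
            if (!(current instanceof Map)) {
                throw new IllegalArgumentException(
                        "Cannot descend into '" + segment + "' in path " + path);
            }
            Map<?, ?> node = (Map<?, ?>) current;
            if (!node.containsKey(segment)) {
                throw new IllegalArgumentException(
                        "No node '" + segment + "' in path " + path);
            }
            current = node.get(segment);
        }
        return current;
    }
}
```

With a resolver like this, the markup only needs to carry the path, and the page author never touches the backing code for each new field.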

Web development with Java

Now that’s a large topic. A very, very large topic. In case you are into Java web development, you are going to have to decide between millions of frameworks for similar tasks, and sometimes for the simplest and most obvious thing to do, you’ll have to introduce a whole framework into your project. The learning curve for all these frameworks is another problem. You start with GWT, and realise that it does not always feel right. There are times when it is nice to have Java code compiled into JavaScript, but there are times when it does not feel right. So you want a more server side approach, and that’s where things start to get crazy.

You have Struts (old, but still used), JSP, JSF (on top of JSP), Facelets (because JSF on top of JSP is problematic), JSF 2.0 (soon), GWT, Tapestry etc… You have specs and application servers, web containers with their soap opera relationships with specs. J2EE 1.4 or Java EE 5? Tomcat or JBoss, and if so, which version? Java 5 or 6?

By the way, why not think about different implementations of the same standard? MyFaces or Sun’s reference implementation of JSF? Oh, I forgot to mention, Java EE 5 makes application servers provide their own implementation, so you may need to figure out how to replace the defaults with your library.

While you think about this, endless libraries and frameworks are born and an equal number of them are dying. It is like the universe with stars and planets. For Ajax in JSF alone you have at least four options, some of them being relatives of others. GWT, Ajax4JSF, RichFaces, GWT4JSF….

And what about the glue type of frameworks? Spring and/or Seam for your web application would be nice, don’t you think so?

I am just trying to come up with a set of reliable and productive tools and technologies, and for Java and web based development, I do not feel like I’m winning. Seriously, things are getting out of control in this domain. The problem is that I am in a position where I have to build a set of tools that will support web based development, but in such a fragmented domain, how can I target the combination with the largest set of users? I don’t know, I really do not know.

Tablelayout: some people still have some common sense left in them.

I’ve spent almost all day trying to write a small piece of code that would generate a form using an archetype as input. I had everything written, except the code that would create the form. Would be a no brainer, right? Well, if it is Java Swing that you are using, that is not the case.

What I wanted was very simple, however the “powerful” layout engines in Swing just would not let me do it. After spending hours cursing at gridbaglayout, I finally discovered tablelayout here. It works exactly as it was meant to. No catches, no smart tricks. It simply works. In case you want to save your own hours, just take a look at it.

And if you respond with “but gridbaglayout is sooo powerful”, I’ll punch you in the face.