openEHR for practical people (cleaned up)

Warning: this is a re-post of an old blog entry. Things have moved on since I wrote this, and mostly in a better direction. Just remember that you're reading a four-year-old entry.

 

Introduction and outline

This document is a guide for the software developer who is interested, or forced to be interested, in openEHR. Let's be honest, you may not be interested in the idea of semantic interoperability for healthcare or better healthcare services for everyone (including you). That is OK; this guide should still be helpful. On the other hand, if you are excited about openEHR and want to figure out what it means for you, then hopefully this guide will give you an overall view of openEHR and help you use the other documents.

openEHR is a big initiative. The word initiative is used intentionally here, since calling openEHR a "standard" actually underrates the huge amount of work and the number of people that exist in and around the cumulative effort. openEHR is a standard, that is true, but thanks to the efforts of many people who have been working on various aspects of the standard, such as implementations and tools, there is much more available to you than a set of specification documents.

However, a software developer is mostly interested in writing software, and from a software development perspective, openEHR can be described better using certain terms and concepts from the software development domain. The following is an overview of the content you will find in this document.

First, we are going to introduce some concepts which have become popular in recent years in software development. We are going to talk about them because software developers understand new concepts much more easily when those concepts are explained in terms of familiar ones.

After taking a look at the problems in software development and recent popular approaches to solving them, we are going to take a look at the openEHR standard. We are going to see some of the approaches provided by openEHR, and we will try to see the similarities between these approaches and the software engineering methods outlined in the first part.

Considering that this guide is written for software developers, the major components of the openEHR standard will be outlined, along with what is in the standard, what is not, and why.

The final part of the guide will give clues about next steps for the curious developer who would like to give the Java reference implementation a try.

This guide is expected to be expanded in the future, with new sections focusing on actual code examples and deeper technical discussion.

The state of software development in healthcare

The healthcare IT market is growing fast. It is one of the most challenging fields in software development, and it is a very rewarding domain with a lot of demand for specific solutions. Concepts like semantic interoperability and electronic health records have become the holy grail of the domain, and there is still an army of people chasing that holy grail, including you and me. There is no complete solution yet for some of the most heavily demanded functionality, like securely shared medical information, but progress is being made.

As software development adapts to the requirements of the world, new tools, new methods and new approaches are introduced every day. These changes in software development can contribute to that progress, so that the healthcare domain can get better solutions to its well known problems. So what kind of changes are we talking about?

First of all, software engineering is still struggling to decrease the amount of uncertainty in software development. Software development is trying to change itself from being a fuzzy black art to an engineering discipline with well defined rules and metrics. As in healthcare, putting rules and well defined methods into software development is not an easy task, and a huge amount of software is still developed in ad-hoc ways. Project managers still have to explain why the team did not deliver the project before the deadline, and software developers still have to hunt bugs. Certain practices have been accepted quite well by developers, however, and these practices, along with the tools that help developers follow them, have become almost mainstream. As time goes by, terms which used to be cutting edge become part of the everyday language between developers, and the tools and methods related to these terms become common tools and methods. Developers are now expected to be familiar with them. Remember how, a couple of years ago, AJAX was a buzzword? So were SOA and MDA, along with Aspect Oriented Programming.

In time, the software industry invested in these buzzwords, tested them, and in the process these technologies began to mature. Some of them were accepted more than others and became mainstream, and some of them, despite being potentially quite important, did not take off as expected. We'll mention some key approaches below, and later we'll see the relationship between them and openEHR.

MDA (Model Driven Architecture) is an approach for developing software. It is (roughly) based on the idea of using well defined specifications and guidelines for software development. What is the difference between this approach and the usual analysis methods you’ve certainly come across?

MDA is one of the results of people recognizing a very common problem: requirements not being represented correctly and efficiently. Not representing a requirement correctly is probably the worst thing you can do when you are developing software. The client asks for a particular piece of functionality, something that exists in the domain.

Real life example: you are in an analysis meeting with a doctor, and he says: "I'd like to record my initial opinions about the patient". You take your note, write it down in your notebook, or maybe you record it in a formal(ish) method like a simple use case diagram. You assume that a text box on screen is enough for this. All the doctor has to do is type his thoughts, and you're done. So after that part of the analysis is over, you go to the office and turn that requirement into code. Based on your particular design approach, you may design classes and then design a database table, where a single column is used to hold this information. Then you have to write some code that will take the information from the user interface, put it into your class instance, and save it to the database when necessary. Or, if you are following a DB first approach, you may design a database table and then write classes that read and write information to the database, and finally design user interfaces that present and gather data. You may think that approaches like DB first design are way too old, and "nobody does that anymore". Well, that is not the case. There are still millions of developers who use these methods, and all of them would feel more or less the same way when the following occurs:

After spending hours writing the code, designing the database and user interface, testing it and so on, you arrange another meeting with your customers, and the doctor who gave you the original requirement for recording his opinions looks at the software you've developed and shakes his head: "that's not the way it is supposed to be". You, under the obligation of being polite, ask "sorry?". The doctor explains that he'd rather have a list of items on the screen, each representing an opinion, so that he can read them better, and what's more, he wants to use a certain code system for selecting opinions, or, to be more accurate, his diagnoses. Since he did not mention any of this in the first meeting, you point out that he just said he'd like to record his opinions, and you've done what he wanted. All he does is shrug. "This is what I meant, I thought you'd understand."

And that's it. Your hours of work are gone. You know that there is not a single piece of work you can save from what you have done before. Since there are multiple items that you'll need to show in a list, your db design is pretty much irrelevant now. You have to introduce a new table, for db normalization, that will hold the selected condition ids. Your class, which used to have a String field, now has to have a collection of strings. Wait, we did not even talk about the coding requirements! The user wants to use some coding standard.

So you get back to work and, following the same approach as before, redo the whole thing. For coding, you put a button on the form, and when the user clicks the button, a new dialog is displayed with a tree control that contains hierarchical codes for conditions. You use a previously defined database table, created by another programmer. You write a smart widget that accesses the db as tree nodes are expanded, and child nodes are added dynamically. You test the application, add a couple of conditions, and it works. You are ready for the next meeting this time. You call the client to show your work; you are done now, right?

You perform the demo to the client, and you ask him to test it. He opens the dialog for selecting the conditions, and starts navigating the tree. He starts looking for a condition, and his journey through the tree never seems to end! After expanding about a dozen nodes, he finds what he wants, and clicks it. Then comes the next one, which is no easier to reach. After three diagnoses he turns to you and says: “sorry, we can not use this”. You find your hand shaking a little bit, but still manage to ask “why?”.

He says it would be impossible to use this during work hours, when patients are waiting in the queue and he has almost no time. Moreover, he wants to know if he can add his own codes to the list. You ask him what kind of user interface he would like then, and he says "I don't know, something easier to use".

Any developer who has done more than a year of development in a similar field knows what I'm talking about. This is software development. In the first case, you misunderstood the requirement, and your work is gone; your company's money is gone too, since they paid for hours in which no usable output was produced. In the second case, you got the requirement right, but this time, due to the nature and structure of the domain data (the codes), you hit a usability barrier; a non-functional requirement is not satisfied (the system should be easy to use).

There is no escaping from what happened here; it has happened in the past and it will happen again in the future. You cannot eliminate it, but you can reduce the losses by handling it better.

MDA can help. If you had a model of the requirement, expressed in a specific format, some parts of this continuous process could have been much easier, and you could have spent the time you saved elsewhere. For example, if persistence of the data behind this requirement was handled in a smarter way, say with a library that knows how to save and load data kept in your model, you would not lose so much time. Again, if some other tool made sure that you don't have to change db schemas and rewrite db read/write code when your model changes, would that not help? All you would have to do is express the requirements in some form that can be used by software tools that automate some of the following tasks: handling persistence, and even generating the classes related to business logic.

MDA is built on this idea. You have formally represented models of real life concepts. The way your models are represented is important, and formal representation is the key here. Formal representation does not only mean that there is a well defined method like a file format or an XML schema; usually this representation has other capabilities too. For example, if we use a machine processable representation, we may be able to automatically detect inconsistencies in models. If there are conflicting statements about the same real life entity in different models, our processable representation may help us catch the inconsistency at the design phase, so that we can avoid a lot of future trouble and cost.

Moreover, our formal representation can also be a domain specific language. If we encapsulate commonly used terms and concepts from our domain, then representing real life concepts becomes even easier. There is now a common language shared by all the people who are expressing real life concepts. So you can avoid cases where one analyst uses the word "problem" while another uses "diagnosis". In a fairly large project these little differences would probably end up as quite similar classes with duplicated functionality.

So expressing models with well defined, machine processable domain specific languages can save us from some trouble. We now have a less error prone method of modeling real life concepts. This is good, but it is not enough. Remember all the work you'd have to do if you followed a less formal method? Writing code to express real life concepts in whatever technology you use, and designing database schemas? Do you also remember changing them, throwing away hours of work when you make mistakes? Well, models created with the MDA approach solve some of your problems, though not all of them.

You'd still have to write the code for processing and saving those models. MDA also introduces the idea of platform independent models which can be transformed automatically into platform specific entities. So when you have a model, you can generate various artifacts from it, regardless of your choice of technology. (Of course your tool set should allow this; if you are using Oberon or Assembly to write code, do not blame me for giving you false hopes.)

So we can automate a significant amount of work using the model driven approach. The models can be automatically transformed into meaningful entities in your favorite (or imposed by your boss) technology. A patient examination act can be represented with a model, which can be transformed into a PatientExam class, for example in Java or C#. Moreover, the same generative approach can be used to give you a framework where requirements like persistence, or even GUI generation, are handled automatically. Once you start building on the idea of a well defined, controllable form of expressing domain concepts, quite impressive opportunities begin to emerge.
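
To make the transformation idea a little more concrete, here is a sketch of the kind of class a model-to-code generator might emit for a patient examination model. The names and fields are made up purely for illustration; they do not come from openEHR or from any particular generator.

// Hypothetical output of a model-to-code transformation for a
// "patient examination" model. All names are illustrative placeholders.
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class PatientExam {
    private String patientId;
    private LocalDateTime examTime;
    private final List<String> findings = new ArrayList<>();

    public String getPatientId() { return patientId; }
    public void setPatientId(String patientId) { this.patientId = patientId; }

    public LocalDateTime getExamTime() { return examTime; }
    public void setExamTime(LocalDateTime examTime) { this.examTime = examTime; }

    public List<String> getFindings() { return findings; }

    // A generator could also emit validation or persistence hooks here,
    // derived from the same model.
}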

The important thing to remember at this point is that you are still not immune to making mistakes! Software development is inherently hard, and you'll still get things wrong, you'll still make mistakes. The approaches developed by the software engineering discipline aim to decrease the number of mistakes you make, but there is no guarantee that you will develop software without any. Even if you do not make mistakes, your clients will request new features or changes to existing ones. Software is about change, and dealing with change. The change may be necessary due to an error or due to a request from the client. It does not matter; what matters is that you have to respond to these scenarios as efficiently as possible. The more agile you are, the easier it is to respond to these situations. So we are talking about achieving the same results you've been achieving, in a more efficient way.

You can see some of these ideas realized in other approaches too; for example, Object Relational Mapping tools like Hibernate aim to isolate you from the database layer by generating classes that represent database tables. You are also given an automatically generated set of classes which save, load and query those tables, so when a change occurs in the database, you can get back to focusing on actual business logic related code instead of rewriting your whole database layer. Actually, object relational mapping tools caused a return to database centric methods of modeling domains to a noticeable extent, since developers realized that once they were done with designing a database, the rest of the work could be pretty much automated. Whether or not this is a good outcome is open to discussion, but the point is that when you introduce solutions to well known issues in software development, you can switch to a more efficient form of software development.
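
As a rough illustration of the ORM idea (not tied to any particular project), a JPA/Hibernate-style mapping looks something like the sketch below: the annotated class stands in for a table, and the ORM takes care of the SQL for saving and loading it.

// Illustrative JPA/Hibernate-style entity: the class maps to a table,
// and the ORM handles the SQL for persisting and loading instances.
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Diagnosis {
    @Id
    @GeneratedValue
    private Long id;

    private String code;        // mapped to a column automatically
    private String description; // mapped to a column automatically

    // getters and setters omitted for brevity
}

// Typical usage with a JPA EntityManager:
//   entityManager.persist(new Diagnosis());
//   Diagnosis d = entityManager.find(Diagnosis.class, someId);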

Another benefit of using a formally defined model as the starting point in software development is achieving a common language and a common development approach within your team. If you have ever been involved in a project with more than two or three developers, you'll have seen that habits or different interpretations of the situation at hand lead to quite different chunks of code. In time, ownership of code begins to be distributed among developers, each of them usually working on their own code. In the medium term developers may start violating coding conventions, but things are still not that bad. In the long run they start developing their own utility classes or libraries for their own part of the code. The worst case is probably when developers express domain concepts based on their own methods and experience. Since developers do move from company to company, or get assigned to other projects, or somehow decide that they should rather run an antique shop, people end up trying to use or fix other people's code. The initial response in a problematic project is "oh, that's Jason's part, I really do not know how it works, I'll need to take a look". If things go all right, the new developer in charge of that part of the code will understand what is going on and take care of the problem. If things do not go all right (and they rarely do in software development), the new developer will partly hack the code, and an even uglier brain child of two developers will be born, making it even harder for the next developer to understand what is going on. At some point someone will give up and rewrite that part of the code. So new bugs with new costs will be introduced, and hours of paid work will go down the drain. What happens in this case is demonstrated in the diagram below.

[Diagram: each developer interprets the same domain concepts independently and produces their own code, persistence and user interface representations]

This diagram shows the process that begins with the identification of domain concepts and their transformation into code by developers. Each developer works in the same domain, sometimes on the same concepts in the domain. However, they will all process the concepts based on their own methods, and they will end up producing their own views of the concepts in code. These differences in interpretation will be reflected in various layers of the software too. The persistence, user interface and business logic aspects of the software will also specialize, at least to an extent. There are many process improvement approaches, from coding standards to proper documentation, that try to minimize the problems that occur as a result of this situation. Still, they do not change the fact that each developer is responsible for creating some form of representation of the domain concept in code, and there is an inevitable amount of integration work among the outputs of the different developers in the team.

Now, when you have a common set of rules embedded into your development process, you are much less likely to suffer from these problems. If models are the only way of introducing domain concepts into your code, then it is unlikely that you'll end up writing duplicate code. You'd mostly be adding business logic to a set of code that has already been generated for you. Less variation in project code means less time lost and a better ability to help others or to get help, since your code is clustered around commonly accepted classes.

What I would like to say at this point is: MDA may be the main approach we have used so far, but when you read the following sections on openEHR, and when you start working with it in the future, do not see openEHR only in terms of MDA. openEHR incorporates concepts from many different software engineering methods, and its design benefits from many software engineering practices. MDA was chosen in this guide as an introductory common ground for a software developer, but a good developer should try to see the other important ideas embedded in the standard.

MDA and other ideas have been introduced into software development simply to make software development better, and many tools and approaches have integrated them into the development process. But why are we mentioning them now? What is the relationship between openEHR and software development?

openEHR for software developers

As a software developer, your first impression of openEHR may not be that it is very much related to software development. It is a standard for electronic healthcare records. It should basically be a set of documents that tell you how to express information and save it, right? This basic assumption is correct, but there is so much more in openEHR that it would be a shame if you, as a developer, did not know about it.

As a developer, you should know that openEHR is not only about saving some medical information in a pre-determined format. Just because its name is about records does not mean that it is focused on the storage of information. openEHR has a huge number of features built around medical information, and these features can help you out a lot as a software developer. Let's look at the major features and components of the standard, and how they relate to your tasks.

Modeling

openEHR uses ADL (Archetype Definition Language) to express clinical concepts. Since it is a formal, well defined way of creating models that represent clinical concepts, ADL provides many of the capabilities we discussed when we talked about model driven approaches above. As a developer, it is important that you understand what an archetype is in the openEHR world. For you, an archetype is a unit of information; it is an entity that encapsulates a part of the domain that you'll be writing software for. Let's assume you are tasked with writing a piece of software that will be used in the emergency department, where poisoning is quite a common case. Doctors need to provide a set of information about the poisoning event to the information system. This information will be used by other doctors during treatment, it will be saved into the system so that it becomes part of the patient's medical history, and many other parties, like researchers, will try to use it in the future. In the openEHR world, as a developer, you will be given something called a poisoning archetype. This will most probably be a text file containing content in the form of ADL. This ADL file will tell you what kind of information is used to model a poisoning event. The question that may arise is: does openEHR define a set of archetypes for each and every concept in the medical domain? Are there archetypes for car accidents, falls, cancer or alien attacks? The answer is NO.

openEHR is a standard that provides guidelines that tell you how to model medical concepts. It does not give a list of actual medical concepts as part of the standard. The requirement of listing and classifying concepts from the medical domain is usually handled by terminologies. openEHR has well defined mechanisms for making use of that kind of work, but the standard itself does not contain an actual list of things in the medical domain. To make this a little clearer, we can say that openEHR is about recording and using identified concepts. Ontologies of reality (like SNOMED) give us a list of identified concepts in medicine. openEHR refers to these ontologies to link saved or used information with real life concepts. So when a piece of information is expressed using openEHR, you should not be surprised to see references to terminologies, which are used as identifiers of the concepts expressed in openEHR.

We said (actually I said) that the openEHR standard tells you how you should express, or model, domain concepts, and ADL is the formal way these models are expressed. So how does ADL work? What does it do? ADL is an interesting language, because it works by the method of constraints. If you want to express a real life concept like body temperature in ADL, you do this by constraining that concept. You "describe" something by listing its boundaries. If you were to define things only by using constraints, then body temperature would be something between 0 and 42 degrees (for a dead or alive body), with degrees being the unit of measurement. The data type for expressing this concept would be a double, and it would belong to a patient, of course. In fact, it is possible to create a method of defining things by describing constraints on their attributes, and this is what ADL does. You should know that ADL also allows you to define data structures, but these data structures are mostly used for expressing non-medical data in an archetype (like the creator of the archetype, its version, etc.).
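
To make the idea of defining-by-constraining a bit more tangible, here is a toy sketch in Java. It is not ADL and not the real archetype object model; the class name is made up. The point is simply that the concept is captured as a description of valid values, which can later be used to check actual data.

// Toy illustration of "defining by constraining". The class is made up;
// it only shows that a concept can be captured as a description of
// valid values rather than as a value itself.
public class QuantityConstraint {
    private final String units;
    private final double lower;
    private final double upper;

    public QuantityConstraint(String units, double lower, double upper) {
        this.units = units;
        this.lower = lower;
        this.upper = upper;
    }

    public String getUnits() { return units; }

    public boolean isValid(double magnitude) {
        return magnitude >= lower && magnitude <= upper;
    }

    public static void main(String[] args) {
        // "Body temperature is a quantity in degrees Celsius between 0 and 42"
        QuantityConstraint bodyTemperature = new QuantityConstraint("Cel", 0, 42);
        System.out.println(bodyTemperature.isValid(36.8)); // true
        System.out.println(bodyTemperature.isValid(780));  // false
    }
}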

Wait a minute, how did we go from ADL to an archetype? What is the relationship between them? ADL is the language that we use to create archetypes. Archetypes are the models in openEHR world, which encapsulate medical concepts. So from now on when we say an archetype, it corresponds to a model that describes a real life concept.

Back to constraining things: why does ADL do that? What is the benefit for the software developer here? First of all, whatever data an information system processes, data validation is an inevitable requirement. As a developer, you are responsible for that validation: you cannot insert a string value into a numeric column in a database, and you had better not let people type "asdf" into a field on the screen that is for entering age. It is not only about type safety either; even if your software allows the user to put a number into the age field, the number should not be 780 unless you are writing a medical application for vampires. This kind of dirty data will cause a lot of trouble in the future. There are certain fields in every piece of software which should be mandatory, and so on. So why not include this kind of requirement from the beginning, where the requirements are specified? ADL does that for you, and since this aspect is covered within the formal model, you can make use of it in later steps. (We'll talk more about the benefits of this later.)

Another advantage of the constraint based approach is that it gives you a set of different combinations of attributes and values for expressing a concept. This way, the model has a controlled amount of variation; it is neither very strict nor very loose. Either extreme would harm our goals, especially in the context of model driven architecture.

Back to the big picture: we have a model at hand that describes a medical concept, the poisoning event. But what exactly is constrained here? Where do the things constrained by the archetype come from? They come from the reference model, which is a part of openEHR. The reference model contains various fundamental data types and structures, and the other necessary building blocks required to represent concepts. This is called two level modeling, where the reference model is one level, and archetypes constitute the other level by expressing constraints over it. Using this approach, you have fundamental building blocks in one level, and their compositions in the second level to express domain concepts. The compositions are archetypes in our case, and even though we use the term composition, please remember the golden rule: archetypes work by constraining.

OK, so who builds archetypes, and how? Well, it may be you, but it would be better if a clinician actually built them. And even though you can theoretically write them down using a simple text editor, you are better off using one of the freely available tools that make things much easier for you. There appears to be a resemblance between the MDA approach and using archetypes, but do archetypes support all the things we've discussed? Actually, they do.

Implementations of openEHR

The openEHR reference model is platform independent, but it is designed in a way that the fundamental types and structures it contains can be mapped to many modern development languages. These mappings are called implementations of the standard. The reference model clearly describes the attributes of the types used in archetypes. Types like strings, numeric values, dates and so on are all described in the relevant parts of the standard, so an implementation in Java is responsible either for using types which satisfy these definitions or for introducing implementations using its own language mechanisms. What we mean here is: if the standard describes a Date type with various attributes and capabilities, and the Date type in your favorite programming language does not cover all of those attributes and capabilities, then it is your responsibility to create that Date type in your language.
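
As a hedged sketch of what that responsibility can look like in practice: suppose the specification's date type needs a capability your platform type lacks, say a partial date where only the year is known. The class below is purely illustrative (the name and the chosen capability are assumptions, not taken from the standard), but it shows the pattern of wrapping or extending a platform type to satisfy a richer definition.

// Illustrative wrapper: extending a platform date with a "partial date"
// capability. The class name and the specific capability are assumptions
// used only to show the pattern.
import java.time.LocalDate;
import java.util.Optional;

public class PartialDate {
    private final int year;
    private final Optional<Integer> month; // empty when only the year is known
    private final Optional<Integer> day;   // empty when the day is unknown

    public PartialDate(int year) { this(year, null, null); }

    public PartialDate(int year, Integer month, Integer day) {
        this.year = year;
        this.month = Optional.ofNullable(month);
        this.day = Optional.ofNullable(day);
    }

    // Convert to the platform type only when the date is fully specified.
    public Optional<LocalDate> toLocalDate() {
        if (month.isPresent() && day.isPresent()) {
            return Optional.of(LocalDate.of(year, month.get(), day.get()));
        }
        return Optional.empty();
    }
}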

When you have a complete set of entities from the information model implemented in your technology of choice (C#, Java etc.), you have an important asset. Think about the problematic case we identified earlier, in which developers produce outputs that require integration. Using reference model entities, we can now decrease the amount of error prone integration and inter-team communication effort. The new approach is shown in the diagram below.

[Diagram: developers build on a shared reference model implementation, so persistence, user interface and other layers can work against common, well defined types]

In this approach, developers use the reference model implementation for expressing and using domain concepts in code. The reference model implementation contains a set of models with pre-defined attributes and behavior. These models are well suited to representing domain concepts, and the reference model implementation takes quite a bit of the burden off the individual developer's shoulders. The output of the development team is now, in a way, written in a domain specific language. Moreover, since the basic types and structures used by developers are well defined, there are no surprises due to developers' personal interpretations, which means it is possible to benefit from tooling and middleware that can perform common tasks on the developers' output. Persistence mechanisms can be built which expect reference model based inputs. So as long as you use the reference model implementation, you can isolate yourself from many of the troubles of persistence by using highly automated code. The same arguments hold for user interface, business logic, workflow, decision support and so on. The reference implementation is potentially a platform for cooperation among many tools. It is a way of eliminating many common errors and problems, and hence costs.

Representation of information model entities in your development environment is not all that you need. You’d also need an archetype parser that would parse the archetype and give you a representation of it, again in your favorite language.

What kind of output you get from the archetype parser is also defined within the openEHR standard. This is called the archetype object model, and archetype parsers are responsible for producing output that is compatible with this part of the standard. So to be able to use an archetype, you need an implementation of everything in the information model, a parser, and an implementation of the parser's output model.

What exactly do we mean by using an archetype? As a software developer, IF you have the implementations mentioned above, an archetype parser gives you an in memory representation of actual models (requirements, in a way). This is a very nice mechanism that gives you an output you can use in your code. Think about it: an analyst, or a doctor, uses a tool (an archetype designer) to express the concept. This expression (the archetype) contains data types, valid values, the structure of attributes, even the coding systems used (we'll discuss this further). All you have to do is use the parser, and this work lands in your development environment as an in memory graph structure. If requirements change, the archetype will be changed (there are mechanisms like versioning in the standard, by the way), and the changed archetype is given to you by the parser.

But what do you do after you parse an archetype? You expect to make use of it, of course, but how? This is probably the most confusing point for anyone trying to understand how an archetype based software development approach works, so let's get into a little more detail. After you parse an archetype, you have a tree structure in your hands. The nodes of this tree are implementations of the Archetype Object Model. This tree is an object oriented representation of the textual archetype that your parser parsed. You now have the constraints and other data expressed in the archetype in memory. But what can you do with it? This is a critical point, where you need to remember that what you have is a tree that describes the constraints. This tree is not a container for actual values of the constrained types.

If a node in this tree describes a numeric value between 5 and 10, it means that this node of the medical concept will be represented by an object with the given type and that object will have a value between 5 and 10.

It is very important that you understand the relationship between archetype object model instances and reference model instances. The archetype object model instances that you get from the parser give you a description of the reference model instances that you'll create to hold actual values. (Please read the previous sentence a couple of times.) Archetypes are models, but due to two level modeling, the models are actually made up of types from the reference model. So when you have the output of the archetype parser, you do not have any values at all; you now know what kind of structure you need to build, but you have not built it yet. Let's keep things as simple and familiar as they can be, and assume that you are going to take values from a user interface. You can automatically build a user interface using an archetype. For every node in the archetype parser output, you know the type of the node. This can be a literal value, a numeric value, or a set of values with one of them to be selected as the actual value of the node (like a set of potential poisoning causes expressed as codes). You can simply create a visual component for each of the nodes: a text box for text fields, a drop down (combo box) for multiple values with a single value to select, and so on. So it is possible to create a widget or form that can be used to ask for the data that will fill the model expressed by the archetype. Assuming you have done this, you now have a form with fields on it, and the user enters the requested information and clicks a button, let's say to save the information.
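
A minimal sketch of that widget-per-node idea is shown below, assuming a hypothetical ConstraintNode interface standing in for whatever your parser's archetype object model gives you (it is not the real AOM API):

// Sketch: walk a parsed constraint tree and pick a Swing widget per node.
// ConstraintNode is a made-up stand-in for the parser's node type.
import javax.swing.JComboBox;
import javax.swing.JComponent;
import javax.swing.JTextField;
import java.util.List;

public class WidgetFactory {

    // Minimal stand-in for a parsed archetype node.
    public interface ConstraintNode {
        String getRmTypeName();
        List<String> getAllowedCodes();
    }

    public JComponent widgetFor(ConstraintNode node) {
        switch (node.getRmTypeName()) {
            case "DV_TEXT":
                return new JTextField(30);  // free text entry
            case "DV_QUANTITY":
                return new JTextField(10);  // numeric entry, validated elsewhere
            case "DV_CODED_TEXT":
                // a fixed list of allowed codes becomes a drop down
                return new JComboBox<>(node.getAllowedCodes().toArray(new String[0]));
            default:
                return new JTextField(30);
        }
    }
}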

Now what? Developers mostly focus on the tree structure that they have received from the archetype parser, but that structure is not meant to hold data. To hold and process data, we need a parallel structure made up of reference model instances. If you have an archetype object model node in memory which has an RM_Type_Name property with a value of "DV_Quantity", this means that the value the user enters into the text field in the GUI is meant to be encapsulated in a reference model instance of type DV_Quantity. Archetype object model instances point to reference model instances. An archetype object model instance is responsible for describing the reference model instance that you'll use, so you should not assign data related responsibilities to it.
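
The split between "describes" and "holds" can be sketched like this. The data value classes below are made-up stand-ins for the real reference model types (the real DV_QUANTITY has more attributes and behavior); the point is only that the constraint node tells you which type to instantiate, and a separate object actually carries the value.

// Sketch of binding user input to a data-holding object chosen according to
// the constraint node's RM type name. The value classes are illustrative
// stand-ins, not the real reference model implementation.
public class DataBinder {

    interface DataValue {}

    static class QuantityValue implements DataValue {   // stands in for DV_QUANTITY
        final double magnitude;
        final String units;
        QuantityValue(double magnitude, String units) {
            this.magnitude = magnitude;
            this.units = units;
        }
    }

    static class TextValue implements DataValue {        // stands in for DV_TEXT
        final String value;
        TextValue(String value) { this.value = value; }
    }

    // rmTypeName comes from the parsed constraint node; rawInput comes from the GUI.
    public DataValue bind(String rmTypeName, String rawInput, String units) {
        if ("DV_QUANTITY".equals(rmTypeName)) {
            return new QuantityValue(Double.parseDouble(rawInput), units);
        }
        return new TextValue(rawInput);
    }
}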

So you use reference model instances to hold and process real data. Another common question is: "why am I using reference model types, and their implementation, to hold and process data?". After all, you could simply get a value from the user interface, save it to some database table, and later get it back. If you keep a link to the original archetype node that generated this value, you are done! Well, not always. First of all, a value like 64 is not information, it is data. If one piece of software uses an integer to save 64, and another uses a double, there is a type mismatch between these two systems. This scenario can happen even within the same development team. Another example is about dates. If an information system (or a developer) uses a particular date format (dd-mm-yy) to save a date, and another one uses a different format (mm-dd-yyyy), what happens when a single piece of code tries to list all the events in the history of this patient three years later?

This is why the reference model has its own types, with very clear descriptions of their attributes and behavior. If you do not represent the real life values you get from users with reference model types, you'll deviate from the standard, and communication between users of the standard will fail. As long as you use the reference model implementation to represent and process data, your code will behave consistently. This also eliminates the utility code written around custom use cases. A fellow developer of yours will probably introduce some utility method like "toUSADate()" for his custom use of a date value, and the next person trying to make use of his code will have to figure out these utility functions scattered around the code. Remember the scenario where someone was trying to figure out Jason's code? If everyone uses the same reference model implementation, a lot of boilerplate code becomes unnecessary, and, more importantly, the code behaves consistently, both within a system and among different systems. This is another realization of a domain specific language. Using the reference model, your developer friends (and you) have a common language in the code. A date related to a medical event is always the same type, and there is no way of interpreting it differently. It is not possible for one developer to use "DateTime" while another uses a "Date" type in the language they are using. Even though interoperability is the primary reason openEHR was born, this side benefit of providing a common ground to developers should not be underestimated. These little differences add up to great losses and costs in software development, and openEHR can provide a significant benefit when it is perceived as a development framework. As a developer, or a manager, you can benefit from openEHR in a quite unexpected way: as a software development framework.

If you take a look at the common functionality that exists in medical applications, you’ll see that all the major aspects are covered in openEHR, and moreover, the standard has considered how these aspects will be implemented by information systems. Remember the scenario where the doctor was not happy about your solution for providing a selection dialog for terminology nodes? openEHR archetypes allow the designers to impose constraints on terminologies, so whoever designs the archetype can state that only a subset of a terminology is appropriate for a selection.

What about issues like persistence and querying? You'll find out (or already know) that saving medical data and later querying it is no easy task, and changes in requirements become quite costly when they are reflected in db design. If db design is not isolated from the code that processes data, a change in a requirement creates a ripple effect, just like we mentioned in the beginning. openEHR has considered the persistence and querying requirements of the data as part of the standard, so you can leverage existing research to get conceptually the same benefits we outlined for the reference model implementation. It is possible to automate aspects like persistence and querying to a large extent. In particular, by adopting a generic mechanism that saves and loads model (archetype) based data, you can avoid a large amount of effort that would otherwise be necessary to cover these aspects. The openEHR wiki has pointers to potential implementation methods for generic solutions to these concerns.
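
The shape of such a generic mechanism might look roughly like the interface below. This is only a sketch of the idea (all names are illustrative, and real solutions use richer query languages), but it shows the contrast with writing one hand-crafted data access layer per clinical concept.

// Sketch of a generic, archetype-driven data store: one component saves and
// loads any archetype-based data, instead of one hand-written DAO per concept.
// All names are illustrative.
import java.util.Map;
import java.util.Optional;

public interface ArchetypeDataStore {

    // data is a path -> value map produced from reference model instances
    void save(String compositionId, String archetypeId, Map<String, Object> data);

    Optional<Map<String, Object>> load(String compositionId);

    // a simple path-based lookup; real systems use a dedicated query language
    Iterable<String> findCompositions(String archetypeId, String path, Object value);
}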

Other relevant concerns and requirements

Archetypes are the building blocks, the fundamental units of representation, in the openEHR standard. However, archetypes are built for representing domain concepts, and there are some common, inevitable requirements regarding their use. Most of these requirements are also covered by the standard. Inheritance, a familiar mechanism from mainstream object oriented languages, is available for archetypes. So what do we gain by using inheritance? Inheritance is quite a good shortcut for reusing definitions and behaviors of concepts. Its overuse can lead to long term problems in programming languages, but for archetypes the implementation of inheritance is not as deep as it is for programming languages. So it is possible to derive an archetype that models a specific laboratory result from a generic laboratory result archetype. Archetypes also have the ability to contain other archetypes via a mechanism called slots. So you (or whoever does the modeling) can use mechanisms like inheritance and composition. Since these mechanisms are supported quite well by mainstream languages, the tools that connect archetypes to implementations of the standard help you benefit from them in your chosen development technology.

openEHR focuses on archetypes for representing domain concepts, but the standard also includes some other constructs which inevitably emerge during the use of archetypes. What kind of constructs are we talking about? Well, if archetypes have a composition mechanism, we can place an archetype inside another one, but what about grouping them together? Grouping archetypes is almost inevitable. Think about a medical process. There are different bits and pieces of information that must be captured and processed, like demographic data about the patient, medical data related to the status of the patient, a set of actions like the administration of a drug, and so on. So a real life medical operation as a whole is represented using more than one archetype. A demographic archetype can be used to model patient demographics, and an examination archetype can be used to model medical data. In this case, including archetypes inside other archetypes is not a solution. You cannot add slots for each and every other type of archetype in every archetype you design.

Grouping archetypes is not the only requirement that emerges often. Based on certain conditions, parts of an archetype may not be used, or some fields in an archetype may be constrained further, again based on certain conditions. Did we mention GUI related requirements? What if the creator of an archetype wanted to make sure that a particular field is presented in a particular way? These concerns, along with things like grouping, are not related to the medical domain. They are related to the archetypes themselves. It would not be smart to let archetype definitions contain data related to these kinds of things, because they are not about domain modeling, and archetypes are for modeling the domain. On the other hand, it is obvious that these use cases will emerge during the use of archetypes, so openEHR covers these use cases and issues too. Templates in openEHR introduce well defined methods to cover requirements like the ones listed above: things which are not directly related to domain modeling, but which will emerge in many scenarios and will be necessary. By introducing templates, openEHR provides standardization around these frequently emerging concerns, and you can exploit this during your implementation. Templates are a work in progress at the moment, but you'll find that some quite common problems are handled via templates.

How to get started?

The good news is, there is a Java based reference implementation of openEHR, which contains an archetype parser, an implementation of the archetype object model, and an implementation of the reference model. For practical purposes, this is quite a nice starting point. This implementation is capable of taking you to the point where you can parse an archetype, get an in memory representation of it, and make use of it using reference model instances. There are useful implementations of key concepts like terminology server integration in this reference implementation, and you should experiment with it. For a developer, making use of the documentation is important, and openEHR has quite a large set of documents. As a developer you may be working on a particular aspect, like archetype modeling. In that case you should make sure that you have the ADL documentation and the archetype object model specification with you. There is a shortcut mechanism in ADL that lets archetype creators express various data types using a different syntax than the overall ADL syntax. Try to understand the differences between the sub-syntaxes of ADL (cADL and dADL) first. When you feel that you have an opinion about how ADL works, take a look at the information model documents. Whatever is being used (constrained) in an archetype has its definition in the information model documents. Data types and data structures like lists or sets are defined in the information model documentation. So if you want to work on modeling related aspects, these documents, along with the overview documentation, should help you get started.

If you are a developer who is more focused on processing archetypes, you'll have different priorities. In the ideal case, someone will give you a set of archetypes, representing models, and your task will be to produce software that makes use of them. In this case, a quick look at the ADL documentation is necessary, but you should have a solid command of the archetype object model documentation, since that is what you'll get from the parser. You should also know about the information model types and structures, since you are going to need them for processing data.

For a developer, the best way to clarify things regarding openEHR is to write a very simple application that parses an archetype, creates some sort of data input form for it, and saves and loads data. The Java reference implementation contains everything you'd need to perform this task, and you are advised to balance coding and reading documentation. Do not try to memorize the standard, and do not plan to get started on the reference implementation only after you have complete command of the documentation. The standard is big, and it is concerned with a lot of things you'll probably not face in the learning phase, so do not give yourself a steep learning curve by trying to "get it" fully.
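
For orientation, the skeleton of such a "hello world" might look like the following. The package, class and method names are assumptions about the Java reference implementation's ADL parser, so verify them against the version you actually download; the point is the overall flow: parse the ADL file, then work with the resulting archetype object model.

// Skeleton of the suggested first exercise: parse an archetype and start
// working with its object model. The parser package, class and method names
// below are assumptions; check them against the reference implementation.
import java.io.File;

public class ArchetypeHelloWorld {
    public static void main(String[] args) throws Exception {
        // Assumed API: an ADL parser that reads an .adl file and returns an
        // archetype object model instance.
        se.acode.openehr.parser.ADLParser parser =
                new se.acode.openehr.parser.ADLParser(new File("poisoning.v1.adl"));
        org.openehr.am.archetype.Archetype archetype = parser.parse();

        // From here on you inspect the definition tree, build a simple form,
        // and create reference model instances to hold the entered values.
        System.out.println("Parsed archetype: " + archetype.getArchetypeId());
    }
}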

What is in the standard and what is not?

From a software developer’s perspective, the standard ends up being the answer to most of the questions. You should not expect to find the answer to all of your questions in the standard though, since that answer was found elsewhere a long time ago and it is 42.

You will realize that many things related to making an openEHR implementation work are not in the standard. It is not unusual for a software developer to expect more concrete material from an electronic healthcare record standard. For example, a database schema for saving electronic health records would be nice, would it not? The problem is, the standard has to draw boundaries around what it defines, otherwise undesired outcomes can occur. The first is the case of a very detailed standard that covers even the most complex technical details, like persistence mechanisms, how to create GUIs, how to process parsed archetypes and so on. In that case, the standard may not be implementable in some technologies and languages simply because the capabilities dictated by the standard are not available in those technologies. Even if you are using a technology that has all the necessary features, you may not need to implement all of the details in the standard, and worse than that, you may simply not have the necessary resources to do so!

So openEHR defines a lot of things, but some things are left to implementers for a good reason. openEHR has a very active and helpful community, so if you cannot find the answer to your questions, be sure to ask them on the openEHR mailing list(s).

Let's take a look at a "left to the implementer" kind of issue with an example. A common question for a developer is: "how do I get a GUI from an archetype?" Archetypes are usually not very large data structures, and one can easily be tempted to map them to user interfaces directly. Well, if that is what you want to do, there is nothing in the standard stopping you! (I'd still suggest that you take a look at the draft templates specification.) So, as a developer using openEHR, you have decided that you want to build a solution where an archetype ends up as a user interface which can be used to collect information. How can you do this? Well, it is up to you! To take some of the burden off your shoulders, let me list three approaches with different pros and cons.

Option 1 is to generate a GUI automatically from an archetype. It is tempting, since as a software developer you'll be able to write the generic code once, and you're done. You'd have a pipeline that takes an archetype in at one end, and spits out a GUI at runtime from the other end. You'll write an engine that executes at runtime and renders an archetype in the GUI. Isn't it great? You've even avoided designing, coding and managing user interfaces for each archetype. If you extend the functionality of your code to include templates, you can do even better. Basically we are talking about something like this:

[Diagram: archetype (or template) fed into a runtime rendering engine that generates the user interface]

This is a practical approach, but you'll soon find out that with this shortcut you are sacrificing quite a lot of flexibility. Once you push an archetype (or template) down the pipe, you have no method for intervention. At runtime you'll have an automatically generated user interface, and even if it can have things like validation, this approach does not let you inject GUI related logic easily. What if the users want a red flashing light at the bottom of the screen if the patient says that he/she was exposed to nuclear material? That is a piece of conditional execution, but there is no mechanism to express this logic in archetypes or templates. Clearly this is your responsibility, but you've automated the process maybe a little too much, so that there is no step in the process where you can write the code that does what we have just explained!

So maybe we can follow a more flexible approach, where we still benefit from model driven architecture, using archetypes and templates to automatically generate certain artifacts, but instead of generating them at runtime, let's assume that we generate an intermediate form, something like an XML file or source code. Since it is now possible to modify what is generated by our tools, we can inject the previous logic into this automatically generated output, and later use it within our software. This is a more flexible approach that is still supported by MDA. You'll probably have to do more work for intermediate artifact generation and its later use, but you have more power now. Conceptually, our solution would look something like this:

[Diagram: archetype or template transformed into intermediate artifacts (XML or source code), which are customized and then built into the application]

There is also the possibility that you are a real code master, a developer who does not like automating things, does his/her refactoring without any help from the development environment, turns off automatic code assist, or, even better, uses VI or Emacs to write code (and kills others who do not use VI or Emacs). With that state of mind, you decide that no automatically generated code is worthy of your compiler. You'll write everything by hand, read the ADL and design the GUI by hand (drag and drop GUI design is for little girls, just like ponies, right?). So if someone wants a customization, you'll have no problem, because you know every line of your code by heart. Your approach is basically a fully manual implementation, with minimal, if not zero, code generation and tool support.

[Diagram: fully manual implementation, with the developer reading ADL and hand-writing the GUI and processing code]

The diagram is pretty much what you'd produce. So after taking a look at all of these approaches to developing a piece of software using openEHR, let's ask the question: which one is right? What does the standard say?

Who knows? The openEHR standard does not cover things like this. It cannot; you have to implement these aspects based on things like the technology at hand, the scope of your requirements, and so on. A good rule of thumb is to get help from the community, but the important thing you should always remember is that many implementation related decisions are up to you.

The second article in this series will hopefully walk you through the construction of the simple application mentioned above, but please do not use this information as an excuse to wait for that one.

As a conclusion: welcome to the world of openEHR, and I hope you’ll enjoy it.