A discussion about Archetype Query Language semantics

This is a copy/paste of a few responses I sent to a discussion on the openEHR lists. I'm copying them here because neither the images in my responses nor the responses themselves are properly archived anywhere yet.

If you want more: I wrote a PhD thesis on this stuff, so if you want a deeper discussion of the topic, here it is, but I suggest you read the following first.

Here is the whole exchange from openEHR mail lists, with all responses, including mine:

From: Bjørn Næss
Date: Mon, Apr 24, 2017 at 11:01 PM
Subject: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

Hi

We have created a GIT repo with some issues from our experiences with AQL.

The repo was created to expose one issue with repeating items. You will find it here – and some feedback is welcome:

Folder with all files: https://github.com/bjornna/openehr-conformance/tree/master/aql/case1-permutations

README with the problem: https://github.com/bjornna/openehr-conformance/blob/master/aql/case1-permutations/index.adoc

The initial problem was this:

One simple example may be openEHR-EHR-OBSERVATION.body_weight.v1, which has an optional and repeating 'any_event'. Below is a straightforward AQL on this archetype. We are asking for the origin from the HISTORY attribute, and then for each repeating event we want the weight magnitude and unit.

As you see from the AQL, there is an additional WHERE clause stating that we only want weights with a magnitude less than 45.

The question is: What do you expect as result from this query?

select
o/data[at0002]/origin/value as time,
o/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude as Weight,
o/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/units as Unit
from
composition c
contains
observation o[openEHR-EHR-OBSERVATION.body_weight.v1]
where
o/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude < 45
order by
o/data[at0002]/origin/value desc

Kind regards
Bjørn Næss
Product Manager
DIPS ASA

Mobile +47 93 43 29 10

From: Pablo Pazos
Date: Tue, Apr 25, 2017 at 5:54 AM
Subject: Re: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

Trying to understand the problem, I modeled the database schema in my head.

origin is like a column from the container and magnitude/units are columns for the contained node. So it is like having two tables, one for the history and one for the events with the data (there might be N tables in the middle, but this is a simplification), so it is a one-to-many relationship, getting one field from the *one* side and two from the *many* side. For me the result would repeat the origin for every magnitude/units pair: each row in the result set would be that triplet (o, m, u), with o repeated for every triplet, and m, u being values contained in the same event/cluster/element/datavalue, since the parent paths down to the datavalue are the same; the only difference is the reference to the datavalue attributes.
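To make that picture concrete, here is a minimal SQL sketch of the same idea; the table and column names are made up purely for illustration and do not come from any real schema:

-- Illustration only; hypothetical tables:
--   history(id, origin)                                -- the *one* side
--   event_element(id, history_id, magnitude, units)    -- the *many* side
SELECT h.origin, e.magnitude, e.units
FROM   history h
JOIN   event_element e ON e.history_id = h.id;
-- origin is repeated on every row, once per (magnitude, units) pair,
-- and magnitude/units always come from the same event_element row.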

That is what I think it should return; maybe the semantics are different in AQL and the results should be the permutations of data between siblings, but I don't see much sense in doing that. Also, something like a GROUP BY might be needed, but instead of using it for aggregations as in SQL, use it to group data by container.

Sorry if this doesn't make much sense, I might not understand the whole problem 🙂

From: Bjørn Næss
Date: Tue, Apr 25, 2017 at 9:36 PM
Subject: RE: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

Thanks Pablo!

I think you have understood the problem. The real problem is how to interpret the clinical intention of the query, and then how to apply that to the openEHR RM to produce the correct output.

I have been thinking about adding different "implementations" of the problem. One "implementation" could be an ER (entity relationship) model with tables and references, and then applying this to SQL. Another "implementation" would be XPath.

In this problem you need to define how to repeat:

1. History.Origin by all child EVENTS

2. Observation.Protocol joined by data from events.

It would be nice if you could contribute some ER models to show how this would work in your model. Put another way: what would such a query look like and what would the output be from your system?

From: Pablo Pazos
Date: Wed, Apr 26, 2017 at 2:47 AM
Subject: Re: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

For XML queries I would prefer to query and return a complete subdocument starting from the history, not just individual nodes. In that scenario, querying by multiple XPaths that point to individual nodes might return just a list of those nodes without any relationship between them. But if those nodes include some kind of parent instance id, a program might reconstruct the hierarchy from the list of results. The problem I see with this is that the database already has the hierarchy yet returns a plain list that the client needs to reconstruct; why not just return the whole hierarchy and let the client process it?
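As a sketch of what reconstructing from parent ids would look like in relational terms (a hypothetical node table, purely for illustration):

-- Illustration only; a hypothetical table of nodes flattened from the hierarchy:
--   node(id, parent_id, archetype_path, value)
SELECT n.id, n.parent_id, n.archetype_path, n.value
FROM   node n
WHERE  n.archetype_path LIKE '/data[at0002]%';
-- The rows come back as a flat list; the client has to rebuild the tree by
-- following parent_id, even though the database already had the hierarchy.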

I have more doubts than certainties. I know how I do things in the EHRServer, but I'm not sure how this should work in general.

1. In the EHRServer, the database is relational.
2. Queries for datavalues return whole datavalues, so there is no need to add a projection for e.g. magnitude or units when querying a DV_QUANTITY.
3. Results can be grouped by composition, e.g. if multiple EVENT instances have a leaf element in the structure, the result will put the ELEMENT.value of all the EVENTs of the same COMPOSITION instance together.
4. Results can be grouped by path: all the results for e.g. systolic BP together, independently of the composition/event that contains them, but each result is annotated with the start time of the COMPOSITION that contains the datavalue. And as I remember, you mentioned in the latest demo of the EHRServer something about making that more flexible, e.g. annotating the results with HISTORY.origin or another time in the model. I think that will be very useful for queries like the one you mention in the first email (and I have it on my TODO list 🙂). A rough sketch of the two groupings follows below.
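To make the two groupings concrete, here is a minimal SQL sketch; the flattened result table and its columns are made up purely for illustration and are not the EHRServer's actual schema:

-- Illustration only: a hypothetical flattened result table
--   result(composition_id, path, value, composition_start_time)

-- Grouping by composition: all values of one COMPOSITION instance together.
SELECT composition_id, path, value
FROM   result
ORDER BY composition_id, path;

-- Grouping by path: all values for one path (e.g. systolic BP) together,
-- annotated with the start time of the containing COMPOSITION.
SELECT path, value, composition_start_time
FROM   result
ORDER BY path, composition_start_time;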

But this is not AQL (yet); it is just how I designed dynamic queries that can be clinically useful. IMO AQL is too generic and some behaviours are not 100% defined yet, but it is also powerful and expressive.

From: Seref Arikan
Date: Mon, May 1, 2017 at 11:53 AM
Subject: Re: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

Hi Bjørn,

I'll respond to your interpretation of AQL in the git repo in a minute. But before I do that I'd like to thank you for asking the question and going to the trouble of putting this into a git repo, because your question points at an important issue we have at the moment: we don't have well-defined, formal semantics for AQL. This is my personal opinion and all disagreement is welcome.

AQL is a brilliant idea coined by Chunlan, Heath, Tom and Sam [ ] but until recently it was not much more than a few wiki pages; it was not even part of the standard. It is now sitting next to the other specifications, but its documentation mostly focuses on the syntax. The semantics of queries is mostly implied, and vendors more or less arrive at the same conclusions intuitively. However, this is not good enough.

Personally I see AQL as the hidden treasure of openEHR; it has huge potential and it is the most likely candidate to implement lots of advanced use cases in an interoperable way. For that potential to be realised, AQL needs a clear description of its underlying semantics so that edge cases are clarified and vendors support the same behaviour. Otherwise, we'll lose the ability to build portable solutions that can be moved from point A to point B even though both points are openEHR implementations.

Having gotten that bit of complaint off my chest, I can comment on your actual question: I don't think the queries you've provided in the git repo have any edge cases, but I disagree with what is written there, based on my interpretation of AQL semantics. Since you've gone to the trouble of expressing your question in detail, I'll try to return the courtesy. Everything I'm talking about below is in https://github.com/serefarikan/aql-discussion

Let's start with the meaning of your queries. The AQL queries you've given in the repo correspond to the following:
[diagram not archived: the query drawn as a graph of nodes]

In the picture above, double lines that join nodes represent CONTAINS and single lines represent a full path built on parent-child relationships. Every node in these graphs is a variable, which may or may not be included in the results. These variables are placeholders and they can be filled by actual reference model instances in your data. Here is the instance you've provided in JSON:

[image not archived: the provided composition instance in JSON]

So you're basically asking the following question:
Given the structural constraints I'm providing in this query, and this particular instance of data, in how many ways can I populate the tabular representation that is my result set?

In the git repo, your interpretation of the following query's results is that it is a Cartesian product of all variables:
select
o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/items[at0005]/value/magnitude as Magn,
o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/items[at0005]/value/units as Units,
o/protocol[at0007]/items[at0008]/value as Protocol
from
COMPOSITION c
contains
OBSERVATION o[openEHR-EHR-OBSERVATION.multiple_events_cluster.v0]

I think you're missing the structural constraints you're imposing in the SELECT clause of this query. All the leaf nodes you're defining in that clause are direct paths from the root "o", so the Magn and Units will always be under the same parent: you cannot have (1,b) or (2,c); those results would break the structural constraints you've defined in your query.

Since my whole point is that AQL has subjective interpretation, I've used other formalisms to demonstrate my interpretation. I wrote a simplified ontology with OWL and a simplified XML file to represent the actual data input. Here is a SPARQL query that corresponds to the AQL query above, run against my toy openEHR ontology:

SELECT ?magn ?units ?Protocol
WHERE {
?comp a oe:Composition .
?comp oe:contains ?obs . ?obs a oe:Observation .
?obs oe:hasChild ?protocol . ?protocol oe:hasChild ?protoEl .
?protoEl oe:dvTextValue ?Protocol .
?obs oe:hasChild ?event . ?event a oe:Event .
?event oe:hasChild ?cluster . ?cluster a oe:Cluster .
?cluster oe:hasChild ?el . ?el a oe:Element .
?el oe:hasChild ?dvq . ?dvq a oe:DvQuantity .
?dvq oe:magnitude ?magn .
?dvq oe:units ?units
#FILTER (?magn < 3.0)
}

which gives the result:
[image not archived: SPARQL result table]

The XQuery version of the same query semantics:

for $obs in composition//observation
for $measr in $obs/event/cluster/item/measurement
let $m := ($measr/magnitude, $measr/units)
for $p in $obs/protocol/item_tree/element
return ($m, $p)  (: the enclosing element constructor was lost in archiving, so the tuple is returned as a plain sequence :)

produces the same results:

Magn  Units  Protocol
1     a      X
1     a      Y
2     b      X
2     b      Y
3     c      X
3     c      Y

As you can see, there are no (1,b) or (2,c) results here. The repeated rows are due to protocol/value having two values, X and Y, which means that after you've put in magn and units, you can fill in the protocol column in the first diagram I've pasted above in two ways.
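In relational terms the same effect appears when two independent child tables are joined to the same parent; a minimal sketch with made-up table names, purely for illustration:

-- Illustration only; hypothetical tables:
--   event_element(observation_id, magnitude, units)   -- rows (1,a) (2,b) (3,c)
--   protocol_element(observation_id, value)            -- rows X, Y
SELECT ev.magnitude AS Magn, ev.units AS Units, pr.value AS Protocol
FROM   observation o
JOIN   event_element    ev ON ev.observation_id = o.id
JOIN   protocol_element pr ON pr.observation_id = o.id;
-- 3 event rows x 2 protocol rows = 6 result rows, yet magnitude and units
-- always come from the same event row, so (1,b) or (2,c) can never appear.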

Based on the same interpretation, your second query would not return (1,a), (1,b)… as written in the readme in the git repo. It would instead return a nice list like this. First the SPARQL version of your AQL 2:

PREFIX owl:
PREFIX rdf:
PREFIX rdfs:
PREFIX oe:
SELECT ?magn ?units
WHERE {
?comp a oe:Composition .
?comp oe:contains ?obs . ?obs a oe:Observation .
?obs oe:contains ?event . ?event a oe:Event .
?event oe:hasChild ?cluster . ?cluster a oe:Cluster .
?cluster oe:hasChild ?el . ?el a oe:Element .
?el oe:hasChild ?dvq . ?dvq a oe:DvQuantity .
?dvq oe:magnitude ?magn .
?dvq oe:units ?units
#FILTER (?magn < 3.0)
}

Then the results:
[image not archived: SPARQL result table for the second query]

I've written the SPARQL and XQuery for the queries you've given in your repo; they're in the git repository along with the toy ontology, the sample XML file and the other things I've created. No need to copy-paste them all here; I think the examples above explain what I mean.

Please note that the point here is not that SPARQL, OWL, XQuery or something else is a good representation for AQL semantics. The point is that I cannot see one described in the openEHR specifications.

Currently, there is a lot of momentum in the openEHR space and people are very excited about modelling discussions, using SNOMED CT and so on; that's all fine, but for whatever reason AQL is not getting the attention it needs from a specification point of view, or so I think. What I mean is:
If you get the models wrong, everybody is using the wrong models, so representation is broken but interoperability is not. If you get the behaviour wrong, since not everybody is using the same implementation, interoperability is broken.

Comments and corrections are most welcome for all of the above.

Cheers
Seref

From: Bjørn Næss
Date: Wed, May 3, 2017 at 9:45 AM
Subject: RE: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

Hi Seref

Thanks for your brilliant response on this topic. I totally agree with you: AQL is a treasure in openEHR. Without AQL, eHealth will miss a lot of opportunities.

I have added some comments – mostly as new challenges below.

· Created a pull request to your repo (see below)

· Added two new examples based on openEHR-EHR-OBSERVATION.blood_pressure.v1 and openEHR-EHR-OBSERVATION.glasgow_coma_scale.v1. They are both described briefly below.

ELEMENT is atomic

From your XQuery and SPARQL examples it looks like you have defined the queries in such a way that the openEHR ELEMENT/DATAVALUE is atomic. Put another way, the application should not separate the attributes from these objects. I agree with this!

I have added a new example: #aql-11 with an AQL that queries for the DATAVALUE. This query returns the expected result. Note that the result from this query is the same as #aql-3 where the CONTAINS goes down to POINT_EVENT.

Then – how to interpret the original AQL to get the expected result? Below we have the AQL:

select
o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/items[at0005]/value/magnitude as Magn,
o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/items[at0005]/value/units as Units,
o/protocol[at0007]/items[at0008]/value as Protocol
from
COMPOSITION c
contains
OBSERVATION o[openEHR-EHR-OBSERVATION.multiple_events_cluster.v0]

Postulate 1: AQL providers MUST treat the paths in the SELECT clause like trees.
The trees should share equal paths down to the depth of an ELEMENT. In effect this makes the ELEMENT atomic (not divisible) in the result set.

(In the AQL above I have marked the two unique paths with orange and green colors).

XQuery for the blood pressure example

The Blood Pressure example is based on repeating EVENTS and multiple observations in the Composition. The example is here: https://github.com/bjornna/openehr-conformance/blob/master/aql/case1.1-permutation_bp/index.adoc.

I have created a pull request (https://github.com/serefarikan/aql-discussion/pull/1) and issues (https://github.com/serefarikan/aql-discussion/issues/2) to your AQL discussion repo for this.

SELECT
o/data[at0001]/origin/value as Origin,
o/data[at0001]/events[at0006]/time/value as EventTime,
o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude as Systolic,
o/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude as Diastolic,
o/protocol[at0011]/items[at0013]/value/value as Cuff
FROM Composition c
CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.blood_pressure.v1]

I think the blood pressure example as described will work correctly (give the expected result set), provided that the AQL server treats the paths as trees.

Glasgow Coma Scale example

The Glasgow Coma Scale is added because the Comment element is repeatable. This opens a new problem: How to handle repeating comments in the result set. Take a look at this example for my opinion on this: https://github.com/bjornna/openehr-conformance/blob/master/aql/case1.1-permutation_gcs/index.adoc

SELECT
o/data[at0001]/origin/value as Origin,
o/data[at0001]/events[at0002]/time/value as EventTime,
o/data[at0001]/events[at0002]/data[at0003]/items[at0009]/value/value as Eye,
o/data[at0001]/events[at0002]/data[at0003]/items[at0007]/value/value as Verbal,
o/data[at0001]/events[at0002]/data[at0003]/items[at0008]/value/value as Motor,
o/data[at0001]/events[at0002]/data[at0003]/items[at0026]/value/magnitude as Score,
o/data[at0001]/events[at0002]/data[at0003]/items[at0037]/value/value as Comment
FROM COMPOSITION c
CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.glasgow_coma_scale.v1]

[screenshot not archived: expected result set for the Glasgow Coma Scale query]

From: Seref Arikan
Date: Wed, May 3, 2017 at 9:55 AM
Subject: Re: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

Hi Bjørn,

Just a quick reply regarding a potential misunderstanding: my simplifications in the ontology/XML/queries are there because I don't have the time to fully implement the RM or query semantics. When I take shortcuts and omit elements either in the content I create or in the queries, whether they use SPARQL/XQuery etc., it is only because of time limitations. I'd suggest you don't take these as anything else, i.e. I'm not suggesting changes to the granularity of the RM or AQL, though shortcuts for the latter would be a good topic for discussion 🙂 (but that'd be syntax, not semantics, which is what I'm trying to point at).

I’m up to my eyeballs in code at the moment so I’ll have to look at the rest of your response a bit later.

All the best
Seref

From: Bjørn Næss
Date: Wed, May 3, 2017 at 10:06 AM
Subject: RE: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

Seref – quick response to your quick response 🙂

I totally understand that you took "shortcuts". That is perfectly fine and needed to communicate the essence of some examples. But those examples will never be "total openEHR applications" – that takes too much time.

This dialogue is needed. We need a way to define the semantic rules on how to interpret AQLs. Since AQL has borrowed syntax from other technologies like SQL, XPath, XQuery, etc., it is really a nice contribution to add some examples using these technologies.

What would be interesting is if other vendors used my examples with the given Compositions and AQLs to see what kind of results they get, and then agreed/disagreed with the expected results provided.

My postulate on the granularity was expressed because, from your response, I found it to be correct. And I wrote it out to get some feedback on it. It could be added to the specification if others find it useful.

From: Seref Arikan
Date: Sat, May 6, 2017 at 1:40 PM
Subject: Re: AQL: Expected result with repeating structures
To: For openEHR implementation discussions

Hello Bjørn,

This is your first bp query, following the same notation I used in the initial response:

[diagram not archived: the first blood pressure query drawn as a graph]

Which would correspond to the following XQuery code:
xquery version "3.0";
declare namespace xsi="http://www.w3.org/2001/XMLSchema-instance";
declare default element namespace "http://schemas.openehr.org/v1";
(: the enclosing element constructors did not survive archiving, so the query simply returns a sequence of elements per match :)
for $obs in composition//content[@xsi:type='OBSERVATION']
for $protocol in $obs/protocol[@archetype_node_id='at0011']
let $cuff := $protocol/items[@archetype_node_id='at0013']/value/value
let $obs_data := $obs/data[@archetype_node_id='at0001']
let $origin := $obs_data/origin/value
for $event in $obs_data/events[@archetype_node_id='at0006']
let $eventTime := $event/time/value
for $data in $event/data[@archetype_node_id='at0003']
for $systolicElement in $data/items[@archetype_node_id='at0004']
let $systolic := $systolicElement/value/magnitude
for $diastolicElement in $data/items[@archetype_node_id='at0005']
let $diastolic := $diastolicElement/value/magnitude
return
(element Origin {$origin/text()},
 element EventTime {$eventTime/text()},
 element Systolic {$systolic/text()},
 element Diastolic {$diastolic/text()},
 element Cuff {$cuff/text()})

Which gives the result the query is asking for. I am not sure I understand why you're calling the first AQL query for blood pressure "naive" though. It looks like a reasonable query to me.

Your second query is the following:
[diagram not archived: the second blood pressure query drawn as a graph]

Which corresponds to the following XQuery code:
xquery version "3.0";
declare namespace xsi="http://www.w3.org/2001/XMLSchema-instance";
declare default element namespace "http://schemas.openehr.org/v1";
(: as above, the enclosing element constructors did not survive archiving :)
for $obs in composition//content[@xsi:type='OBSERVATION']
for $protocol in $obs/protocol[@archetype_node_id='at0011']
let $cuff := $protocol/items[@archetype_node_id='at0013']/value/value
let $obs_data := $obs/data[@archetype_node_id='at0001']
let $origin := $obs_data/origin/value
(:THIS IS WHERE AQL_BP1 AND THIS FILE DIFFERS:)
for $event in $obs//*[@xsi:type='POINT_EVENT']
let $eventTime := $event/time/value
for $data in $event/data[@archetype_node_id='at0003']
for $systolicElement in $data/items[@archetype_node_id='at0004']
let $systolic := $systolicElement/value/magnitude
for $diastolicElement in $data/items[@archetype_node_id='at0005']
let $diastolic := $diastolicElement/value/magnitude
return
(element Origin {$origin/text()},
 element EventTime {$eventTime/text()},
 element Systolic {$systolic/text()},
 element Diastolic {$diastolic/text()},
 element Cuff {$cuff/text()})

The results for both queries are the same:

Origin                     EventTime                  Systolic  Diastolic  Cuff
2017-05-02T20:00:00+02:00  2017-05-02T20:05:00+02:00  100       90         Adult thigh
2017-05-02T20:00:00+02:00  2017-05-02T20:10:00+02:00  101       91         Adult thigh
2017-05-02T20:15:00+02:00  2017-05-02T20:20:00+02:00  102       92         Large adult

 

I did not have time to write the data instance in OWL based on the XML you've provided, so I did not write SPARQL versions.

Your third example re the Glasgow Coma Scale produces the correct results, and you have a good point re displaying this to a clinician: it is almost certain that a clinician would have difficulty figuring out the difference between the two rows, which is in the rightmost column, but this is not about AQL semantics, it is about how to format query results.

Going back to the difference between your two queries for blood pressure: even though the structural constraints are different, given your data and, more importantly, the design of the RM, the results would be the same. Think about it: there is nothing of importance sitting between the observation and the event, so whether your query includes the event via a CONTAINS constraint in the FROM clause or via a direct parent/child path in the SELECT clause does not matter.

Your use of POINT_EVENT raises the kind of issue I’m pointing at though, because that is not a structural constraint, it is a type constraint. So the interesting question is:

Should AQL implementations support polymorphic results?

That is, if I use EVENT in the FROM clause for a named node, as in EVENT e, should this resolve to both POINT_EVENT and INTERVAL_EVENT instances? What if the data has INTERVAL_EVENTs and the query, as you've done, is asking for POINT_EVENTs?

How about the interpretation of paths in the SELECT clause? Do we assume a logical OR here? That is, if a composition does not contain one of the items in the SELECT clause, should the results include a row with that column set to null, or should that row be excluded altogether (if we assume a logical AND)?
Can you imagine what would happen if a large-scale data extraction for decision support/population query analysis were run on two different implementations that interpret the SELECT clause children differently?

Currently, almost all vendors that I know of assume a logical OR, so any data instance that satisfies at least one condition/existence in the SELECT clause is included in the results, but this is just the common sense of the implementers; there is nothing in the AQL specification about this.
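A rough relational analogy, with hypothetical tables purely for illustration, may make the difference concrete: the logical-OR reading behaves like an outer join that null-pads the missing column, while the logical-AND reading behaves like an inner join that drops the row:

-- Illustration only; hypothetical tables keyed by composition id.
-- Logical OR: keep the composition, null-pad the missing column.
SELECT s.magnitude AS systolic, c.value AS comment
FROM   systolic_element s
LEFT JOIN comment_element c ON c.composition_id = s.composition_id;

-- Logical AND: a composition missing either item disappears from the results.
SELECT s.magnitude AS systolic, c.value AS comment
FROM   systolic_element s
JOIN   comment_element c ON c.composition_id = s.composition_id;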

How about the semantics of the CONTAINS constraint in the FROM clause? Is it mandatory? Can we have an optional CONTAINS, or should any data that fails the CONTAINS constraints be excluded?

What is the right way to extend AQL with functions that can be allowed to take variables as parameters? This is more of a syntax issue but it requires an underlying semantics to be defined nonetheless.

You and I could exchange examples and implementations till we turn blue, but until we have an AQL spec that clarifies the points I'm trying to make above, we're all blind men in a room with an elephant between us 🙂

All the best
Seref

Ps: I’ve accepted the pull request and put the above queries there.


is openEHR hard?

Recently, I found myself in more than one discussion during which I was trying to explain what openEHR is to someone.
It is common to adopt a different explanation of key concepts based on the occupation of the audience. The modelling side of things matters most to clinicians and policy makers, and we talk in different terms than we would in a conversation between software developers, architects etc.

The openEHR mail lists also reflect this natural distinction; there are technical, modelling and implementers lists, etc. I think I've realised something though: we end up having technical conversations with clinicians and implementation discussions with software developers. There is nothing wrong with that of course, but I think the openness of the standard (it is in the name after all) is causing some problems in its adoption. This post is meant to express my thoughts about this pattern, and it may or may not help you when you're trying to understand some aspects of openEHR.

Since the openEHR web site provides access to the specifications and there are free modelling tools available to everyone, the introduction to openEHR has no barriers. Great, exactly what is needed. You can also access the discussion lists where implementation and technical issues are discussed. The implicit idea is that people should access the resources that are helpful and relevant for them.

Since openEHR focuses very strongly on giving the clinician control of what clinical information is going to be represented and processed, most people initially focus on model development and core concepts such as archetypes, templates etc. Thanks to freely available tools such as the archetype editor and template designer, along with the models in the public CKM instance, it is possible to get a good understanding in an acceptable amount of time, depending on your familiarity with core EHR and information system concepts.

It gets harder after that. The natural flow of learning is to start with the models, archetypes etc and then move on to using these models for actual software development. This is where people begin to cross a threshold that marks harder territory ahead. There is still nothing wrong with this of course, you’re free to work on anything you want.

However, even if you're a software developer who is tasked with developing a piece of software that uses openEHR to some extent, you may still be trying to get your head around something that you may not necessarily need to deal with. The single most important fact you should acknowledge is the following:

openEHR implementation is a large piece of work, it is hard and it is complicated.

This does not mean that openEHR is hard and complicated. There is an important difference between the first and the second statement, the latter being a blanket statement that covers everything related to openEHR.

The key difference between the two statements is implementation. If you're implementing openEHR, you're looking at a real challenge. It is the challenge that exists in every project that aims to make complicated things easy and accessible. openEHR is designed to deliver computable health by letting clinicians define concepts, with the rest happening in a streamlined fashion. It does not end there though: the software developer is also supposed to be able to build software based on the concepts defined by the clinician in the same manner; it should be easy. Making this happen is hard, really hard.

It is the reason there are openEHR vendors out there. The easier you want to make something complicated, the harder your job is. Someone has to build a conceptual vault of some sort that will lock the complexity in. As much as I hate Apple analogies, I can't avoid them sometimes: it is just like the iPhone and iPad line of products. Swiping with your finger is simply amazing in terms of interaction with a computer. No mouse, no keyboard, the screen is all you need. So simple. That simplicity has enormous research and technology behind it that has cost Apple a lot of money. Apple does not put its design specs for its products online; they don't even give you a user guide. The simplicity and design around it is so good, you're just supposed to be able to use it, and it works as intended. Very impressive.

You don't find users speculating about the way the CPU in Apple's iPhone works; you don't even see most developers of applications (apps, as we call them now) speculate about the inner details of the platform (some do, but they're far fewer than the group who simply uses the platform and tools as provided by Apple).

openEHR does not do the same. It lets everyone access all that is available. There is a distinction, an attempt to address different groups of people: clinicians, developers, implementers, but no barriers. The result is that people end up confused. The answer to "how do I use this?" becomes unclear. Software developers especially begin to think about actual platform implementation, and the inevitable conclusion arises: "openEHR is hard". It would indeed be hard if you tried to do it all.

It would be hard to use SQL if you had to implement a relational database just to use SQL. The Archetype Query Language (AQL) is easy to learn; implementing a server that supports it is hard. Easy-to-use technology requires other people to do complicated things. Complexity acts like energy sometimes: you can transform it, but you can't simply make it disappear.

This is what openEHR vendors do. This is what Ocean Informatics has spent millions of pounds/dollars/yen/liras on (choose your currency as you see fit). Vendors, openEHR implementers, give clinicians and software developers tools to make things easy for them. Developing those tools, those platforms, is hard work; it is complicated, and it takes people with skills and a lot of effort.

If you're finding openEHR hard, please ask yourself the following question: "what aspect of this really large ecosystem am I finding hard?". Is it something for which someone already has an offering? Are you really sure that the complicated task belongs to you?

When I make these points, I usually get the looks that I give people when they tell me that I should not worry about the inner workings of something and should just use what is available. I don't always like it; I want to understand it completely and maybe build my own solution for the problem I'm trying to solve. This is OK, and it is OK for you to think like me, but acknowledging that building something is hard does not mean you have to build it yourself in order to use it. With openEHR, you're free to do it, but it will cost you.

You're absolutely free to step outside of your organisational role; we all have one of those. Go ahead and build a whole stack of software if you want to, but don't underestimate the task ahead. If it is hard, well, there is a reason for that, as I've written above.

If you're finding something hard and there is nothing there that makes it easy, no offering from anyone, that means we have something to discuss: a point we can improve to make things better in the openEHR ecosystem.

So next time you find yourself resorting to blanket statements such as "openEHR is hard", please consider the possibility that you may be reaching into somebody else's speciality. You're free to do so, but you'll pay the same price that they've paid.

Why reference models matter in healthcare IT?

Recently, I heard a question targeted at my colleagues at Ocean, something along the lines of "what do you think is your greatest accomplishment?"

Ocean's software stack has a lot of impressive components: the template designer with its TDO support, the back-end repository, Tom's Eiffel work called the Archetype Workbench, and the Clinical Knowledge Manager are all polished pieces of software with a lot of work behind them (and don't forget the archetype designer).

However, my pick would not be one of these if I were to choose the greatest thing out there. I'd choose the RM, the reference model of openEHR. Why? Well, it is because I have a past with lots and lots of software development. The vendor that is trying to deliver a product to the clinician stands between the clinician and the technical infrastructure, code and hardware. If you work with all the common approaches and tools of recent times, that is, if you're doing object-oriented development with a mainstream language such as C# or Java, it is highly likely that you'll go through the well-known lifecycle of an application. Even if you use agile methods and good practices such as test-driven development, there is one fundamental thing in this business you have to get right if you want to succeed: requirements.

What helps with the requirements?

As scary as it may be, there is not a lot out there that may help with the requirements. In fact, I could argue that one of the reasons we have agile methodologies (and they are liked so much) is that we are expected to get the requirements wrong as software developers! Release early, release often, because most of the time you will release wrong stuff!

Tooling may help you get what you understand as requirements into code faster. A good UML tool that can generate code from models, or the convention-over-configuration approach of frameworks such as Ruby on Rails, will certainly help you deliver something fast, but none of these tools will help you understand the requirements better. They'll just help you avoid mistakes during some key tasks you're supposed to perform no matter what your software is supposed to do.

Sounds good, right? No matter what you're supposed to do, these tools help you do it better. That is where the catch is: they won't make it easier for you to figure out what you're supposed to do. In order to do that, you'd need tools that improve your communication with users and your understanding of the domain; tools that improve your communication with computers would not help here.

This is exactly what the RM does, and this is why it is so important. If you're a clinician and you want to explain what you want, RM-based modelling tools help you express your requirements without knowing anything about computers, and if you're a developer, RM-based code and frameworks help you figure out how to represent those requirements in code. Someone else did all those rounds of asking for the requirements, identifying patterns, and turning them into computable concepts. They've made the mistakes, paid the price, and shared what they've produced with you. That is one of the most valuable things you can get in this business.

For example, a software developer has a very different view of the concept of quantity compared to a clinician. The developer has to figure out that quantities in clinical data come with units, magnitudes and reference ranges, and learning this takes time and costs software vendors a lot of money. The RM provides time-tested components that can help clinicians, developers and computers communicate. Remember: doing it the right way and doing the right thing are different. You need better human-to-human communication for the latter, and if you have something like the RM that is accessible to both developers and clinicians, that is your key to productivity.

When comparing RMs people usually forget that an RM is not only about clinician-to-developer communication; once the developer gets his/her hands on the clinical model based on the RM, the RM is supposed to improve communication with computers just as well as it has improved communication with clinicians. This is where most efforts fail. If you design your reference model with a very strong focus on human-to-human communication, diminishing the computability of the RM to the point of being merely computer readable (read: XML), you're going to miss many advantages of an RM that can talk to computer languages better than yours can. There are factors such as the granularity of data types, the level of abstractness of top-level concepts, and the extent to which object-oriented concepts are used. These factors determine how well your RM will fit into programming language concepts, in other words the languages of computers, which are also spoken by developers.

The openEHR RM has a good balance that lets it sit at a good point between developers (read: machine language speakers) and clinicians. If you get too close to developers on that imaginary line, the clinicians lose their ability to express their requirements. If you get too close to clinicians, the computers can't understand what they're being told. You have to have an RM that sits at the sweet spot.

Every other tool that sits on top of the RM can be written for other RMs: code generation, persistence, clinical modelling tools, you name it. The one that will succeed is the one that is built on the right RM. This, my friends, is why the RM is the most precious asset of openEHR. It has been tested by lots of clinicians and developers, and it has been implemented in many computer languages. Most criticism comes from other approaches that sit closer to one end of that imaginary line or the other.

I am not saying that the openEHR RM is the best it can be, because I am not qualified to make those kinds of big statements, but personally I have not seen any alternative that performs as well. Please give this approach some thought, and think about where your approach puts you on the imaginary line I've described. If you think you're in a better position than the one the openEHR RM provides, please let me know!

Harvard study says: “Computers don’t save money in hospitals”.

Ok, this is a paper that should provoke a huge discussion. This paper, with two of its authors from Harvard, says that the picture in hospitals with computers is quite different from the one we always thought we would see.

Obviously one should read the paper before discussing it, and I did so. First of all, I have to say that the paper seems to give little thought to why software does not seem to decrease costs. There are three potential reasons mentioned in the conclusion of the paper, but the final one is quite interesting. Quoting from the paper: "Finally, we believe that the computer's potential to improve efficiency is unrealized because the commercial marketplace does not favor optimal products. Coding and other reimbursement-driven documentation might take precedence over efficiency and the encouragement of clinical parsimony."

Yes! The marketplace does not let us push out better technologies easily. You'd think that once you have a better solution for a problem, the world would give you a warm hug and thank you for your work. The reality of the marketplace is cruel: there is huge politics and conspiracy around healthcare informatics, and working towards better solutions is not enough on its own. It is such a pity that there is a huge number of people trying to make things better, and the lack of desired outcomes is not only related to the capacity of the solutions we are building.

I'd like to see some honest discussion about this paper, and please let me know if you come across any ripple effects regarding it.

What I’ve read, a summary for the fellow geek

Ok, slightly off topic, but if you are interested in my reading list for the last couple of months, here is a brief summary.

Atul Gawande, "Better". Professor Ingram gave this book to me. If you want to see how doctors see certain things, and how hard it is to perform some tasks which they are expected to perform without any errors, read this book. Gawande discusses some interesting topics, including ethics, with quite unusual examples. Would you become a lawyer after years of being a medical doctor, and sue your own colleagues in malpractice cases? Could you use your medical knowledge to end someone's life, for an execution? A great read.

Stephen King, "Cell". King is not the King I've admired for so long anymore. He has his style, he never loses it, you get the same feeling every time you read his work, but Cell made me feel that I was reading a recycled version of his past creativity. You'll find many common points between this book and his previous works. I do not want to believe that he is done with his universe after finishing the Dark Tower, but I am failing to enjoy his recent works.

Vernor Vinge, "A Fire Upon the Deep". My first encounter with Vinge, and I think this is a good book. Vinge reminded me of Asimov in many ways, and he manages to build a different type of society which is real enough to keep you in the story. A couple of interesting ideas about the universe, including the slow zone, allow him to explore the outcomes of a partitioned universe. I found some important parts of the book referring to Gibson's Neuromancer trilogy, but it is hard to avoid him when you're writing about AI.

Neal Stephenson, "Snow Crash". This is the book that comes closest to Gibson's world in Neuromancer, among the others listed in this post. It almost gives that feeling I get when I read Gibson, but the main story did not create a powerful impact on me. Still, a good work of cyberpunk. I'd like to get my hands on this kind of book more often, but I'd like to see slightly darker material.

David Mitchell, "Cloud Atlas". A serious demonstration of talent. I can't name the genre, because Mitchell shows that he can write four or five genres in the same book! Tom gave this one to me as a present, and it is one of the most interesting works of fiction I've read in the last couple of years. It made me realize that I need to go back to non-science fiction more often.

I am now reading The Graveyard Book by Neil Gaiman, but I have to say that I want him to focus more on stories for adults. His genius in Sandman and American Gods shows that he can be very impressive when he constructs complex stories, but all his other works I've read after American Gods are a little bit too simple (maybe flat is a better word here). Anansi Boys was good, but I want something along the lines of American Gods. I'll always follow him, but he seems to be a little too much into writing books for young people recently.

FPS games and motion sickness

It used to happen to me in the past, after playing for about 4 or 5 hours, and only slightly. After almost 10 years of not playing FPS games, I bought myself a copy of Half-Life 2, and it hit me like a truck!

I can't believe the strength of the nausea I experience after only 10 minutes of play. I wonder if it is specific to Half-Life 2, or to my XPS's monitor, etc. The problem is that trying to figure this out is expensive, both in money and in health terms. I guess I am really getting older. I have to lie down now, before I decorate the keyboard in a very unpleasant way!

Why on earth don't we have proper open source terminology servers?

The competition among different information models in healthcare will never end. Yes, I know that there are many out there who think that a particular piece of work is so much better than the rest, and that it is the feature of healthcare informatics. Sorry, I don't agree. There are many other reasons, which I'd like to outline in another post, but in general, I can't see this competition going away in the future.

What is interesting is that the use of terminologies is common in many information model standards, whether it be HL7, EN 13606 or openEHR. There are many open source tools for many aspects of healthcare informatics, but when it comes to terminology management, the choices are surprisingly few! Other than NCI's LexGrid initiative and Apelon, I can't see any serious terminology server work in the open source domain. These two have their own pros and cons, but in general, this subdomain is surprisingly deserted. Please know that I'm not considering projects that were last updated 3 years ago as candidates for my work in Opereffa.

There is a huge amount of work around the concepts that will eventually get linked to terminologies, but there is not much effort in the terminology server area. Yes, there are many browsers out there, but whatever you do in the modelling phase, you'll have to have access to a proper terminology server when using that model (be it a SNOMED CT subset or an HL7 message with SNOMED CT codes in it). So why can't I see any interest in this? Is it because people are so focused on well-known problems that they do not bother to think about what lies beyond them? Did open source healthcare attack the problem of information model based solutions first, omitting terminology based solutions? Terminology based approaches are old and well established, so I can't explain the lack of decent open source projects in this field. If you know one, drop me a line, and I'll buy you a beer/wine/{insert your favorite drink here}.