Pellet and Datatype Ranges

In an earlier entry, there was a sample OWL file that did not classify properly.  Some of the folks on the Pellet users mailing list helped me out and gave me a fix.  Hopefully I will get this explanation right.

Patient 5 was not classifying properly because the restriction I was using was “hasBodyTemperature only int[< 90]”. (This is Manchester syntax, of course, generated by Protege. The RDF/XML sample source defines the clause using xsd:maxExclusive.)  A sample of this syntax can be found in the Teenager class definition in the OWL Primer draft (search for “Teenager”), except that it uses “some”, not “only”, in the restriction. That difference is part of the problem I had.

By specifying “only”, I was saying that the patient could be an Emergency case (via this clause) only if all of their hasBodyTemperature values were less than 90, with no values to the contrary.  However, in an open world, even if I state a value of 50, there is no guarantee that other, unstated values of 90 or more do not exist.  As such, with “only” the reasoner could not conclude that the patient falls into that case.

The solution was to change the data type property to be “functional” (in RDF/XML, hasBodyTemperature becomes defined as type “FunctionalProperty”) so that there could be only one value for the patient.  If the engine knows that the value on the patient is definitive, then it can classify. The updated sample is here.

The alternative (which also works) is to use “some” (or owl:someValuesFrom in RDF/XML) instead of “only” (owl:allValuesFrom in RDF/XML) without changing the property to functional. However, this goes against the meaning of the sample.
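
To make the difference between “only”, “some” and a functional property concrete, here is a toy sketch in plain Java. It is not how Pellet works internally and does not use any Pellet or Jena API; the class and method names (OpenWorldSketch, entailsOnlyBelow, entailsSomeBelow) are made up for illustration. It also assumes the only way to "close" the property is to declare it functional, and it ignores other options such as cardinality restrictions.

```java
import java.util.Collections;
import java.util.List;

/** Toy model of open-world reasoning over one datatype property; names are hypothetical. */
public class OpenWorldSketch {

    /**
     * "hasBodyTemperature only int[< bound]": every value, stated or not, must be below bound.
     * If the property is not functional, unstated values may exist, so nothing can be concluded.
     */
    public static boolean entailsOnlyBelow(List<Integer> stated, boolean functional, int bound) {
        if (!functional) {
            return false; // an unstated value >= bound cannot be ruled out
        }
        // Functional: at most one value exists, so a single stated value is the whole story.
        return stated.size() == 1 && stated.get(0) < bound;
    }

    /** "hasBodyTemperature some int[< bound]": one stated value below bound is enough. */
    public static boolean entailsSomeBelow(List<Integer> stated, int bound) {
        for (int value : stated) {
            if (value < bound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> patient5 = Collections.singletonList(50);
        System.out.println(entailsOnlyBelow(patient5, false, 90)); // false: open world, not functional
        System.out.println(entailsOnlyBelow(patient5, true, 90));  // true: functional, single value 50
        System.out.println(entailsSomeBelow(patient5, 90));        // true: the stated value 50 suffices
    }
}
```

The first call mirrors my original problem, the second mirrors the functional-property fix, and the third mirrors the “some” alternative.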

This is not the first time the Open World has caught me off guard.  I am too used to Closed World programming, so I expect it will not be the last.

Pellet and Issues with Datatype Properties

The previous post showed an OWL sample that demonstrated some simple reasoning (classification) using property values.  It defines the following:

  • A class Patient which defines a person in, say, a hospital emergency room.
  • The current body temperature of a Patient can be defined in one of two ways:
    • Using the hasBodyTemperature datatype property, where the temperature value is represented as an integer value (degrees F)
    • Using the hasNamedBodyTemperature object property, where the temperature is given a “code” value (points to a NamedBodyTemperature instance). (NamedBodyTemperature acts as an “enumerated type”, though it may not be formally defined correctly.)
  • There are two classes which categorize the Patient cases:
    • EmergencyCase - This defines the high priority cases, where body temperature is very low or very high.  It does this using both an expression for the object properties and one for the datatype properties.  In Protege, using Manchester syntax, this looks like: “Patient and (hasNamedBodyTemperature value HighFever) or (hasNamedBodyTemperature value Hypothermic) or (hasBodyTemperature only int[< "90"^^integer])”
    • SecondaryCase - This defines a class for the patients who have normal temperature.  In Manchester syntax, this becomes: “Patient
      and (hasNamedBodyTemperature value Fever)
      or (hasNamedBodyTemperature value Normal)
      or (hasBodyTemperature value "100"^^integer)
      or (hasBodyTemperature value "96"^^integer)
      or (hasBodyTemperature value "97"^^integer)
      or (hasBodyTemperature value "98"^^integer)
      or (hasBodyTemperature value "99"^^integer)”
  • Finally, there is a group of instances (patients) whose temperatures are set using one of the two methods above:
    • Patient1 - “hasNamedBodyTemperature Normal” This should classify under SecondaryCase, since it matches the named object property restriction (for Normal).
    • Patient2 - “hasNamedBodyTemperature HighFever” This should classify as EmergencyCase, since it matches one of the object property restrictions in that class.
    • Patient3 - “hasBodyTemperature 99” This should be a normal patient.
    • Patient4 - “hasBodyTemperature 107” This is a high-fever patient.
    • Patient5 - “hasBodyTemperature 50” This patient is hypothermic.
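
For comparison, here is what I would expect from these definitions if I read them the way a closed-world program would. This is a plain Java sketch with made-up names (IntendedTriage, TriageCase, byName, byValue). It only captures the intent of the two class definitions above and says nothing about what a reasoner actually infers.

```java
/** Closed-world reading of the EmergencyCase / SecondaryCase definitions; names are hypothetical. */
public class IntendedTriage {

    public enum TriageCase { EMERGENCY, SECONDARY, UNCLASSIFIED }

    /** Object property route: classify from a NamedBodyTemperature code. */
    public static TriageCase byName(String code) {
        switch (code) {
            case "HighFever":
            case "Hypothermic":
                return TriageCase.EMERGENCY;
            case "Fever":
            case "Normal":
                return TriageCase.SECONDARY;
            default:
                return TriageCase.UNCLASSIFIED;
        }
    }

    /** Datatype property route: classify from an integer temperature in degrees F. */
    public static TriageCase byValue(int degreesF) {
        if (degreesF < 90) {
            return TriageCase.EMERGENCY;   // the int[< 90] clause
        }
        if (degreesF >= 96 && degreesF <= 100) {
            return TriageCase.SECONDARY;   // the enumerated values 96 through 100
        }
        return TriageCase.UNCLASSIFIED;    // neither class mentions this value
    }

    public static void main(String[] args) {
        System.out.println("Patient1: " + byName("Normal"));    // SECONDARY
        System.out.println("Patient2: " + byName("HighFever")); // EMERGENCY
        System.out.println("Patient3: " + byValue(99));         // SECONDARY
        System.out.println("Patient4: " + byValue(107));        // UNCLASSIFIED
        System.out.println("Patient5: " + byValue(50));         // EMERGENCY
    }
}
```

Note that Patient4 falls through on purpose: neither class has a datatype clause for high temperatures, so 107 matches nothing.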

OK, let’s see what Pellet does with these.  Since datatype support is improving, I am showing the results with both version 1.5 and 2.0.0RC5.  The results matter if you are working with datatype properties, which are very common in most engineering work.

First, since Pellet has had good support for matching object properties, the results for both versions are the same in those cases:

  • Patient1 - This classifies under SecondaryCase because it matches the named object property restriction (for Normal). This is to be expected.
  • Patient2 - “hasNamedBodyTemperature HighFever” This classifies as EmergencyCase, again because it matches one of the object property restrictions in that class.  All very good.

Next, the data types.  Under 1.5, which had limited support for datatype restrictions and rules:

  • Patient3 - Data type property matching (equality) seems to work fine, so this normal case matches up with the “or (hasBodyTemperature value "99"^^integer)” in SecondaryCase and classifies correctly.
  • Patient4 - Since neither class covers this value, this patient should not classify under either case, and it doesn’t (which is OK).
  • Patient5 - This should have engaged the “hasBodyTemperature only int[< "90"^^integer]” clause, but version 1.5 was known for issues with data type properties, and it simply ignores this clause.

Under 2.0.0RC5, the presence of the “hasBodyTemperature only int[< "90"^^integer]” clause in EmergencyCase causes an error in the reasoner, at least in the Protege plug-in reasoner for 2.0. The error is:

org.mindswap.pellet.exceptions.InternalReasonerException: Unknown term type: not(restrictedDatatype(http://www.w3.org2001/XMLSchema#int,[facet(http://www.w3.org/2001/XMLSchema#maxExclusive,literal(90,(),http://www.w3.org/2001/XMLSchema#integer))]))

This does not happen in a stand-alone program (at least as far as I can see); however, Patient5 still does not classify. The results are effectively the same as with version 1.5.x.

UPDATE (2009-06-30): The problem with Patient 5 under 2.0.0RC5 had to do with the restriction type.  A solution that works is in the next blog in the series.

Dates are another common type that needs ranges, and hopefully, a future test will deal with these.

Embedding Pellet in an Application

If you are building a Java application that needs inference support, here is a quick-start note. Pellet is currently one of the most popular engines and has a fairly good support base, making it a good choice for small-scale applications that need an inference engine.  The documentation on their site is better than most, but maybe this will help.

The sample below uses Pellet 2.0.0-RC5, which can be downloaded from this URL on their site.  It contains a number of embedded libraries, including Jena 2.5, which is used in the sample code.

To get this running, I extracted the ZIP file and added the JAR files in the lib and lib/jena subdirectories to my class path. After that, a minimal sample that sets up Jena to use an embedded Pellet reasoner and runs a simple SPARQL query can be found in OwlTest2.java (attached). The sample is fairly straightforward Jena code, and the only line really needed from Pellet is near the beginning.

This sample runs a classification query against the DatatypeEX2-r.owl file. This file uses both datatype and object restrictions (to compare methods) in order to classify patients with temperatures as emergency or secondary importance. Because of the datatype issue with Pellet discussed earlier, only the object restriction method seems to work here.

If you can work with OWL/RDF and can bring the console messages under control, this gets an application up and running fast.

Over the next couple of months, I will try to get minimal start-up code for a few other engines.
