Pellet and Issues with Datatype Properties

The previous post showed an OWL sample that demonstrated some simple reasoning (classification) using property values.  It defines the following:

  • A class Patient which defines a person in, say, a hospital emergency room.
  • The current body temperature of a Patient can be defined in one of two ways:
    • Using the hasBodyTemperature datatype property, where the temperature value is represented as an integer value (degrees F)
    • Using the hasNamedBodyTemperature object property, where the temperature is given a “code” value (points to a NamedBodyTemperature instance). (NamedBodyTemperature acts as an “enumerated type”, though it may not be formally defined correctly.)
  • There are two classes which categorize the Patient cases:
    • EmergencyCase - This defines the high priority cases, where body temperature is very low or very high.  It does this using both an expression for the object properties and one for the datatype properties.  In Protege, using Manchester syntax, this looks like: “Patient and (hasNamedBodyTemperature value HighFever) or (hasNamedBodyTemperature value Hypothermic) or (hasBodyTemperature only int[< "90"^^integer])”
    • SecondaryCase - This defines a class for the patients who have a normal or only mildly elevated temperature.  In Manchester syntax, this becomes: “Patient
      and (hasNamedBodyTemperature value Fever)
      or (hasNamedBodyTemperature value Normal)
      or (hasBodyTemperature value "100"^^integer)
      or (hasBodyTemperature value "96"^^integer)
      or (hasBodyTemperature value "97"^^integer)
      or (hasBodyTemperature value "98"^^integer)
      or (hasBodyTemperature value "99"^^integer)”
  • Finally, there is a group of instances (patients) whose temperatures are set using one of the two methods above:
    • Patient1 - “hasNamedBodyTemperature Normal” This should classify under SecondaryCase, via the object property restriction for Normal.
    • Patient2 - “hasNamedBodyTemperature HighFever” This should classify as EmergencyCase, via one of the object property restrictions in that class.
    • Patient3 - “hasBodyTemperature 99” This should be a normal patient.
    • Patient4 - “hasBodyTemperature 107” This is a high-fever patient.
    • Patient5 - “hasBodyTemperature 50” This patient is hypothermic.
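
To make the intended outcome concrete, here is a minimal plain-Python sketch of the two class definitions applied to the five patients. It is not Pellet or OWL API code, and it is not how a reasoner works internally: it assumes a closed world (only the stated value exists), reads each definition as "Patient and (any one of the listed conditions)", and returns a single label even though OWL classes are not mutually exclusive. The names classify, named and temp_f are made up for this illustration.

```python
# Plain-Python sketch of the *intended* classification; not reasoner code.
from typing import Optional

# Named temperature codes used by each class definition (from the post).
EMERGENCY_CODES = {"HighFever", "Hypothermic"}
SECONDARY_CODES = {"Fever", "Normal"}
# Explicit integer values listed in SecondaryCase (degrees F).
SECONDARY_TEMPS = {96, 97, 98, 99, 100}


def classify(named: Optional[str] = None, temp_f: Optional[int] = None) -> Optional[str]:
    """Return the class a patient is expected to fall under, or None.

    Assumption: the only datatype condition in EmergencyCase is
    "temperature below 90", exactly as written in the post.
    """
    if named in EMERGENCY_CODES or (temp_f is not None and temp_f < 90):
        return "EmergencyCase"
    if named in SECONDARY_CODES or temp_f in SECONDARY_TEMPS:
        return "SecondaryCase"
    return None  # covered by neither definition


patients = {
    "Patient1": {"named": "Normal"},
    "Patient2": {"named": "HighFever"},
    "Patient3": {"temp_f": 99},
    "Patient4": {"temp_f": 107},
    "Patient5": {"temp_f": 50},
}

results = {name: classify(**facts) for name, facts in patients.items()}
for name, label in results.items():
    print(name, label)
```

Under these assumptions Patient4 gets no label, because neither definition mentions a value of 107, and Patient5 lands in EmergencyCase through the "below 90" condition.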

Ok, let’s see what Pellet does with these.  Since datatype support is improving, I compared the results from both version 1.5 and 2.0.0RC5.  The results matter if you are working with datatype properties, which is very common in most engineering work.

First, since Pellet has had good support for matching object properties, the results for both versions are the same in those cases:

  • Patient1 - This classifies under SecondaryCase because it matches the named object property restriction (for Normal). This is to be expected.
  • Patient2 - “hasNamedBodyTemperature HighFever” This classifies as EmergencyCase, again because it matches one of the object property restrictions in that class.  All very good.

Next, the data types.  Under 1.5, which had limited support for datatype restrictions and rules:

  • Patient3 - Datatype property matching (equality) seems to work fine, so this normal case matches up with the “or (hasBodyTemperature value "99"^^integer)” in SecondaryCase and classifies correctly.
  • Patient4 - Since neither class covers this value, this patient should not classify under either case, and it doesn’t (which is OK).
  • Patient5 - This should have matched the “hasBodyTemperature only int[< "90"^^integer]” clause, but version 1.5 was known for issues with datatype properties and it simply ignores the clause.

Under 2.0.0RC5, the presence of the “hasBodyTemperature only int[< "90"^^integer]” clause in EmergencyCase causes an error in the reasoner, at least in the Protege plug-in reasoner for 2.0. The error is:

org.mindswap.pellet.exceptions.InternalReasonerException: Unknown term type: not(restrictedDatatype(http://www.w3.org2001/XMLSchema#int,[facet(http://www.w3.org/2001/XMLSchema#maxExclusive,literal(90,(),http://www.w3.org/2001/XMLSchema#integer))]))

This does not happen in a stand-alone program (at least as far as I can see). However, Patient5 still does not classify there, so the results are effectively the same as with version 1.5.x.

UPDATE (2009-06-30): The problem with Patient5 under 2.0.0RC5 had to do with the restriction type.  A solution that works is in the next post in the series.
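
One way a restriction type can matter, offered here as an assumption and not as the fix from the follow-up post: OWL reasoning is open-world, so an “only” (universal) restriction is not entailed just because the one asserted value fits the range, while a “some” (existential) restriction is. The plain-Python sketch below models that difference in a deliberately simplified way. The function names and the functional flag are hypothetical and do not correspond to any Pellet API.

```python
# Simplified model of existential vs. universal restrictions under the
# open-world assumption. Illustration only; not how a reasoner is built.
from typing import Callable, Iterable


def below_90(value: int) -> bool:
    """The facet from the post: int[< 90]."""
    return value < 90


def entails_some(known: Iterable[int], in_range: Callable[[int], bool]) -> bool:
    """'p some D' follows as soon as one known value lies in D."""
    return any(in_range(v) for v in known)


def entails_only(known: Iterable[int], in_range: Callable[[int], bool],
                 functional: bool = False) -> bool:
    """'p only D' needs every possible value to lie in D.

    In an open world there may be further values nobody stated, so the
    known values alone are not enough. In this sketch the set of values is
    treated as closed only when the property is functional and exactly one
    value is known.
    """
    values = list(known)
    closed = functional and len(values) == 1
    return closed and all(in_range(v) for v in values)


patient5 = [50]  # hasBodyTemperature 50
print(entails_some(patient5, below_90))                   # True
print(entails_only(patient5, below_90))                   # False
print(entails_only(patient5, below_90, functional=True))  # True
```

In this model, Patient5 matches an existential version of the clause directly, but matches the universal version only once the property is known to have at most one value.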

Dates are another common type that needs ranges, and hopefully, a future test will deal with these.

Getting Started with Ontologies

There is a lot of good information on a lot of topics these days, so I don’t have to go into:

If you are publishing information to the Semantic Web (or a corporate semantic web), you need to be using OWL.  From there it gets conditional.  Use a dialect and version that is supported by the inference engine you will be using; OWL 1.x is a good choice.  Not all engines, even the well-supported ones, support all dialects, so your best bet is the original RDF/XML dialect.  Nor do all engines support the entire standard, so be prepared to limit what you do.  Getting information on this is difficult at best (more later).

You need to be able to create and edit the ontology.  While it is educational to do it by hand with a text editor at the beginning, the tools are actually getting good enough to trust at this point.  The popular editor in most circles is Protege.

I recently downloaded and used the Protege 4 Beta (OWL edition).  Unlike the earlier versions, Protege 4 now comes with two inference engines installed: Pellet 1.5 (slower but more complete) and FaCT++ (faster). Pellet 2 can also be loaded directly as a plug-in.  Having plug-ins solves a lot of the issues found in the earlier versions. For learning Protege, there is a great tutorial from the University of Manchester. Read this first.

Of course, Protege (and the inference engines) still have issues and you will probably not see them coming, so do the usual thing – save early, save often. Another thing to keep in mind is to do frequent “classifications” since logic errors will also creep up on you and tend to get really confusing when they start getting mixed up with other errors.  Doing frequent classifications helps find these early – in short: test early, test often.  It starts to sound a lot like software development, no?

In the next post, more about Pellet and how it will affect your ontology development.
