Constructing Complex Results with SPARQL

A quick side note:

At several points, I have compared SQL to the W3C language stack, and one SQL capability that is awkward to reproduce in the OWL/SWRL combination is constructing complex results (such as blank nodes and new sub-graphs), which SQL can do with its data manipulation language.  Ideally, we want to be able to do everything with the new stack that the older relational languages could do, without needing to leave the session (all in the same script).

As I was reviewing SPARQL, the W3C query language, I came across the CONSTRUCT query type.  This allows a query to create a new graph (a group of triples) built from the data in the existing triples.  I am starting to look at this as a possible way around the issue.  It appears that if OWL/SWRL take care of making the logical inferences, the facts can then be gathered and re-formed into a new “shape” using SPARQL CONSTRUCT queries.  The problem is that while OWL/SWRL live in the session, SPARQL really lives outside, in the sense that a SPARQL query is initiated externally to pull information from the knowledge base or working memory.  If this knowledge were needed back in the session, it would need to be pulled out, then reinserted by some external process.
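To make the CONSTRUCT idea concrete, here is a toy, stdlib-only model of what the query engine does with a CONSTRUCT template (the class and method names are hypothetical, not JENA's API). The template is instantiated once per query solution: variables are filled from that solution's bindings, each blank-node label in the template gets a fresh node per solution, and a template triple left with an unbound variable is simply dropped. The fresh blank node per solution is the kind of new “result” object discussed here.

```java
import java.util.*;

// Toy model of SPARQL CONSTRUCT template instantiation (hypothetical class, not a JENA API).
class ConstructSketch {
    record Triple(String s, String p, String o) {}

    // "?x" = variable, "_:x" = template blank node, anything else = constant.
    static String resolve(String term, Map<String, String> row,
                          Map<String, String> bnodes, int[] counter) {
        if (term.startsWith("?")) return row.get(term.substring(1)); // null when unbound
        if (term.startsWith("_:")) return bnodes.computeIfAbsent(term, k -> "_:b" + counter[0]++);
        return term;
    }

    static List<Triple> construct(List<Triple> template, List<Map<String, String>> solutions) {
        List<Triple> graph = new ArrayList<>();
        int[] counter = {0};
        for (Map<String, String> row : solutions) {
            Map<String, String> bnodes = new HashMap<>(); // fresh blank nodes for every solution
            for (Triple t : template) {
                String s = resolve(t.s(), row, bnodes, counter);
                String p = resolve(t.p(), row, bnodes, counter);
                String o = resolve(t.o(), row, bnodes, counter);
                if (s != null && p != null && o != null) { // a triple with an unbound variable is dropped
                    graph.add(new Triple(s, p, o));
                }
            }
        }
        return graph;
    }
}
```

In a real deployment the solutions would come from the WHERE clause evaluated against the inferred model; here they are passed in directly so the template step can be seen on its own.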

Since the current work has to deal with existing ontologies, new object construction will be an issue. I need to be able to take data in one OWL schema and construct equivalent objects using the types of a new ontology.  One project that is sure to come up is converting existing data to an upper ontology (such as SUMO).

As with any part of the stack, whether it works on a given platform is always in question, so it will need to be tested. To do the project above, I am creating some convenience modules in Java to let me construct an ETL process for OWL data (much like the loader components I would use with a SQL database, or with XML and ESB components).

Back to Ontology again.

Performance Charts for File Example

In an earlier post, I noted that the approach to using Pellet would need to be changed for the Files rule engine.  The file approach was not working well for the volume of data in the files, so a database was brought in, and I noted that a more efficient approach would be to do only one small batch of inferences at a time (in this case, processing a single file).  Of course, numbers are more convincing.

So, the application was set up with timers and memory usage logging and three approaches were used:

  • Files1 – The original file approach, where the ontology and rules were loaded into memory from RDF files, the file data was read in and added to RDF in memory, all results were inferred and the results were extracted.
  • DB1 – The first database approach engaged a Derby database (easy, in-process, portable and file-based) and a JENA ModelRDB model as storage.  The files were loaded exactly as in the first approach, and the results were then stored in the DB model.  This is intended to test whether the Pellet reasoner does any smart background saves while processing.
  • DB2 – The second database approach uses the same DB Model for storage, but after the ontology and rules were loaded, the files and directories were loaded one at a time and for each, the inferences were done and stored to the DB model.  A commit was done at the end of each batch to flush to the DB so the transactions would not get too large.  This limits the amount of inference needed at each step, since all the previous inferences are already available as facts in the DB model.  If Pellet takes advantage of this, the amount of memory usage should drop drastically.

For each run, a set number of files and directories were read.  The number of files is not an exact measure, since on average files and directories use different amounts of data, but overall the average should show a reliable trend.  Likewise, the measure should ideally have been in triples, but this also should not affect the trend much.  At the end of each run, the database was dumped so each run started from scratch with an empty database.
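The post does not include the measurement code. Purely as an illustration, a timer and memory logger along these lines would produce the kind of numbers in the charts; the class name and the IntConsumer run hook are assumptions, and the run body is a placeholder for the load/infer/extract work.

```java
import java.util.function.IntConsumer;

// Hypothetical harness: times one run and records JVM heap allocated at its end.
class RunMetrics {
    record Sample(int files, double seconds, long totalMemoryMB) {}

    static Sample measure(int files, IntConsumer run) {
        long start = System.nanoTime();
        run.accept(files);                                   // the run under test
        double seconds = (System.nanoTime() - start) / 1e9;
        // totalMemory() is heap currently reserved by the JVM, not live data: crude but indicative.
        long mb = Runtime.getRuntime().totalMemory() / (1024 * 1024);
        return new Sample(files, seconds, mb);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            // Placeholder run: load ontology + rules, read n files, infer, extract.
            Sample s = measure(n, files -> { });
            System.out.printf("%d files: %.3f s, %d MB%n", s.files(), s.seconds(), s.totalMemoryMB());
        }
    }
}
```

Because totalMemory() only grows or shrinks as the JVM resizes its heap, it tracks the trend rather than the exact working set, which is all the comparison needs.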

The results are shown in the following two charts.  The first is the memory chart, where the vertical axis is the number of megabytes allocated by the JVM by the end of the run (Runtime.getRuntime().totalMemory(), crude but indicative) versus the number of files scanned.

[Chart: Pellet-MemoryPerformance]

Some notes about this:

  • The first approach (“File1”) simply loaded all the file data, ontology and rules into memory, did the inference and extracted the complete results.  Memory rose quickly and soon exceeded the limits of the machine.
  • The second approach (“DB1”) showed a similar result, which is not surprising if Pellet is doing exactly the same thing and only stores the results at the very end, when the model changes are committed.  This approach does not gain anything.
  • The third approach (“DB2”), which commits frequently, reduces the inference results to base facts in the DB model and avoids doing large numbers of inferences at once, shows much better results.  In fact, it is almost a flat line, which is good for applications that need to scale.

Another thing to note (not really visible in the chart) is that the file approach starts out at almost zero when the input data is small (important for small applications, if anyone actually has that luxury…).  However, both DB approaches show a minimal baseline above zero, which works out to about 12 MB in the raw data.  This means that the DB model itself initially takes a constant amount of memory.  I am not sure whether that would be repeated for each triplestore used, so it is good to keep in mind.

The time plot is shown below for the same data, where the vertical axis is time in seconds for the given number of files to be read.

[Chart: Pellet-TimePerformance]

Some things to note here:

  • The database is deleted between runs, so a minimal overhead is incurred while JENA sets up the triplestore tables in Derby.  That appears to work out to about 14 seconds.
  • The DB1 results are worse overall than the File1 results.  This can be explained by the extra time needed to store the inference results at the end of the run when using the DB1, whereas the File1 approach was able to do a quick file dump.
  • The DB2 times (subtracting the DB initialization) rose much less rapidly than the other two approaches.  As expected, this approach is far more efficient and scalable.  However, it is not a proportional improvement, since Pellet still has to load a lot of data from the store in order to do the inference.

I do not know the internals of Pellet and its usage of the JENA ModelRDB for triplestores.  There are a lot of potential variables even in a simple example like this and results could vary widely from application to application.  However, it does give some indications that can be useful for planning.

HermiT – A New OWL/SWRL Engine

Earlier this year, the first release of the HermiT engine was announced.  This engine has done quite well in the OWL 2 conformance tests (it is neck and neck with Pellet!) and the 1.0 release shows initial support for SWRL. It has been run through the SWRL test suite and passed all the tests except the ones dealing with built-in functions, which are the next area of development. Considering how recently the engine became available, this is quite impressive.  I will update the results as new versions become available.

(This last bit is important.  I have been trying a number of engines lately and, aside from Pellet and HermiT, I have not found any engines yet that provide good support for SWRL. I will publish any positive results I get with other engines in future issues. It is irritating to me, though, that given the importance of rules in inference, five YEARS after it became the only widely accepted standard, SWRL support is still limited to a few players.)

HermiT is another Java implementation and it is bundled into a single JAR file, which includes a version of the OWL-API, so it is quite easy to include in projects.  If you need sample code for setting up HermiT, you can get details from the HermiT site or grab a copy of the HermiT test code from the SWRL Test Suite.

Resources used in Files Example

One other note about the file example used earlier.  As the number of asserted individuals rises, the time and memory used by Pellet to do the classification and rules increases, as can be expected.  Given the current case, with 4 SWRL rules and reading the individuals from an RDF/XML file (on a 1 GB laptop), the numbers look like this:

# of Individuals   Inference Time (s)   Memory (bytes)
8                  0.375                5,177,344
100                2.7655               42,745,856
230                12.406               187,015,168
500                61.015               780,402,688
1000               N/A                  Out of memory

Since in this example, it will be difficult to know how many files will be imported in a batch, scalability becomes an important issue.  If the number of files in a batch (for whatever reason) just happens to get near 1000, the inference will fail.
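A quick way to see how fast the cost grows is to fit an empirical exponent k in cost ≈ n^k between two rows of the table. The sketch below does that for the 100 and 500 individual rows; the class and method names are hypothetical, and two points give only a crude estimate.

```java
// Hypothetical helper: empirical scaling exponent from two (size, cost) measurements.
class ScalingEstimate {
    // k such that c2 / c1 = (n2 / n1)^k
    static double exponent(double n1, double c1, double n2, double c2) {
        return Math.log(c2 / c1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // Rows for 100 and 500 individuals, taken from the table above.
        double time = exponent(100, 2.7655, 500, 61.015);
        double mem = exponent(100, 42_745_856, 500, 780_402_688);
        System.out.printf("time ~ n^%.2f, memory ~ n^%.2f%n", time, mem);
    }
}
```

Both exponents come out a little under 2 (about 1.9 for time and 1.8 for memory), so doubling the batch size roughly multiplies the cost by three to four. That is why extra memory only moves the failure point rather than removing it.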

Some common suggestions:

  • Increase Memory – “Memory is cheap” is a very common response when this issue is raised.  It can be a fast fix in a pinch. However, most industry people (system architects, for instance) will shoot this down immediately for a number of reasons.
    • No matter how much memory you throw at a solution, if the input is unbounded, eventually there is a risk that the new limit will be reached (unexpectedly).  Normally this will happen during a demonstration to upper management …
    • “Real” applications (enterprise, commercial) have to share resources in an infrastructure and are expected to behave nicely. Resources like memory are frequently shared with other virtual servers (VMs), in the same way that disk space is shared on a SAN, and processor speed is throttled per server.  Even if an application has “full access” to a server of its own, when it is deployed to production there may be new limits on what it can use.
    • While this application is mostly dealing with single-thread batch processing, most rule applications in an infrastructure are dealing with any number of concurrent threads.  If all of those threads have unbounded memory, no amount of memory would be safe.
  • Tune the Engine – In any rule (or knowledge) base, there are features of the engine that can be turned off to conserve resources.  (Try the information in the Pellet FAQ for instance.) Optimization is good in any application, especially if the gains are good. However, no matter how much you tune the engine, if the number of instances coming into the application is unbounded, eventually a spike in the number of input instances will hit the magic limit.
  • Process a Fixed Number of Files – Typically, a rules application will look at a single case at a time and process the results.
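A minimal sketch of the third option, assuming the input can be read as an iterator (for example, of file paths): the consumer, which would wrap the inference work, never sees more than a fixed number of items, so peak memory is governed by the batch size rather than by the size of the input. The class and method names are hypothetical.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical helper: hands the inference step at most maxBatch inputs at a time.
class FixedBatches {
    static <T> int forEachBatch(Iterator<T> input, int maxBatch, Consumer<List<T>> process) {
        if (maxBatch <= 0) throw new IllegalArgumentException("maxBatch must be positive");
        int batches = 0;
        List<T> batch = new ArrayList<>(maxBatch);
        while (input.hasNext()) {
            batch.add(input.next());
            if (batch.size() == maxBatch) {      // batch full: process it and start a new one
                process.accept(batch);
                batch = new ArrayList<>(maxBatch);
                batches++;
            }
        }
        if (!batch.isEmpty()) {                  // trailing partial batch
            process.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

A batch size of 1 gives the one-file-at-a-time behaviour; larger values trade memory for fewer reasoner round trips.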

It really depends on the application, of course. Research applications (and heavy-AI applications in general) are frequently given more resources than typical enterprise applications.  Tuning and a set limit on input instances is usually possible.

In this case, the chosen approach is to use the third option.  To do this, one approach is to merge the file scanner part of the application into the classification step, load a copy of the file ontology (OWL and SWRL rules) as a base ontology, and for each file or directory found:

  1. Assert the file information (name and so on).
  2. Run classification.
  3. Extract the results and act on them.

This kind of issue pops up frequently, so we will be dealing with it again.

Rule Result Structures

Even in a trivial application like this, some design issues become obvious very soon. In the File Tagging rule base, most of the rules are designed to set File individuals to new classes or add specific properties to them.  All this information needs to be extracted at the end of the classification process by the application in order to do the actual tagging operation.  As new tags and organizations are added, more and more of these tags and classes will need to be maintained.  If the application needs to query more and more different types of result values from the Ontology, the code could need alterations frequently.

For example, in the current setup, the owner rules are asserting hasOwner(?f,"Administration") and other properties that need to be tracked, while the File sub-classes are also being used and have to be interpreted by the application to get the MIME tags for the files.  Clearly, the application will need to be updated for every new tag that comes along.

It would be better if the structure of the Ontology and rules were not coupled so directly with the application.  In the end, the calling application just wants to know the file tags, where each individual result is a triple of the file ID, the tag name and the tag value for each new tag needed.  It does not need to know the internal details of how they were represented in the Ontology.

The problem with all the rule examples shown so far is that they result in either a Class assertion or a property assertion on an individual found in the body (condition) clause.  It can produce any number of these, but that is all they can create.  In this case, we have 3 values per result, and they need to be grouped in some way.  A number of possibilities pop to mind:

  • Use Blank Nodes for Results – In the consequent (head), create a blank node with a type of Result (making it easy for the calling application to query the results), each with three predicates (one for each value: file, tag name and tag value). This means that an individual of this type is created for each rule “firing” (each result).
    • The first problem is that in a Horn-style rule, the creation of a blank node requires that the rule will have a variable bound only on the consequent-side of the rule.  Try this in Pellet and you get a message like this:
    • WARNING: Ignoring rule [DIVXFile(?c)] => [Result1(?c2), arg1(?c2,"123"^^integer)]: Head atom Result1(?c2) contains variables not found in body.
    • Basically, it is saying that the variables have to be bound in the condition.
    • The other problem is the approach for building a blank node on the consequent side of the rule.  While there is nothing in the spec that says this is not allowed, there is nothing that guarantees that it is to be supported. This means that support for such a feature will be inconsistent between engines.
  • Alternative: DL Restrictions – If it cannot be done with SWRL, can it be done with DL?  This would entail creating a consequence or a non-SWRL complex class definition that would create a blank node as a restriction, then applying the class to the individuals as part of the rule.  However, I have not been able to find a sample that does this with blank nodes, and I do not know if it is even possible. The idea would be to use DL to get past the need for an unbound variable on the consequent side.
  • Use Built-in to Create Instance – To push the instance creation to the condition of the rule, use a built-in to create an instance that can be set with the values in the consequence. (It has to be on the condition side to get around the error stated above.)
    • The problem is that this will normally create instances whether or not there is a result from the rule, leading to a lot of junk in memory. This could be a performance problem.
    • The other problem is that this requires a built-in that supports the creation of instances.  The SWRL Spec does not provide one.  In Protege’s SWRLTab, there is a custom-designed one, but that will only be of use for ontology work done inside Protege or in engines that can import the function. In most of the work I would be doing, this is not an option. (For more information on SWRL in Protege, see this post.)
  • Use Lists – If a list (with two entries – tag name and tag value) can be set with a hasTag property on the consequent side, then any number of tags can be set (one per rule firing) and retrieved later by looking for the property names.  When the properties are extracted by the calling application, all three values are retrieved correctly and it also allows the application to get all results for the whole rule base with one query which means less maintenance as the rule base grows.
    • There are limitations on the language/model used  (and thus the inference engine).  In the SWRL Spec, section 8.7, where it describes the built-ins used for lists, it says “RDF-style lists can only be used as OWL data in OWL Full”.  I am not sure what the full implications of this statement are.
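The Pellet warning quoted above amounts to a Datalog-style safety check: every variable in the head must also occur in the body. The sketch below reproduces that check on rules written in the Protege “->” syntax. It is a toy with hypothetical names; it does not parse SWRL properly (a “?” or “->” inside a string literal would confuse it).

```java
import java.util.*;
import java.util.regex.*;

// Toy check: which head variables are not bound anywhere in the body?
class RuleSafety {
    private static final Pattern VAR = Pattern.compile("\\?[A-Za-z_][A-Za-z0-9_]*");

    static Set<String> vars(String atoms) {
        Set<String> out = new LinkedHashSet<>();
        Matcher m = VAR.matcher(atoms);
        while (m.find()) out.add(m.group());
        return out;
    }

    // Returns the unbound head variables; an empty set means the rule passes the check.
    static Set<String> unboundHeadVars(String rule) {
        String[] parts = rule.split("->", 2);
        if (parts.length != 2) throw new IllegalArgumentException("expected 'body -> head'");
        Set<String> head = vars(parts[1]);
        head.removeAll(vars(parts[0]));
        return head;
    }
}
```

For the rule in the warning, DIVXFile(?c) -> Result1(?c2), arg1(?c2, "123"^^integer), the check reports ?c2, which is exactly the blank-node variable the first option needs.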

If the above is true, then the results of rules (and probably DL) inference are limited to alterations of the existing instances and it will be impossible to build more complex constructs.  This will probably be a problem in cases like:

  • Ontology Mappings where the “shape” of the data mapping is not 1-to-1 (that is, more than class renaming and property reassignment). The moment that an intermediate object needs to be created, the mapping will be in trouble.
  • Rule results that bind more than 3 values per result. If the results need to be grouped into sets, it implies the creation of an object, and this becomes a problem.

If more information comes up regarding this, I will post it in a future entry.

For the file tagging application, the best thing that can be done is to use sub-properties to help generalize the query from the application. The results need to be bound triples of information (file ID, tag name, tag value), so these can be modelled as properties where there is a specific property for each tag name set by the system.  For instance:

  • hasTag - the root property
    • hasOwnerTag - corresponds to the tag name “owner”, sets the name of the owner organization of the file.
    • hasMIMEType - corresponds to the tag name “MIMEType”, sets the MIME type based on the file content format.

By making all the property names sub-properties of some root property, the application can simply query the ontology after inference for all axioms that include properties of the root type (that is, find all axioms with hasTag as a property).  The calling application will then pick through the axioms and convert each one to a tag.  (It will need to keep a mapping of property names to tag name strings, of course.  To get fancy, this mapping could also be kept in the same ontology as static instances and be queried by the caller application as well. That keeps all the information in the same ontology.)
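Here is a small sketch of the application side of this idea. It assumes the inferred triples have already been pulled out of the reasoner as plain (subject, property, object) strings and that each sub-property has a single parent; the class, record and method names are hypothetical. Only properties that sit under hasTag and have a tag-name mapping become results, so a new tag means a new sub-property and a mapping entry rather than new query code.

```java
import java.util.*;

class TagExtractor {
    record Triple(String s, String p, String o) {}
    record Tag(String file, String name, String value) {}

    // Walks up a single-parent subPropertyOf chain (simplification: OWL allows several parents).
    static boolean isUnder(String prop, String root, Map<String, String> parentOf) {
        Set<String> visited = new HashSet<>();
        for (String cur = prop; cur != null && visited.add(cur); cur = parentOf.get(cur)) {
            if (cur.equals(root)) return true;
        }
        return false;
    }

    // Turns every triple whose property falls under root into a (file, tag name, tag value) result.
    static List<Tag> extract(List<Triple> triples, String root,
                             Map<String, String> parentOf, Map<String, String> tagNames) {
        List<Tag> tags = new ArrayList<>();
        for (Triple t : triples) {
            String name = tagNames.get(t.p());   // application-side property -> tag name mapping
            if (name != null && isUnder(t.p(), root, parentOf)) {
                tags.add(new Tag(t.s(), name, t.o()));
            }
        }
        return tags;
    }
}
```

Keeping the tagNames mapping as instances in the ontology, as suggested above, would only change where the map is loaded from.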

There is more that could be said about this example, but there are more important issues to examine.  In the next few entries, I will look at some basic DL issues.

SWRL Tests

There are a lot of compelling reasons for using OWL/SWRL for rules and knowledge management work.  The two work together, so ontology and rules can be kept in a single format; there are several dialects to choose from; many engines have some support for it; and it is, at least, a standard. The problem is: can you trust SWRL?  If you are an architect making a recommendation (or a developer just messing around), you really don’t want to invest your time in someone else’s toys. You need to know:

  • Does it work?
    • How much of it works?
    • What is not supported?
  • Is it supported?
    • Is there an active community that can answer questions?
    • Is there documentation (you’re not psychic – well, maybe you are)?
    • Are there working samples?
  • How does it perform?
    • Is it fast enough for your solution?
    • Will it use more resources than the Google farm?

Most open source projects will try to answer these questions as early as they can if they want to survive.

The answers to the first two questions are the most important – you need to know if it works.  That is the job of conformance tests.  Most W3C standards have associated test suites (conformance tests).  The OWL 2 conformance suite is a good example (see previous entry).  However, the SWRL standard does not currently have a conformance suite. If anyone knows of one, please let me know.

In the meantime, I found I had to write some tests.  I needed them to tell me:

  • What syntax is correct?  (The samples in the spec only go so far and only in certain dialects of OWL/SWRL.  This is a learning/documentation thing.)
  • What engines support SWRL and how much of SWRL and which dialects are supported?
  • Some basic speed tests. (Note: I did find some existing benchmarks for various rule engines at the RuleBench project, but they did not use SWRL and the engines were different than the ones I was interested in testing.  Check out their results if you are interested in more advanced performance tests.)

The tests are still in development, but currently I am testing with Pellet 2 (pellet-2.0.0-rc7).  The tests are in RDF/XML and Turtle dialects (equivalent rules in both). There are 6 tests with a total of about 30 test cases covering named class rules, numeric and string datatype property rules, and the use of various numeric and string built-ins.  The current manifest of rules in the tests is as follows (Protege 4 rules format):

  • Test 1 (basic named class rules):
    • Driver(?x) -> Result1(?x)
    • Driver(?x), Person(?x) -> Result2(?x)
  • Test 2 (named classes with object properties):
    • hasProperty1(?c, ?o) -> Result1(?c)
    • hasProperty1(?c, test1) -> Result2(?c)
    • Test1(?c), hasProperty1(?c, ?o) -> Result3(?c)
    • Test1(?c), hasProperty1(?c, test2) -> Result4(?c)
  • Test 3 (rules with numeric datatype properties and numeric built-ins):
    • hasDataValue1(?i, ?v) -> Result1(?i)
    • hasDataValue1(?i, "3"^^integer) -> Result2(?i)
    • hasDataValue1(?i, ?v), greaterThanOrEqual(?v, "0"^^integer) -> Result3(?i)
    • hasDataValue1(?i, ?v), equal(?v, 3) -> Result4(?i)
    • hasDataValue2(?i, ?v), lessThan(?v, "9"^^integer) -> Result5(?i)
    • hasDataValue2(?i, ?v), lessThanOrEqual(?v, "9"^^integer) , greaterThanOrEqual(?v, "7"^^integer) -> Result6(?i)
  • Test 4 (rules with string built-ins – raw constant tests):
    • stringEqualIgnoreCase("abc", "abc") -> Result1(test1a)
    • stringEqualIgnoreCase("aBc", "abc") -> Result1(test1b)
    • stringEqualIgnoreCase("abc", "ABC") -> Result1(test1c)
    • stringEqualIgnoreCase("abc", "abd") -> Result1(test1d)
    • stringConcat("abc", "ab", "c") -> Result2(test2a)
    • stringConcat("abc", "", "abc") -> Result2(test2b)
    • stringConcat("abc", "a", "b", "c") -> Result2(test2c)
    • contains("aaaaabcdde", "abc") -> Result4(test4a)
  • Test 5 (string built-ins with arguments carrying over – alternative syntax):
    • stringArg1(?c, ?arg1) , stringArg2(?c, ?arg2) , contains(?arg1, ?arg2) -> Result1(?c)
    • stringArg1(?c, ?arg1) , stringArg2(?c, ?arg2) , containsIgnoreCase(?arg1, ?arg2) -> Result2(?c)
  • Test 6 (substring built-in):
    • substring("abc", "arabca", 0, 3) -> Result3(test3a)
    • substring("abc", "arabca", 2, 3) -> Result3(test3b)
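Without an official suite, the expected answer for each rule has to come from somewhere. One lightweight option, sketched below with hypothetical names, is to compute the Test 4 expectations independently of any engine, assuming the built-in meanings described in the SWRL submission (the first argument is compared with the result computed from the remaining ones). Substring is left out on purpose: the SWRL built-ins appear to be modelled on the XPath/XQuery functions, and fn:substring counts character positions from 1, so its expected results depend on getting that convention right.

```java
import java.util.*;

// Hypothetical oracle: expected outcomes for the Test 4 constant rules.
class StringBuiltinOracle {
    static boolean stringEqualIgnoreCase(String a, String b) { return a.equalsIgnoreCase(b); }

    // Variadic: true when the first argument equals the concatenation of the others.
    static boolean stringConcat(String result, String... parts) { return result.equals(String.join("", parts)); }

    // True when the first argument contains the second.
    static boolean contains(String a, String b) { return a.contains(b); }

    // Individual name -> should the rule fire (should the individual get its Result class)?
    static Map<String, Boolean> expectedTest4() {
        Map<String, Boolean> e = new LinkedHashMap<>();
        e.put("test1a", stringEqualIgnoreCase("abc", "abc"));
        e.put("test1b", stringEqualIgnoreCase("aBc", "abc"));
        e.put("test1c", stringEqualIgnoreCase("abc", "ABC"));
        e.put("test1d", stringEqualIgnoreCase("abc", "abd"));
        e.put("test2a", stringConcat("abc", "ab", "c"));
        e.put("test2b", stringConcat("abc", "", "abc"));
        e.put("test2c", stringConcat("abc", "a", "b", "c"));
        e.put("test4a", contains("aaaaabcdde", "abc"));
        return e;
    }

    public static void main(String[] args) {
        expectedTest4().forEach((ind, fires) -> System.out.println(ind + (fires ? ": expected" : ": not expected")));
    }
}
```

Only test1d should fail to fire; the other seven individuals should end up in their Result classes.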

These can hardly be called conformance tests. They are mostly driven by the rule patterns I use in my rules development work and I will be developing them over the next month or so as I hit new things that need to be checked. Conformance tests would test for invalid syntax, unsatisfiable conditions and complex OWL un-named classes in the consequents.

If you need samples for any of these patterns in one of the formats I support, let me know.  After they are expanded a bit, I will park them somewhere as a source project.

Results for Pellet 2 so far are good.  Pellet does not support the Turtle tests through JENA (it is probably an issue in JENA with SWRL), but it does support both dialects through the Pellet OWL-API interfaces, so I tested with both interfaces to make sure they give the same results.  Given that, Pellet 2 passed all tests except Test 6; apparently the “substring” built-in is not currently available.

Frankly, Pellet 2 is doing really well in all of the various test suites I have seen. Given all the trouble I have had with engines over the last 3 years or so, I am getting optimistic about this whole business.

Well, what about the other engines?

Rules using SWRL

OWL is normally done using description logic (DL) in which you try to build complex classes to classify results.  The material up to this point points to plenty of references about how to do this.

The other way of doing logic in the W3C stack is with rules.  For a lot of us, especially those who took something like classical logic in school, rules are a lot more natural in form.  Most paradigms seem to boil down to “Horn clauses”, a name well known to computer scientists. The form is fairly intuitive once you learn it.  For example:

parent(?x,?y),brother(?y,?z) -> uncle(?x,?z)

In the example above, the rule is given in the presentation syntax used in Protege.  The left side is the condition, which is a set of clauses bound by conjunction (AND), the ?x is a variable and the “->” sign is an implication. If the left side can be satisfied in your data (directly or by inference), the right side is asserted. This particular example can be lifted from any number of sources.
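To make the reading concrete, here is a naive one-pass evaluation of that rule over a handful of facts. It is only a toy: the Fact record and predicate strings are illustrative, and a real forward-chaining engine would index facts and repeat until nothing new is derived. The key step is the join on ?y: a parent fact's second argument has to match a brother fact's first argument, and the head is built only from variables bound in the body.

```java
import java.util.*;

// One naive pass of: parent(?x,?y), brother(?y,?z) -> uncle(?x,?z)
class UncleRule {
    record Fact(String pred, String a, String b) {}

    static Set<Fact> apply(Set<Fact> facts) {
        Set<Fact> inferred = new LinkedHashSet<>();
        for (Fact p : facts) {
            if (!p.pred().equals("parent")) continue;
            for (Fact b : facts) {
                // Join on ?y: the parent's second argument equals the brother's first argument.
                if (b.pred().equals("brother") && b.a().equals(p.b())) {
                    inferred.add(new Fact("uncle", p.a(), b.b())); // bind ?x and ?z in the head
                }
            }
        }
        return inferred;
    }
}
```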

In most simple cases, the results are similar. You can use either DL or rules to conditionally assign an individual to a class or to add properties to an individual.  Thus both can be used in a rules-type application.

If you are working with W3C languages like OWL and RDFS, the current practical choice is SWRL (the Semantic Web Rule Language).  It has been around for almost 5 years now and is primarily defined in the W3C SWRL Submission, which outlines its history, its relation to other languages and its usage. It gives six examples of usage in OWL/XML and about three in RDF/XML. Unfortunately, it is one of the few documents on the web that outline the usage of the language, so getting started can be hard.

Last year I ran some samples through the Pellet and KAON2 engines, and each ran a subset successfully.  However, I was having trouble with datatype properties in my rules and, given the frustration of not having documentation for many aspects of the language, I had to shelve the project.  With the release of Pellet 2, I pulled out the old rules and they are working fine with a bit of tweaking. (Given the documentation issues, it might have been a syntax issue.) In the process, I expanded the samples to cover various string and integer built-in functions.

The samples try to cover as many of the syntactic nuances as possible, but currently only deal with the RDF/XML format.  This is a problem if you try to get samples from the Web for SWRL.  I am finding that if I check the web for a given SWRL keyword, I can get results back in OWL/XML, N3 (I think it was N3) and RDF/XML, or the examples look like modifications to the ones in the submission (above).  The major issues are around arguments (which are lists or collections) and the various nuances of how to code them.  I have resorted to typing rules into Protege and looking at the resulting syntax to find out the proper coding, then doing it by hand in the actual file.  It has been a frustrating week.
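Much of the argument trouble comes from the fact that a built-in atom's swrl:arguments value is an RDF list. RDF/XML can abbreviate it with rdf:parseType="Collection", but underneath it is a chain of rdf:first/rdf:rest nodes ending in rdf:nil. The sketch below spells out that chain for a list of argument terms; it is a stdlib-only illustration with hypothetical names, not a serializer.

```java
import java.util.*;

// Encodes argument terms as an RDF collection: rdf:first / rdf:rest ... rdf:nil.
class RdfListEncoder {
    record Triple(String s, String p, String o) {}

    // The list head is the subject of the first triple (an empty list is just rdf:nil).
    static List<Triple> encode(List<String> items, String bnodePrefix) {
        List<Triple> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            String node = "_:" + bnodePrefix + i;
            String next = (i + 1 < items.size()) ? "_:" + bnodePrefix + (i + 1) : "rdf:nil";
            out.add(new Triple(node, "rdf:first", items.get(i))); // the argument itself
            out.add(new Triple(node, "rdf:rest", next));          // link to the rest of the list
        }
        return out;
    }
}
```

For lessThan(?v, "9"^^integer), the two arguments become four triples; comparing this expanded form with what Protege writes out is one way to check a hand-coded rule.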

In any case, I now have a small set of RDF/XML SWRL samples that seem to work.  I need to expand them out a bit to make sure I have the syntax down and I will be talking about the progress in a future entry.

New Pellet Tutorial

Pellet has some of the best documentation of any OWL reasoner.  Mere days ago, Kendall Clark published a blog entry on the Pellet site containing a slide show from a tutorial session at the Semantic Technology 2009 conference.  The 65 slides contain a pile of useful explanations of basic concepts and samples, including:

  • A brief summary of OWL (current)
  • Definitions of consistency and unsatisfiability, plus when to use them
  • Many JENA API samples
  • Best practices for developing with Pellet
  • Using Integrity Constraints

This is great material and I would highly recommend it. Some of the explanations are the best I have seen yet.

Pellet and Datatype Ranges

In an earlier entry, there was a sample OWL file that did not classify properly.  Some of the folks on the Pellet users mailing list helped me out and gave me a fix.  Hopefully I will get this explanation right.

Patient 5 was not classifying properly because the restriction I was using was “hasBodyTemperature only int[< 90]”. (This is Manchester syntax, of course, generated by Protege. The RDF/XML sample source defines the clause using xsd:maxExclusive.)  A sample of this syntax can be found in the Teenager class definition in the OWL Primer draft (search for “Teenager”), except that it uses “some”, not “only”, in the restriction. This was part of my problem.

By specifying “only”, the patient could only be an Emergency case if all of their data type values were less than 90 (with no values to the contrary).  However, in an open world, even if I stated a value of 50, there was no guarantee that the patient did not also have other values above 90.  As such, with “only” the reasoner could not decide that the Patient fell into that case.

The solution was to change the data type property to be “functional” (in RDF/XML, hasBodyTemperature becomes defined as type “FunctionalProperty”) so that there could be only one value for the patient.  If the engine knows that the value on the patient is definitive, then it can classify. The updated sample is here.

The alternative (which also works) is to use “some” (or owl:someValuesFrom in RDF/XML) instead of “only” (owl:allValuesFrom in RDF/XML) without changing the property to functional. However, this goes against the meaning of the sample.
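The difference between the two restrictions can be shown with a toy three-valued check. This is not how Pellet reasons internally (it is a tableau reasoner); it only models the open-world argument above, with hypothetical names: “some” needs just one matching value, while “only” can be proven only when no other value can exist, which is what making the property functional guarantees.

```java
import java.util.*;

// Toy model of the open-world reading of "hasBodyTemperature some/only int[< bound]".
// Assumption: 'known' holds the asserted values; unless the property is functional,
// further unstated values may exist. (Two different values for a functional property
// would make the ontology inconsistent; that case is not modelled.)
class OpenWorldRestriction {
    enum Answer { TRUE, FALSE, UNKNOWN }

    // someValuesFrom: one known value below the bound is enough.
    static Answer some(Set<Integer> known, int maxExclusive, boolean functional) {
        for (int v : known) if (v < maxExclusive) return Answer.TRUE;
        // A functional property with one known, non-matching value cannot have another.
        return (functional && known.size() == 1) ? Answer.FALSE : Answer.UNKNOWN;
    }

    // allValuesFrom: a known value at or above the bound refutes it; otherwise
    // it can only be proven when every value is known.
    static Answer only(Set<Integer> known, int maxExclusive, boolean functional) {
        for (int v : known) if (v >= maxExclusive) return Answer.FALSE;
        return (functional && known.size() == 1) ? Answer.TRUE : Answer.UNKNOWN;
    }
}
```

For Patient 5 (value 50), only(..., false) returns UNKNOWN and only(..., true) returns TRUE, mirroring the fix, while some(..., false) already returns TRUE.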

This is not the first time the Open World has caught me off guard.  I am too used to Closed World programming, so I expect it will not be the last.

Pellet and Issues with Datatype Properties

The previous post showed an OWL sample that demonstrated some simple reasoning (classification) using property values.  It defines the following:

  • A class Patient which defines a person in, say, a hospital emergency room.
  • The current body temperature of a Patient can be defined in one of two ways:
    • Using the hasBodyTemperature datatype property, where the temperature value is represented as an integer value (degrees F)
    • Using the hasNamedBodyTemperature object property, where the temperature is given a “code” value (points to a NamedBodyTemperature instance). (NamedBodyTemperature acts as an “enumerated type”, though it may not be formally defined correctly.)
  • There are two classes which categorize the Patient cases:
    • EmergencyCase - This defines the high priority cases, where body temperature is very low or very high.  It does this using both an expression for the object properties and one for the datatype properties.  In Protege, using Manchester syntax, this looks like: “Patient and (hasNamedBodyTemperature value HighFever) or (hasNamedBodyTemperature value Hypothermic) or (hasBodyTemperature only int[< "90"^^integer])”
    • SecondaryCase - This defines a class for the patients who have normal temperature.  In Manchester syntax, this becomes: “Patient and (hasNamedBodyTemperature value Fever) or (hasNamedBodyTemperature value Normal) or (hasBodyTemperature value "100"^^integer) or (hasBodyTemperature value "96"^^integer) or (hasBodyTemperature value "97"^^integer) or (hasBodyTemperature value "98"^^integer) or (hasBodyTemperature value "99"^^integer)”
  • Finally, there are a group of instances (patients) which have temperatures set in one of the two methods above. Let’s see what happens to them in Pellet 2.0.0RC5:
    • Patient1 - “hasNamedBodyTemperature Normal” This classifies under SecondaryCase because it matches the named object property restriction (for Normal). This is to be expected.
    • Patient2 - “hasNamedBodyTemperature HighFever” This classifies as EmergencyCase, again because it matches one of the object property restrictions in that class.  All very good.
    • Patient3 - “hasBodyTemperature 99” This should be a normal patient.
    • Patient4 - “hasBodyTemperature 107” This is a high-fever patient.
    • Patient5 - “hasBodyTemperature 50” This patient is hypothermic.

Ok, let’s see what Pellet does with these.  Since datatype support is improving, I compared the results from both version 1.5 and 2.0.0RC5.  The results are important if you are working with data type properties, which are very common in most engineering work.

First, since Pellet has had good support for matching object properties, the results for both versions are the same in those cases:

  • Patient1 - This classifies under SecondaryCase because it matches the named object property restriction (for Normal). This is to be expected.
  • Patient2 - “hasNamedBodyTemperature HighFever” This classifies as EmergencyCase, again because it matches one of the object property restrictions in that class.  All very good.

Next, the data types.  Under 1.5, which had limited support for datatype restrictions and rules:

  • Patient3 - Data type property matching (equality) seems to work fine, so this normal case matches up with the “or (hasBodyTemperature value "99"^^integer)” in SecondaryCase and classifies correctly.
  • Patient4 - Since neither class covers this value, this patient should not classify under either case, and it doesn’t (which is OK).
  • Patient5 - This would have engaged the “hasBodyTemperature only int[< "90"^^integer]” but version 1.5 was known for issues with data type properties and it simply ignores this clause.

Under 2.0.0RC5, the presence of the “hasBodyTemperature only int[< "90"^^integer]” clause in EmergencyCase causes an error in the reasoner, at least in the Protege plug-in reasoner for 2.0. The error is:

org.mindswap.pellet.exceptions.InternalReasonerException: Unknown term type: not(restrictedDatatype(http://www.w3.org2001/XMLSchema#int,[facet(http://www.w3.org/2001/XMLSchema#maxExclusive,literal(90,(),http://www.w3.org/2001/XMLSchema#integer))]))

This does not happen in a stand-alone program (at least as far as I can see); however, it still does not classify Patient5. The results are effectively the same as with version 1.5.x.

UPDATE (2009-06-30): The problem with Patient 5 under 2.0.0RC5 had to do with the restriction type.  A solution that works is in the next blog in the series.

Dates are another common type that needs ranges, and hopefully, a future test will deal with these.
