Even in a trivial application like this, some design issues become obvious quickly. In the File Tagging rule base, most of the rules assign File individuals to new classes or add specific properties to them. At the end of the classification process, the application must extract all of this information to perform the actual tagging. As new tags and organizations are added, more and more of these tags and classes will need to be maintained. If the application has to query an ever-growing number of result types from the Ontology, its code will need frequent changes.
For example, in the current setup, the owner rules assert hasOwner(?f,"Administration") and other properties that need to be tracked, while the File sub-classes must also be interpreted by the application to get the MIME tags for the files. Clearly, the application will need to be updated for every new tag that comes along.
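To make the coupling concrete, here is a minimal Python sketch of the extraction code the current setup forces on the application. It assumes the inferred facts have already been pulled out of the reasoner as plain (subject, predicate, object) tuples; the class names PDFFile and PNGFile, the file IDs and the MIME mapping are hypothetical, and no real ontology API is used.

```python
# Hypothetical inferred facts as (subject, predicate, object) tuples.
# PDFFile, PNGFile and the file IDs are illustrative only.
INFERRED = [
    ("file1", "rdf:type", "PDFFile"),
    ("file1", "hasOwner", "Administration"),
    ("file2", "rdf:type", "PNGFile"),
]

# The application hard-codes which File sub-class means which MIME type.
MIME_BY_CLASS = {
    "PDFFile": "application/pdf",
    "PNGFile": "image/png",
}


def extract_tags_coupled(triples):
    """Return (file ID, tag name, tag value) tuples.

    Every kind of tag needs its own branch here, so each new tag
    added to the rule base means another change to this function.
    """
    tags = []
    for subj, pred, obj in triples:
        if pred == "hasOwner":
            # Owner tags come from a property assertion.
            tags.append((subj, "owner", obj))
        elif pred == "rdf:type" and obj in MIME_BY_CLASS:
            # MIME tags come from class membership, interpreted via the table.
            tags.append((subj, "MIMEType", MIME_BY_CLASS[obj]))
    return tags


print(extract_tags_coupled(INFERRED))
```

Each new tag type means another branch in extract_tags_coupled, and possibly another entry in a hard-coded table, which is exactly the maintenance problem described above.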
It would be better if the structure of the Ontology and rules were not coupled so directly to the application. In the end, the calling application just wants the file tags, where each result is a triple of file ID, tag name and tag value. It does not need to know how they are represented internally in the Ontology.
The problem with all the rule examples shown so far is that each one results in either a class assertion or a property assertion on an individual found in the body (condition). A rule can produce any number of these, but that is all it can create. In this case, we have three values per result, and they need to be grouped in some way. A number of possibilities come to mind:
- Use Blank Nodes for Results – In the consequent (head), create a blank node with a type of Result (making it easy for the calling application to query the results), with three properties (one for each value: file, tag name and tag value). An individual of this type is then created for each rule "firing" (each result).
- The first problem is that, in a Horn-style rule, creating a blank node requires a variable that is bound only on the consequent side of the rule. Try this in Pellet and you get a message like this:
- WARNING: Ignoring rule [DIVXFile(?c)] => [Result1(?c2), arg1(?c2,"123"^^integer)]: Head atom Result1(?c2) contains variables not found in body.
- In other words, every variable in the head must also be bound in the body (condition).
- The other problem is building a blank node on the consequent side at all. Nothing in the spec forbids it, but nothing guarantees that it will be supported either, so support for such a feature will be inconsistent between engines.
- Alternative: DL Restrictions – If it cannot be done with SWRL, can it be done with DL? This would mean creating a consequence or a non-SWRL complex class definition that builds a blank node as a restriction, then applying that class to the individuals as part of the rule. The idea is to use DL to get past the need for an unbound variable on the consequent side. However, I have not found a sample that does this with blank nodes, and I do not know if it is even possible.
- Use Built-in to Create Instance – To push instance creation into the condition of the rule, use a built-in that creates an instance, which the consequence can then populate with the values. (It has to be on the condition side to get around the error stated above.)
- The problem is that this will normally create instances whether or not the rule produces a result, leaving a lot of junk in memory. This could become a performance problem.
- The other problem is that this requires a built-in that supports creating instances, and the SWRL Spec does not provide one. Protege's SWRLTab has a custom-designed one, but it is only useful for ontology work done inside Protege or in engines that can import the function. In most of the work I would be doing, this is not an option. (For more information on SWRL in Protege, see this post.)
- Use Lists – If a list (with two entries: tag name and tag value) can be attached with a hasTag property on the consequent side, then any number of tags can be set (one per rule firing) and retrieved later by looking for that property. When the calling application extracts the properties, all three values are retrieved correctly. It also lets the application get all results for the whole rule base with one query, which means less maintenance as the rule base grows.
- There are limitations on the language/model used (and thus the inference engine). In the SWRL Spec, section 8.7, where it describes the built-ins used for lists, it says “RDF-style lists can only be used as OWL data in OWL Full”. I am not sure what the full implications of this statement are.
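Pellet's warning comes down to a simple condition: every variable in a rule's head must also occur in its body. The following is a hypothetical illustration in plain Python of that check, not Pellet's actual code; it assumes rules are encoded as lists of (predicate, arguments) tuples, with variables written as strings starting with "?".

```python
def variables(atoms):
    """Collect the variables (string terms starting with '?') used in a list of atoms."""
    return {
        arg
        for _, args in atoms
        for arg in args
        if isinstance(arg, str) and arg.startswith("?")
    }


def unsafe_head_variables(body, head):
    """Return head variables that never appear in the body.

    A non-empty result means the rule would have to invent a value in
    the head, which is what the Pellet warning rejects.
    """
    return variables(head) - variables(body)


# The rule from the warning: DIVXFile(?c) => Result1(?c2), arg1(?c2, 123)
body = [("DIVXFile", ("?c",))]
head = [("Result1", ("?c2",)), ("arg1", ("?c2", 123))]
print(unsafe_head_variables(body, head))  # {'?c2'}

# The built-in approach binds ?c2 in the body instead.
# "createInstance" is a hypothetical built-in name, not part of the SWRL spec.
safe_body = body + [("createInstance", ("?c2",))]
print(unsafe_head_variables(safe_body, head))  # set()
```

The second call shows why the built-in option moves instance creation into the condition: once some body atom binds ?c2, the rule passes the check, at the cost of needing a non-standard built-in.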
If the above is true, then rule inference (and probably DL inference) is limited to altering existing instances, and it will be impossible to build more complex constructs. This will probably be a problem in cases like:
- Ontology Mappings where the "shape" of the data mapping is not 1-to-1 (that is, anything beyond class renaming and property reassignment). The moment an intermediate object needs to be created, the mapping will be in trouble.
- Rule results that bind more than three values per result. Grouping results into sets implies creating an object, and this runs into the same problem.
If more information comes up regarding this, I will post it in a future entry.
For the file tagging application, the best option is to use sub-properties to generalize the application's query. Each result is a bound triple of information (file ID, tag name, tag value), so it can be modelled as a property assertion, with a specific property for each tag name the system sets. For instance:
- hasTag - the root property
- hasOwnerTag - corresponds to the tag name “owner”, sets the name of the owner organization of the file.
- hasMIMEType - corresponds to the tag name “MIMEType”, sets the MIME type based on the file content format.
By making all the tag properties sub-properties of a single root property, the application can simply query the ontology after inference for all axioms whose property falls under the root (that is, find all axioms that use a sub-property of hasTag). The calling application then picks through the axioms and converts each one to a tag. (It will need to keep a mapping of property names to tag name strings, of course. To get fancy, this mapping could also be kept in the same ontology as static instances and queried by the calling application as well. That keeps all the information in the same ontology.)
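A minimal Python sketch of the caller's side of this approach follows. It assumes the inferred assertions are already available as plain (subject, predicate, object) tuples and that the sub-property declarations are known as a child-to-parent dictionary (one parent per property, for simplicity); the file ID, the function names and the sample triples are illustrative, not part of any reasoner's API.

```python
# Sub-property declarations (child -> parent), mirroring the list above.
# Assumes each property has at most one parent, to keep the sketch short.
SUB_PROPERTY_OF = {
    "hasOwnerTag": "hasTag",
    "hasMIMEType": "hasTag",
}

# The caller's mapping from property names to tag name strings.
TAG_NAME = {
    "hasOwnerTag": "owner",
    "hasMIMEType": "MIMEType",
}


def is_sub_property(prop, root, parents):
    """True if prop equals root or reaches it by following subPropertyOf links."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == root:
            return True
        seen.add(prop)
        prop = parents.get(prop)
    return False


def extract_tags(triples, root="hasTag"):
    """Turn each assertion using a sub-property of root into a (file, tag, value) triple.

    Assertions on root itself are skipped: a reasoner may entail them from
    the sub-property assertions, but they no longer say which tag they carry.
    Properties missing from TAG_NAME fall back to their own name.
    """
    return [
        (subj, TAG_NAME.get(pred, pred), obj)
        for subj, pred, obj in triples
        if pred != root and is_sub_property(pred, root, SUB_PROPERTY_OF)
    ]


# Hypothetical inferred assertions for one file.
SAMPLE = [
    ("file1", "rdf:type", "File"),
    ("file1", "hasOwnerTag", "Administration"),
    ("file1", "hasMIMEType", "application/pdf"),
    ("file1", "hasTag", "Administration"),  # entailed super-property assertion
]
print(extract_tags(SAMPLE))
```

Adding a new tag here only means declaring a new sub-property of hasTag (and, optionally, a TAG_NAME entry); extract_tags itself does not change.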
There is more that could be said about this example, but there are more important issues to examine. In the next few entries, I will look at some basic DL issues.