The SWRL test suite continues to grow, but in the meantime an example is needed to help look at other factors.
Let’s say you are working on hospital systems and the administration is installing a new shared document repository for medical files. Various departments will be sending files (patient medical records, imaging results, administrative documents): whatever files are referenced across the enterprise inside a hospital. As the files arrive from the departments, they need to be tagged with metadata before being placed into the repository.
In the real world, there are standards like XDS for medical file sharing. Part of the standard defines the required metadata, as well as a way of defining extended tags. Standardized tags allow applications all over the enterprise to find files easily. As the files come in on a nightly feed from various departments, they would automatically be processed and uploaded with their metadata to the repository.
In this example, an initial categorization can be done based on file extension and the directory structure of the files. After that, the files in a given group can be passed off to other processes that scan specific file types. To automate the tagging, we could hard-code the rules in a Java program, but instead we will use OWL/SWRL. What makes OWL/SWRL rules so useful here is that:
- They are all in one central group of files in a single format.
- They are external to the coded components (like the Java process components that fire the rule engine). Thus the rules can usually be changed without needing to change the installed code components.
- They are much more compact and easier to maintain than the corresponding Java representations (at least if you are using a tool like Protege).
The architecture of the application follows the norm of business rule engines:
- As a component in the enterprise, it solves a specific, well-defined and self-contained logical task involving decisions. The problem may be broken down internally into one or more logical sub-units, each doing a logical sub-task, but they all follow the same process.
- Each task uses one or more rule bases, which contain facts and rules. These rule bases are immutable. Typically, they are loaded once and reused by the task to save on load time.
- Each time a task runs, it creates a workspace, accesses the rule bases, asserts a series of input facts, runs inference, then queries the results. The rules may “call out” to enterprise resources outside the rule engine (databases, web services). The results of one task may feed into another task (much like any other process).
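The load-once, run-many lifecycle described above can be sketched in a few lines. This is only an illustration in Python, not the OWL/SWRL engine used in this example: the `RuleBase` and `Workspace` classes, the triple-style facts, and the `pdf_is_document` rule are all hypothetical names invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, Set, Tuple

Fact = Tuple[str, str, str]               # (subject, predicate, object)
Rule = Callable[[Set[Fact]], Set[Fact]]   # returns facts implied by the current facts


@dataclass(frozen=True)
class RuleBase:
    """Immutable bundle of static facts and rules, loaded once and shared."""
    facts: FrozenSet[Fact]
    rules: Tuple[Rule, ...]


@dataclass
class Workspace:
    """Per-run scratch area: rule-base facts plus this run's input facts."""
    rule_base: RuleBase
    facts: Set[Fact] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.facts |= self.rule_base.facts

    def assert_fact(self, fact: Fact) -> None:
        self.facts.add(fact)

    def run_inference(self) -> None:
        # Naive forward chaining: apply every rule until nothing new appears.
        changed = True
        while changed:
            new: Set[Fact] = set()
            for rule in self.rule_base.rules:
                new |= rule(self.facts) - self.facts
            changed = bool(new)
            self.facts |= new

    def query(self, predicate: str) -> Set[Fact]:
        return {f for f in self.facts if f[1] == predicate}


def pdf_is_document(facts: Set[Fact]) -> Set[Fact]:
    # Hypothetical rule: anything with extension "pdf" is typed as a Document.
    return {(s, "type", "Document") for (s, p, o) in facts
            if p == "hasExtension" and o == "pdf"}


# Built once at start-up and reused by every run of the task.
RULE_BASE = RuleBase(facts=frozenset(), rules=(pdf_is_document,))


def run_task(input_facts: Set[Fact]) -> Set[Fact]:
    workspace = Workspace(RULE_BASE)      # fresh workspace per run
    for fact in input_facts:
        workspace.assert_fact(fact)
    workspace.run_inference()
    return workspace.query("type")
```

Calling `run_task` twice with different inputs reuses the same `RULE_BASE` but never shares inferred facts between runs, which is the property the workspace is there to provide.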
In the case of this example, the task is tagging based on the file information. The rule bases are OWL/SWRL files that contain the facts and rules needed to do the tagging. The input facts are details about the files, which are created by a simple Java program that runs each night to pick up any unclassified files and store the basic file information as an RDF file. The RDF file is the input to the process. A second process picks up the RDF, determines the tags and uploads the files and metadata to the repository. (The last part is accomplished with about 50 lines of Java using the XDS client API, but that is not covered here.)
The input data is a record of the files and directories in the fileset. The file information would consist of a unique ID for each new file, along with its full file/path name, date and size. The files are placed into various root directories by the various departments, and each department has different conventions.
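To make the shape of that input concrete, here is a rough sketch of a scanner that walks a root directory and emits one record per directory and file as N-Triples. It is written in Python for brevity rather than the Java mentioned above. The namespace `http://example.org/files#`, the `hasSize` and `hasDate` property names, and the `dirN`/`fileN` ID scheme are assumptions made for illustration; only `File`, `Directory` and `hasFullName` come from this example.

```python
import os
from datetime import datetime, timezone
from itertools import count

NS = "http://example.org/files#"   # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value, datatype):
    """Format a typed N-Triples literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'


def directory_triples(dir_id, full_name):
    subject = f"<{NS}{dir_id}>"
    return [
        f"{subject} <{RDF_TYPE}> <{NS}Directory> .",
        f"{subject} <{NS}hasFullName> {literal(full_name, 'string')} .",
    ]


def file_triples(file_id, full_name, size, modified):
    subject = f"<{NS}{file_id}>"
    return [
        f"{subject} <{RDF_TYPE}> <{NS}File> .",
        f"{subject} <{NS}hasFullName> {literal(full_name, 'string')} .",
        f"{subject} <{NS}hasSize> {literal(size, 'integer')} .",        # assumed property
        f"{subject} <{NS}hasDate> {literal(modified.isoformat(), 'dateTime')} .",  # assumed property
    ]


def scan(root):
    """Walk root and return N-Triples lines for every directory and file found."""
    ids = count(1)  # simple per-run unique IDs; a real feed would need stable ones
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        dir_name = dirpath.replace(os.sep, "/")
        lines += directory_triples(f"dir{next(ids)}", dir_name)
        for name in sorted(filenames):
            info = os.stat(os.path.join(dirpath, name))
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            lines += file_triples(f"file{next(ids)}", f"{dir_name}/{name}",
                                  info.st_size, modified)
    return lines
```

Writing `"\n".join(scan("files"))` to disk would give the nightly input file for the tagging process, under the assumptions above.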
The interesting part here is the rules and facts that make it all work. If each department places files in specific directories, one rule might be to set the owner on each file in those directories. Before that, it might be useful to have a rule that marks the files contained in each directory. If the data file uses the following classes and properties:
- File
  - containedInDirectory – Functional, inverse of containsFile, range is Directory
  - hasFullName – The full path and file name of the file. Datatype, value is string.
  - hasOwner – Sets a string that will be used for the owner tag value.
- Directory
  - hasFullName – Same as above.
  - containsFile – Links directories to files.
Then the containment rule would look like this:
File(?x), Directory(?y), hasFullName(?x,?xp), hasFullName(?y,?yp),
startsWith(?xp,?yp) -> containsFile(?y,?x)
Of course, through the inverse property, every file is also linked to all of its parent directories. Because the rule fires for every file and every directory above it, this little rule generates a lot of inferences.
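To get a feel for how many pairs the containment rule produces, its effect can be imitated outside the reasoner. The sketch below is plain Python, not SWRL. It assumes that `startsWith` behaves like an ordinary string-prefix test, and the sample IDs and paths other than `files/reports/staffing` are made up.

```python
def infer_containment(files, directories):
    """Imitate the containment rule.

    files and directories map an individual's ID to its hasFullName value.
    Returns the containsFile pairs and their containedInDirectory inverses.
    """
    contains = {
        (dir_id, file_id)
        for dir_id, dir_path in directories.items()
        for file_id, file_path in files.items()
        # Purely textual prefix test: "files/rep" would also match
        # "files/reports/...", even though it is not a parent directory.
        if file_path.startswith(dir_path)
    }
    contained_in = {(file_id, dir_id) for dir_id, file_id in contains}
    return contains, contained_in


# Hypothetical sample data.
directories = {
    "d1": "files",
    "d2": "files/reports",
    "d3": "files/reports/staffing",
    "d4": "files/imaging",
}
files = {
    "f1": "files/reports/staffing/roster.pdf",
    "f2": "files/imaging/scan01.dcm",
}

contains, contained_in = infer_containment(files, directories)
print(len(contains))  # 5: f1 sits under d1, d2 and d3; f2 under d1 and d4
```

Two files and four directories already yield five containsFile facts and five inverse facts, and the count grows with directory depth.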
Once this is in place, any number of owner assignment rules like the following can be defined:
File(?f), Directory(?d), containsFile(?d,?f),hasFullName(?d,"files/reports/staffing")
-> hasOwner(?f,"Administration")
When retrieving the results after inference, the hasOwner values are extracted and set as tag values on the files that had that property.
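The owner rules and the extraction step can be imitated in the same way. This is a Python sketch of the logic only, not the query API of any reasoner. The `Radiology` rule and the sample containment pairs are invented for illustration; the `Administration` rule is the one shown above.

```python
# Directory full name -> owner tag value. The first entry mirrors the rule
# above; the second is a hypothetical extra rule.
OWNER_RULES = {
    "files/reports/staffing": "Administration",
    "files/imaging": "Radiology",
}


def assign_owners(contains, directory_names, owner_rules):
    """Apply every owner rule to the inferred (directory, file) pairs."""
    owners = {}
    for directory_id, file_id in contains:
        owner = owner_rules.get(directory_names[directory_id])
        if owner is not None:
            owners.setdefault(file_id, set()).add(owner)
    return owners


def owner_tags(owners):
    """Turn inferred hasOwner values into tag values, one sorted list per file."""
    return {file_id: sorted(values) for file_id, values in owners.items()}


# Hypothetical containsFile pairs, as the containment rule would infer them.
directory_names = {
    "d1": "files",
    "d3": "files/reports/staffing",
    "d4": "files/imaging",
}
contains = {("d1", "f1"), ("d3", "f1"), ("d1", "f2"), ("d4", "f2")}

tags = owner_tags(assign_owners(contains, directory_names, OWNER_RULES))
print(tags["f1"])  # ['Administration']
```

Files that match no owner rule simply do not appear in the result, which matches the description above of tags being set only on the files that have the property.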
This example will be continued in the next entry.