Chunking a Timeline

So, say you have the case from the last entry, with a quadratically growing number of before/after relations between events. Either you pre-calculate the entries (which uses lots of space but minimizes query time), or you leave it up to the engine to run the rules in real time (in which case even the smartest engine will need to calculate a huge pile of inferences at query time). One way or another, the rules need to be fired, and for even a medium-sized database, this could make operations crawl.

One way humans deal with the problem is partitioning. Think about it: if someone asks whether Gandhi was born after King Solomon, most people do not need to calculate, or even know, the dates. They immediately know that King Solomon lived in “ancient times” and Gandhi in “the twentieth century”, and these blocks of time are easy to compare. Without our noticing, events are grouped in memory, the groups have time intervals, and the order of those intervals is deeply embedded in our mental framework (heavily pre-calculated).

To set this up, groups of events (“births in ancient Greece”, “current events in 1949”, “stages of construction on the Hoover Dam”) are created with set time intervals, and the before/after relations between the groups are asserted (or implied). Some of the groups will overlap in their intervals. Events are then put into those groups, and within each group, before/after relations are asserted (or implied) between its events. This two-tiered system allows the order of two events in different groups to be determined from the groups alone, as long as the two groups’ intervals do not overlap. (There are other things that can be done with overlapping intervals.)

Note that the exact time of an event does not need to be known. To put an event into a group, the group should have a known interval (“events in 1940” has definite limits, even if all that is known about its events is that they happened sometime that year), but even the boundaries can be fuzzy (all of “ancient Roman times” was before “the 1800s”). In current events or project management, for instance, this becomes important when tracking dialogs, news stories or other sequences, since the date of the sequence may be known while the exact time of a specific part of it is not; only what came before and after. We humans do this all day long, every day.

To take advantage of this approach, however, you really need a different relation (“fast before/after” maybe) that uses these rules:

  1. If the two events are in the same group, use the direct before/after relations.
  2. If the two events are in different, non-overlapping groups, use the relations between the groups to determine the before/after relation.
  3. Otherwise, things get more complicated and it may not be possible to say whether the events are definitely before or after each other. There are some smart approaches that can be applied, but they are not fast.
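A minimal Python sketch of such a “fast before/after” relation follows. It assumes each event belongs to exactly one group, each group has numeric bounds (for fuzzy groups, conservatively wide ones), and the in-group before/after relations form a total order stored as a list. The names Group and fast_before, and all the example groups and events, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Group:
    """A block of time with known (possibly fuzzy, conservatively wide) bounds."""
    name: str
    start: float  # earliest possible time in the group, e.g. a year
    end: float    # latest possible time in the group
    # Events in the group, earliest first (assumes the in-group
    # before/after relations form a total order).
    events: List[str] = field(default_factory=list)


def fast_before(a: str, b: str, group_of: Dict[str, Group]) -> Optional[bool]:
    """True if a is before b, False if b is before a, None if no fast answer."""
    ga, gb = group_of[a], group_of[b]
    # Rule 1: same group, so use the direct in-group ordering.
    if ga is gb:
        return ga.events.index(a) < ga.events.index(b)
    # Rule 2: non-overlapping groups, so the group intervals decide.
    if ga.end < gb.start:
        return True
    if gb.end < ga.start:
        return False
    # Rule 3: the intervals overlap; fall back to slower reasoning.
    return None


# Hypothetical groups and events, for illustration only.
ancient = Group("ancient times", -3000, 500, ["Solomon born"])
modern = Group("1800-2000", 1800, 2000, ["Gandhi born"])
meeting = Group("a meeting in 1949", 1949, 1949,
                ["opening remarks", "budget discussion", "vote"])
news_1949 = Group("current events in 1949", 1949, 1949, ["news story"])

group_of = {e: g for g in (ancient, modern, meeting, news_1949) for e in g.events}

print(fast_before("Solomon born", "Gandhi born", group_of))  # True (rule 2)
print(fast_before("opening remarks", "vote", group_of))      # True (rule 1)
print(fast_before("vote", "news story", group_of))           # None (rule 3)
```

Returning None rather than False for rule 3 keeps “unknown” distinct from “after”, so a caller can fall back to the slower reasoning only for those pairs.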

This can be done with rules inside the engine, but there is also the option of offloading the partition function. In SWRL, for instance, a custom built-in could be defined to offload some of this work to a relational database. Relational databases can use indexes to speed up the queries in each of the first two steps. This is especially true if a group contains many highly interconnected events, such as in long-running dialogs, stories or meeting notes. In knowledge engineering, for instance, this could be an index of subject matter expert interviews.
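As a rough illustration of the relational side only (not of the SWRL built-in itself, whose wiring depends on the rule engine), the sketch below uses Python’s standard sqlite3 module with a hypothetical schema: a grp table with numeric bounds (here abstract session numbers) and an event table holding each event’s group and its position within that group. The composite index on (grp_id, pos) is what keeps in-group lookups from scanning the whole table. All table, column and event names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grp (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    start_bound REAL NOT NULL,
    end_bound REAL NOT NULL
);
CREATE TABLE event (
    name TEXT PRIMARY KEY,
    grp_id INTEGER NOT NULL REFERENCES grp(id),
    pos INTEGER NOT NULL  -- order of the event within its group
);
-- Composite index: in-group ordering queries do not scan the whole table.
CREATE INDEX event_grp_pos ON event (grp_id, pos);
""")
# Hypothetical interview sessions; group 3 overlaps groups 1 and 2.
conn.executemany("INSERT INTO grp VALUES (?, ?, ?, ?)", [
    (1, "interview 1", 1, 1),
    (2, "interview 2", 2, 2),
    (3, "follow-up notes", 1, 2),
])
conn.executemany("INSERT INTO event VALUES (?, ?, ?)", [
    ("q1", 1, 0), ("q2", 1, 1), ("q3", 1, 2),
    ("r1", 2, 0), ("r2", 2, 1),
    ("n1", 3, 0),
])

FAST_BEFORE_SQL = """
SELECT CASE
    WHEN ea.grp_id = eb.grp_id THEN ea.pos < eb.pos  -- rule 1
    WHEN ga.end_bound < gb.start_bound THEN 1        -- rule 2
    WHEN gb.end_bound < ga.start_bound THEN 0        -- rule 2
    ELSE NULL                                        -- rule 3: unknown
END
FROM event AS ea
JOIN grp AS ga ON ga.id = ea.grp_id
JOIN event AS eb ON eb.name = :b
JOIN grp AS gb ON gb.id = eb.grp_id
WHERE ea.name = :a
"""


def fast_before(a, b):
    """1 if a is before b, 0 if after, None if unknown or not found."""
    row = conn.execute(FAST_BEFORE_SQL, {"a": a, "b": b}).fetchone()
    return None if row is None else row[0]


def earlier_in_group(name):
    """All events before `name` in its own group (served by the index)."""
    return [r[0] for r in conn.execute(
        "SELECT e2.name FROM event AS e1 JOIN event AS e2 "
        "ON e2.grp_id = e1.grp_id AND e2.pos < e1.pos "
        "WHERE e1.name = ? ORDER BY e2.pos", (name,))]
```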

Loose as these groups may be, even with fuzzy intervals, they can dramatically cut down on the number of events that need to be considered in a query, so when applied properly they may allow knowledge bases to grow by an order of magnitude.

Customizing Time Concepts

One issue most people have with applying high-level ontologies is that they are usually very abstract, and it is hard to see how they apply to the problem at hand. Philosophers have studied time for centuries, so they have had plenty of time to simplify and extract the essence of the concepts, but an ontology developer now needs to drag those concepts down from the clouds into the problem area. This can be a problem.

For example, history is full of births and genealogies (of kings and emperors, for instance). Births are instants, even if the date is not known, and genealogies are sets of relations between births. If the date of a birth is known, it can be attached to the birth event. This part is simple.

However, there is also a general understanding that rules apply to the sequence of time events. For instance, a person cannot be born before either of his parents. This is also transitive, so it applies to grandparents as well, and so on. Prior (see the reference in the previous entry) lays out a fairly comprehensive logical system for representing relative time, with operators that cover before, after and so on. If these rules were stated directly in a birth and genealogy ontology, they would also have to be repeated in the many other time-based areas where relative sequences are used (project planning, development, speech, newscast items).

To avoid this, intermediate or upper ontologies can be used to hold the rules and general relations, then application areas can state sub-property relations to “inherit” from the general ontologies, allowing rule reuse.

Using OWL-Time as a base ontology, suppose a new ontology defines the following rules (in a loose, SWRL-like presentation):

@prefix time: <http://www.w3.org/2006/time#> .

time:Instant(?t1) ^ time:Instant(?t2) ^ time:Instant(?t3) ^ time:before(?t1, ?t2) ^ time:before(?t2, ?t3) -> time:before(?t1, ?t3)

time:Instant(?t1) ^ time:Instant(?t2) ^ time:Instant(?t3) ^ time:after(?t1, ?t2) ^ time:after(?t2, ?t3) -> time:after(?t1, ?t3)

Pretty simple stuff, of course. Any serious ontology would also include combinations of time:Interval and time:Instant, and there are a number of more interesting axioms in Prior’s paper that could be stated as rules in the ontology.

In the Births and Genealogies (gen:) ontology, these concepts and rules would be inherited and extended. The primary concept is gen:Person. A person could simply have a birth-date property, but to take advantage of the existing types, births are stated as a type of time:Instant and linked to the person with an object property (gen:birth). The basic relation between children and parents might be gen:parentOf (normally with sub-properties gen:motherOf and gen:fatherOf). If relative time is an issue for the application, a rule can be stated thus:

gen:Person(?child) ^ gen:Person(?parent) ^ gen:birth(?child, ?t1) ^ gen:birth(?parent, ?t2) ^ gen:parentOf(?parent, ?child) -> gen:before(?t2, ?t1)

When this rule fires, the system will know that parents’ births are before their children’s births, and, provided gen:before is declared a sub-property of time:before, the upper-level rules that work on time:Instant will make the same true of grandparents and great-grandparents.
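To make the chain of inferences concrete, here is a naive forward-chaining sketch in Python over (predicate, subject, object) triples. It is not a SWRL engine: it drops the gen:Person type checks for brevity, and it assumes births are typed as time:Instant and that gen:before is declared a sub-property of time:before. All individuals are hypothetical.

```python
from itertools import product

# Hypothetical facts: a three-generation paternal line.
facts = {
    ("gen:parentOf", "grandfather", "father"),
    ("gen:parentOf", "father", "son"),
    ("gen:birth", "grandfather", "b_grandfather"),
    ("gen:birth", "father", "b_father"),
    ("gen:birth", "son", "b_son"),
}
# Assumptions: births are time:Instants; gen:before is a sub-property of time:before.
instants = {"b_grandfather", "b_father", "b_son"}
sub_property_of = {"gen:before": "time:before"}


def objects(pred, subj, kb):
    """All o such that (pred, subj, o) is in kb."""
    return {o for p, s, o in kb if p == pred and s == subj}


def step(kb):
    """Apply every rule once and return only the newly inferred triples."""
    new = set()
    # gen: rule: a parent's birth is before the child's birth.
    for p, parent, child in kb:
        if p == "gen:parentOf":
            for t2, t1 in product(objects("gen:birth", parent, kb),
                                  objects("gen:birth", child, kb)):
                new.add(("gen:before", t2, t1))
    # Sub-property "inheritance": gen:before(x, y) -> time:before(x, y).
    for p, s, o in kb:
        if p in sub_property_of:
            new.add((sub_property_of[p], s, o))
    # Upper-ontology rule: time:before is transitive over instants.
    before = [(s, o) for p, s, o in kb if p == "time:before"]
    for (a, b), (c, d) in product(before, before):
        if b == c and {a, b, d} <= instants:
            new.add(("time:before", a, d))
    return new - kb


def saturate(kb):
    """Fire the rules until nothing new is inferred (naive forward chaining)."""
    kb = set(kb)
    while True:
        new = step(kb)
        if not new:
            return kb
        kb |= new


kb = saturate(facts)
print(("time:before", "b_grandfather", "b_son") in kb)  # True
```

The grandparent relation is never stated by the gen: rule; it only appears because the generic transitivity rule was reused through the sub-property link.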

Of course, if the system is capable of running simple rules like this (and that is a stretch at the moment), the rule above will lead to a rapidly (quadratically) growing set of assertions about dates. Given 10 kings (just a paternal line), the last will have 9 assertions, the one before will have 8 and so on, for 45 in total. Considering that other “interesting” relations and rules may have been added to the upper-level time ontology, and that this is only one small dimension of what even a basic genealogy knowledge base might need, this will be a serious scalability issue.
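The arithmetic can be checked with a small Python sketch that, like a naive rule engine, materializes every implied (earlier, later) pair for a single line of n births. The function name is hypothetical.

```python
def chain_closure(n):
    """Materialize every (earlier, later) pair implied for a single line of
    n births, given only direct parent -> child "before" edges."""
    closure = {(i, i + 1) for i in range(n - 1)}  # direct edges only
    changed = True
    while changed:  # naive transitive closure, as a simple rule engine would do
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


for n in (10, 20, 40):
    # Each birth is after all of its ancestors: 0 + 1 + ... + (n - 1) pairs.
    print(n, len(chain_closure(n)), n * (n - 1) // 2)
# 10 45 45
# 20 190 190
# 40 780 780
```

The total is n(n-1)/2, so doubling the length of the line roughly quadruples the number of stored assertions, before any other lines or relations are added.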

Normally, a historical application would need more than a single line of rulers, and a typical question might be “Who else was alive during his lifetime?” To answer questions about concurrency in history, there is no choice but to keep these relations. How can this be improved?

Time Ontologies

The first topic area I want to look at is Time. Time is a rich area of thought, and it greatly complicates most human languages. Every day we discuss future and past, continuous and instantaneous time references, schedules, events, Now versus Then and a host of other temporal relations, without giving it a second thought. Great minds have been studying the problem of Time for as long as language has had tenses. In fact, some of the recent work in the logic of Time has been in the area of Tense, and some of it is summed up in this article:

  • Prior, A. N. (1971). Recent Advances in Tense Logic. In E. Freeman and W. Sellars (Eds.), Basic Issues in the Philosophy of Time, The Open Court Publishing Co., La Salle, Illinois.

Many systems have been proposed over the years for dealing with Time. Pat Hayes published a good summary of the theories in use in 1995 in his “A Catalog of Temporal Theories”. Among other things, the catalog breaks down the major types of Time covered by those theories. Some of the common notions that are really important include:

  • Calendar Instants (Absolute Timestamps) and Intervals - These concepts represent a known point in time relative to an absolute time frame, such as the Gregorian calendar (“June 23, 1883 at 3:05 in the afternoon”). For knowledge workers, if not for philosophers, whether a time like this is an instant or an interval depends on how the term is used in practice. There are large and rich systems of relations between intervals and instants (before, after, during, ending at the same time).
  • Recurrent Instants and Intervals – Whether in absolute time or not, there are applications that need to record recurring times for rules (“closed on Sundays”), schedules (“Conferences held yearly”, “Meeting recurs every Tuesday at 8:30 am”) and so on. A related notion is the use of relative times in scripts (“Bring to a boil for 5 minutes, then turn to minimum and simmer for 10 minutes before serving.”).
  • Durations - A duration is a length of time (with or without a related absolute start and end time), such as “The half-life of Thorium”.
  • Units of Time – Aside from time frames like Gregorian time, the units used to measure time, like “Second”, and the conversions between them form another area of concepts.
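The “large and rich systems of relations” between intervals are often formalized as Allen’s interval algebra, which distinguishes thirteen basic relations between two intervals. The Python sketch below classifies a pair of intervals with known numeric endpoints (start < end); it assumes exact endpoints, so it does not handle fuzzy bounds. The function name is hypothetical.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Intervals are (start, end) pairs with start < end."""
    (s1, e1), (s2, e2) = a, b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("intervals must have start < end")
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    if s2 < s1 < e2 < e1:
        return "overlapped-by"
    if e2 == s1:
        return "met-by"
    return "after"  # only case left: e2 < s1


print(allen_relation((1, 3), (2, 5)))  # overlaps
print(allen_relation((2, 3), (1, 5)))  # during
```

The checks are ordered so that once “before” and “meets” are ruled out, the remaining branches compare starts and ends directly, which covers each of the thirteen relations exactly once.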

These are common and basic, but like all foundational concepts, they build into larger concepts, like Events, Schedules, Scripts and Change Management Processes, all of which are crucial in knowledge base construction (or DB Schemas).  Likewise, dozens of critical relations exist between the basic concepts, like before, after, occurs during and so on.

So, naturally, most of the Upper Ontologies discussed earlier devote a portion of their content to ideas of Time and its relations. SUMO, for instance, has dozens of time concepts high in its class hierarchy. There are also widely used domain-specific ontologies that deal with time, notably the OWL-Time ontology.

In most information systems, the Calendar Time concepts are the most important. They represent time elements in logs, financial transactions, historical events, meeting dates and many other domain concepts. Most ontologies that represent times support both atomic timestamps and “exploded” time formats. Atomic formats, such as the XML Schema dateTime type, are compact: a single value represents a time and can be compared directly with another. “Exploded” formats break the various elements (year, month, day, hour…) into attributes of the Timestamp concept, which allows partial representations (“Thursday” of any week), unit conversions (the number of seconds between two dates in 1993) and other more complicated operations. An application can convert between the two forms when convenient.
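A short Python sketch of the two forms, using the standard datetime module for the atomic form and a hypothetical ExplodedTime class (every field optional) for the exploded one. It is an illustration under these assumptions, not how any particular ontology models timestamps.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional

# Atomic form: one value that compares directly (like an xsd:dateTime literal).
# These are naive datetimes, so no time zone or DST adjustment is applied.
t1 = datetime.fromisoformat("1993-03-04T10:00:00")
t2 = datetime.fromisoformat("1993-07-19T16:30:00")
print(t1 < t2)                         # True
print(int((t2 - t1).total_seconds()))  # 11860200 seconds apart


@dataclass
class ExplodedTime:
    """Hypothetical exploded timestamp: every element is a separate, optional
    attribute, so partial times such as "Thursday of any week" fit."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    weekday: Optional[int] = None  # Monday == 0 ... Sunday == 6, as in datetime
    hour: Optional[int] = None
    minute: Optional[int] = None

    @classmethod
    def from_atomic(cls, t: datetime) -> "ExplodedTime":
        """Explode an atomic timestamp into its elements."""
        return cls(t.year, t.month, t.day, t.weekday(), t.hour, t.minute)

    def to_atomic(self) -> datetime:
        """Rebuild an atomic timestamp; needs year, month and day
        (a missing hour or minute defaults to 0)."""
        return datetime(self.year, self.month, self.day,
                        self.hour or 0, self.minute or 0)

    def matches(self, t: datetime) -> bool:
        """True if every element that is set agrees with the atomic time t."""
        other = ExplodedTime.from_atomic(t)
        return all(getattr(self, f.name) is None
                   or getattr(self, f.name) == getattr(other, f.name)
                   for f in fields(self))


any_thursday = ExplodedTime(weekday=3)
print(any_thursday.matches(t1))  # True: 4 March 1993 was a Thursday
print(any_thursday.matches(t2))  # False: 19 July 1993 was a Monday
```

The atomic form is what gets stored and compared; the exploded form is derived on demand when a partial match or a calendar-aware calculation is needed.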

In the next few entries, I will look at some of these concepts.
