Imagination and fiction make up more than three quarters of our real life. — Simone Weil
Knowledge engineering is making a comeback. You might be wondering why, what it is, and why you should care.
The Wikipedia explanation of knowledge engineering isn’t as helpful as it could be. So here’s my home-baked definition: knowledge engineering, a type of software engineering, is the process of developing the aspects of a software system that encode knowledge explicitly, such as in the form of an ontology.
What do I mean by explicit encoding? I mean an encoding in which the developers intend to convey the knowledge by means of the data structures they implement. The system as a whole should include the means to perform appropriate inference over the knowledge. The knowledge base should also be modular with respect to the procedural code: you can change the encoded knowledge, including the schema, without breaking any other parts of the system.
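To make those three properties concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the facts about Rex, the predicate names subclass_of and instance_of, and the function names. It is one simple way to picture the idea, not a recipe for a production knowledge base.

```python
# Hypothetical mini knowledge base: each fact is a (subject, predicate, object) triple.
FACTS = {
    ("Dog", "subclass_of", "Mammal"),
    ("Mammal", "subclass_of", "Animal"),
    ("Rex", "instance_of", "Dog"),
}


def superclasses(cls, facts):
    """Return every class reachable from cls by following subclass_of links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in facts:
            if subject == current and predicate == "subclass_of" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def types_of(entity, facts):
    """Return the stated classes of an entity plus the ones that can be inferred."""
    direct = {o for s, p, o in facts if s == entity and p == "instance_of"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, facts)
    return inferred


if __name__ == "__main__":
    print(sorted(types_of("Rex", FACTS)))
```

The knowledge lives entirely in FACTS, the inference lives in two small generic functions, and adding a fact such as ("Animal", "subclass_of", "LivingThing") changes what the system concludes about Rex without touching any code.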
In contrast, traditional computer systems encode knowledge implicitly in the form of procedural code. I sometimes think about it like this: traditional software systems know how to do things, whereas systems that incorporate knowledge bases often know about things.
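A toy Python contrast may help here. The animals, the can_fly property, and all function names are made up for this sketch. The first function knows how to answer one question; the second version keeps what it knows about birds in data that other code can inspect.

```python
# Implicit: "penguins cannot fly" exists only as a branch in control flow.
def can_fly_implicit(animal):
    if animal == "penguin":
        return False
    if animal in ("sparrow", "eagle"):
        return True
    return False


# Explicit: the same knowledge is stated as data (a hypothetical, tiny knowledge base).
KNOWLEDGE = {
    "sparrow": {"is_a": "bird", "can_fly": True},
    "eagle": {"is_a": "bird", "can_fly": True},
    "penguin": {"is_a": "bird", "can_fly": False},
}


def can_fly_explicit(animal, kb=KNOWLEDGE):
    """Look the answer up instead of hard-coding it."""
    return kb.get(animal, {}).get("can_fly", False)


def birds(kb=KNOWLEDGE):
    """A question the implicit version cannot answer without writing new branches."""
    return sorted(name for name, facts in kb.items() if facts.get("is_a") == "bird")


if __name__ == "__main__":
    print(can_fly_explicit("penguin"), birds())
```

Both versions give the same answers about flying, but only the explicit one can also tell you which things it considers birds, because that knowledge is there to be queried.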
Developers working on traditional systems are software engineers; developers working on knowledge bases are knowledge engineers.
The types of system aren’t mutually exclusive. Systems that combine well-built procedural code with explicitly represented knowledge are best. Familiar products that combine knowledge bases and procedural code include Google’s search, LinkedIn, and Facebook. A popular pattern lately is to encode explicit knowledge in graph data structures. (See my post Graphs, Graphs Everywhere for more on this topic.)
But wait (you might be thinking), any software system that uses a database has explicitly encoded knowledge, doesn’t it? And SQL engineers have never been called knowledge engineers. Nor have SQL or NoSQL databases normally been referred to as knowledge bases. Why not?
I’ve recently become very familiar with the more traditional way of storing data and its philosophical limitations. I say “philosophical” limitations on purpose. There are lots of practical engineering pros and cons for every data storage solution, be it a SQL database, a NoSQL database, a graph database, a triple store, or a document database. Any one of these solutions can be used to encode explicit knowledge. Any one of them can be used to store an ontology. And with some work, you can even wire the crustiest old system sitting on top of a SQL database to do what ontologists think of as real inference. But what I’ve observed is that the engineers working on these traditional systems tend to think about the encoded information as data, not knowledge.
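As a small illustration of that last technical point, here is a sketch using Python's built-in sqlite3 module. The subclass_of table, its child and parent columns, and the class names are all hypothetical. A recursive common table expression lets an ordinary SQL engine follow the class hierarchy and return conclusions that were never stored as rows.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding a tiny, made-up class hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subclass_of (child TEXT, parent TEXT)")
    conn.executemany(
        "INSERT INTO subclass_of VALUES (?, ?)",
        [("Dog", "Mammal"), ("Mammal", "Animal"), ("Cat", "Mammal")],
    )
    return conn


def ancestors(conn, cls):
    """Return all direct and inferred superclasses of cls, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(name) AS (
            SELECT parent FROM subclass_of WHERE child = ?
            UNION
            SELECT s.parent FROM subclass_of AS s JOIN up ON s.child = up.name
        )
        SELECT name FROM up ORDER BY name
        """,
        (cls,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(ancestors(build_db(), "Dog"))
```

Nothing in the table says that a dog is an animal. The query derives it, which is a modest form of inference running on a perfectly traditional store.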
This philosophical difference among the developers translates into design choices. Those choices typically end with knowledge that is implicit in the system. This is no slam against traditional approaches. It is simply that the design of the traditional tools makes a certain type of system design more natural.
Here’s what I mean: tables are meant to store records. Records are not commonly thought of as representations of real world entities. It can often be almost impossible to articulate the meaning of a row in a table. This is not just because the engineers who designed it may no longer work for your company or someone has forgotten. (Although that happens a lot.) The difficulty in articulating the knowledge arises from the fact that the row in the table was never meant or intended to encode a fact. The rows and tables were never designed to represent entities in the real world. If you work with a database that has hundreds of rows or tables that are “about” one real world entity, and where there is no practical way to say which entity any particular data structure refers to or means, you are dealing with a system where data may be encoded, but where knowledge isn’t. Or isn’t for any practical purposes.
This isn’t just a difference in what you call it. Getting knowledge out of such a system is like figuring out the age of a tree by counting the rings. The information is, in a sense, there, but it did not get there by virtue of the system developers’ intentions.
But worse than that, such information is not accessible by the system itself. The system can’t use it as a human user would expect it to. To get the right, intelligent type of behavior, you’d have to re-engineer the system.
So you might as well start off the right way — it will all go better in the long run, and even in the short run.
Trust me: I’m a knowledge engineer.