On the true nature of use cases
April 23, 2008
Use cases are primarily used to make software for knowledge workers. Sure, they are abstract enough and flexible enough to describe users’ interactions with a variety of systems. You could - and perhaps people do - write use cases for driving a car. But most of the time when you see a use case, there is software in the offing.
And what exactly do knowledge workers do? That’s easy. They make decisions in the area of their expertise. They are immersed in a situation made intelligible by their experience and training, and apply their knowledge and reason to make decisions which result in actions.
A user sitting in front of a software screen can only select things. (We can take care of edge cases by describing writing text as selecting one text out of all possible texts, and by describing drawing with a paint program as selecting one picture out of all possible pix, etc.)
Each selection represents a decision - check the box, set that radio button, select this item from a combo box - and the little decisions are rolled up into bigger ones: save, start, do.
There is a range of complexity and sophistication of decision making, and a wide range of the type and depth of expertise a user has to bring to the arena. At the low end, there are binary decisions in simple spaces - open a file? ok. At the high end, there are sophisticated tools that can be used to help analyze complex situations using queuing theory, game theory, probability, simulation, and other formal techniques. More about the nature of decisions in a future rumination.
So what is the true nature of a use case? A use case is a decision context. It describes an expert engaged in a decision scenario. The overall goal of the use case is reached by a series of such decisions. The expert needs to be immersed in the environment where the decision is possible. They have to be provided with the information, the research, on which to base their decision. And they need to be presented clear and correct selections that correspond to qualifying and making the decision.
As always, more to come.
On structuring requirements
April 9, 2008
Requirements are frequently uncategorized or poorly categorized. Unfortunately it is not unusual to have requirements captured as an undifferentiated list in a spreadsheet, or a shallow bulleted list in a word doc.
There are a number of useful ways of structuring requirements. We’ll touch on one initially, then augment this over time.
One very useful categorization is by the frequency of change or possible change.
The laws of physics, our understanding of how the physical world works, are fundamental requirements for all projects, even if frequently not articulated. If there was a big bang, there may have been some change near the bang, but it is safe to assume the laws of physics will be stable for the lifetime of one of our software projects. Sometimes our understanding of them can change, though - if x number of objects of a given size fit in a particular box, that is unlikely to change, but the invention of a new lattice packing algorithm might do the trick. Or making all the chips the same size and shape so they stack in a tennis ball can.
Federal and state laws are generally slow to change. But they do, and if our project is potentially impacted by such changes during its lifespan, we should take the possibility of change into account.
Contracts with our business partners change at the end of the contract period - or sometimes sooner if there is a dispute.
Our technical infrastructure changes. New technologies are introduced and adopted.
Business objectives change much more frequently, driven by changes in the market or changes in leadership.
Categorize the requirements by their frequency of possible change. Qualify that with the period of the change and its nature. Find logical gaps in the resulting picture and fill them. Then account for the temporal volatility of requirements - you are designing for the lifespan of your system, not just for the snapshot of it when it rolls out.
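As a rough sketch of that last step, assuming each requirement has been tagged with a category and a guessed change period, we can pull out the ones likely to change within the system's lifespan. The `Requirement` fields, the sample requirements, and the numbers are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Requirement:
    text: str
    category: str               # e.g. physics, law, contract
    change_period_years: float  # rough guess at the time between changes


def at_risk(requirements: List[Requirement], lifespan_years: float) -> List[Requirement]:
    """Requirements expected to change within the lifespan, most volatile first."""
    likely = [r for r in requirements if r.change_period_years <= lifespan_years]
    return sorted(likely, key=lambda r: r.change_period_years)


# Hypothetical sample data; the periods are guesses, not measurements.
reqs = [
    Requirement("Items must fit the shipping box", "physics", 1000.0),
    Requirement("Retain records per state law", "law", 10.0),
    Requirement("Partner data feed format", "contract", 3.0),
    Requirement("Runs on current app server", "infrastructure", 4.0),
    Requirement("Quarterly sales target report", "business objective", 1.0),
]
volatile = at_risk(reqs, lifespan_years=5)
```

With a five-year lifespan, the business objective, contract, and infrastructure requirements come back as the ones the design should expect to change.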
On why HTML and XML have succeeded, and the lessons therein
March 19, 2008
Why have HTML and XML succeeded? I believe it is a combination of four things:
Accessibility
There is almost no barrier to entry. Notepad and IE and you’re in business.
Resilience
Tolerance for error. Most mistakes you can make in HTML won’t prevent your page from displaying. Unlike most programming languages, HTML will let you be and remain imperfect and continue to make progress. XML is tolerant in a different way. The syntax is more rigorous, but also very, very simple and regular, so XML is tolerant of conceptual errors, which, again, don’t prevent you from making progress.
Broad tiers of utility at different skill levels
You can do a lot of HTML and know nothing of cascading style sheets. But once you know them, there is a rich set of new things you can do. You can learn a small set of things, and get a lot of mileage from the knowledge. With each tier, each quantum of skill, there is a broad new plateau that opens.
You can go deep
This is a corollary of the previous point. From HTML there are cascading style sheets, and Javascript, and you can move to the server and find Linux Apache MySQL and PHP, and there is almost no end to how deep. From basic XML there are Document Type Definitions and then Schemas, there are XSLTs and XPath and again, a long path with almost no end. And again, for each step, there is a broad range of new function that is available.
So what is the lesson? Keys to success for a new development technology:
- Give users a low barrier to entry using tools with which they are already skilled.
- Give them a broad range of utility with the first step in skill.
- Give them visible success despite their errors.
- Give them a new plateau of function with each stepwise gain in skill.
- Give them a path to go deep.
On popcorn, requirements, and missing the point
March 15, 2008
I replenished my popcorn supply today, and my interaction with the checker at the grocery store set me to thinking about how frequently the software requirements that we capture miss the point. And she only asked if I was going to use a hot air popper.
I regularly buy Western Family white popcorn in the two pound bag. The original driver was economy. I am a weekly commuter. I maintain two residences: my permanent family home is in the Rogue Valley in Southern Oregon, my pied-a-terre is a studio apartment over Phil Cusick’s frame shop in downtown Newberg, Oregon, about a half hour drive southwest of Portland. I try to keep the bills for the work side as small as possible, and to favor the home and family side. The financial Stoicism makes me feel virtuous - old school virtus with a modern gender twist - during the week as I knead dough and bake bread, or pick pintos and soak them overnight to have them ready for the crockpot the next morning.
The path of fiscal conservatism led to popcorn. I like salty snacks. But chips - potato, Frito, Dorito, kettle - are expensive. Pretzels are ok, but I tire of them quickly. I found microwave popcorn at WalMart for less than three bucks for a box of fifteen individually wrapped Pop Weaver’s.
And it was ok. But one week even WalMart was too dear. Western Family (northwest store brand) white popcorn to the rescue. $1.65 for a two pound bag, which I find lasts for about as long as a box of Pop Weaver’s.
And the thing is, it is way better. I don’t have a hot air popper in Newberg. I’ve used hot air poppers with bulk popcorn, and they pop great, but the popcorn isn’t very good - it comes out kind of tasteless. I’ve microwaved a lot of popcorn, and it is just ok at best. Popcorn lung and the diacetyl-whatever-is-now-gone notwithstanding, it has a faint chemical aftertaste from the fake butter and whatnot.
Movie theater popcorn, when it is fresh, is pretty good. And the Jiffy Pop I had as a kid was yummy if it was cooked well. If you are as old as me or nearly, you may remember Jiffy Pop: a foil pan filled with popcorn and oil under a cleverly spiral-folded foil cover. You (my first instinct is to use ‘one’ instead of ‘you’ but Donna Meek in high school accused me of never using slang and I’m still seeking her approval decades later) cooked it on top of the stove, and the popping popcorn swelled the foil cover until you had a foil globe full of popcorn. It was easy to burn, and took some technique, so, like the barbeque, it was Dad’s job.
The Western Family white popcorn was a revelation. I don’t have many pans in my bachelor pad in Newberg. I used an aluminum Dutch oven that we bought at a church rummage sale from the pastor’s wife who lives two doors down from us. I followed the directions on the bag: put a little oil in the pan, put in a few kernels of corn, and heat. When the first kernel pops, add 1/2 cup of corn, cover, and shake regularly until popping subsides. Take it off the heat, season and eat.
Revelation. Roasting the kernels in the hot oil gives the hulls and in turn the whole popped kernel a wonderful toasted nutty flavor. And the Western Family popcorn kernels pop up big and fluffy - the ratio of fluffy white stuff to hull is very good. As opposed to the national brand Jolly Time, both white and yellow varieties, which I’ve tried since, which always seems to pop up meanly.
Once popped, you can flavor the popcorn with the classic butter and salt, or my daughter’s favorite garlic and dill, or whatever - it is a wonderful flavor medium.
Ok, so what does this have to do with software requirements?
Money and time (which equals money) are the low-hanging fruit of requirements. They are easy to quantify. Every company has accountants. Every company has a bottom line.
In the world of popcorn, time and money get you hot air poppers and microwave popcorn.
And I’m sure the engineers at Orville Redenbacher and the like enjoyed rising to the save time and money challenge.
To pop popcorn, you have to get it hot - above a certain minimum heat, below which it won’t pop, but below a certain maximum heat, at which it will burn, and keep it there for a certain length of time. It doesn’t all pop at the same time, which makes the temperature window even smaller for ideal results. When it pops it expands in volume fifty-fold (or somesuch). When it pops it jumps. How can we contain the expansion, or leverage the motion? You can just hear the gears in the popcorn engineering geeks whirring.
But the fundamental requirement for popcorn is that it tastes good, and that it is pleasurable to consume. I am willing to spend ten minutes popping bagged popcorn on the stovetop in a Dutch oven because it tastes good. (The careful reader will point out that I only discovered it because it was cheaper. True, but I don’t believe that fact detracts from the overall power of the popcorn point, which is upon us.)
Here’s the point. When we have a list of requirements we’ve gathered from the business, their prioritization and resolution should not be driven solely or even primarily by short-term cost and technology.
Taking a business process and finding a new automation boundary where we can deploy a gizmo that will save enough money to get the sponsor a tee time with the CFO is the lazy way out.
Taking that same business process and finding a new satisfaction boundary is the more rewarding path, for our employees, customers, and ourselves. The ROI is much harder to measure - but it is there.
Many software projects will deliver microwave or hot air popcorn to the business. And the business just doesn’t know they would really prefer Western Family white popcorn cooked in hot oil and flavored with chipotle, chives, and garlic until they’ve had a taste.
On rules and relations
March 13, 2008
Having designed and built a number of rules engines over the years, I’ve come to realize that rules and relational databases just don’t get along. I thought I might explore why that is, and, given the shotgun wedding of the two demanded by Oracle’s dominion and the emerging demand for business rule automation, suggest some ways of working around it.
Rules are sentences. Remember the other name for the propositional calculus is the sentential calculus.
So rules are a kind of language. And one of the defining features of language is that its structure is recursive: bigger units are built up by combining similar smaller units, which themselves may be combinations of even smaller similar units, and so on.
The first problem with putting rules in databases is that databases don’t like recursive data. They especially don’t like recursion (think self-join) to an unknown-ahead-of-time or varying depth.
The second problem is that our rules’ literals, the ‘variables’ which are bound at evaluation, usually come from the database. So there is a dependency between our rule structure and the database structure. There is a modeling-level problem here: a rule stored as data has a structural dependency on the level of structure that contains the data.
So storing rules in a relational database makes them difficult to work with, and difficult to maintain. (I confess I have not looked at how other people’s rules engines such as Drools have addressed this problem. I am inspired to do so now, and will add to this thread with what I find out.)
There are a couple of practical things that will help.
The first is something called disjunctive normal form. This is something well known in the realm of automated theorem proving. In disjunctive normal form, or DNF, a formula is expressed as a disjunction of conjunctions. ‘Not’ is only allowed to bind to literals. So these are valid formulas in DNF, assuming ‘A’, ‘B’, and ‘C’ are literals. I’ll use ‘v’ for ‘or’, ‘^’ for ‘and’, and ‘!’ for ‘not’:
- A
- !A
- A ^ B
- (A ^ B) v C
- (A ^ !B ^ C) v (!B ^ A)
And so on.
DNF has some interesting and useful features.
The first is that it lends itself to conditional (short-circuit) evaluation. You can determine the truth value of the entire formula very efficiently by simply starting at the leftmost literal and evaluating to the right. The first false literal you reach in a conjunction makes the whole conjunction false, so you can skip to the next conjunction. And the first conjunction that is true makes the whole formula true.
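A minimal sketch of that left-to-right evaluation, assuming a DNF formula is held as a list of conjunctions and each literal as a (name, negated) pair. That representation and the name `evaluate_dnf` are assumptions for illustration, not part of any particular rules engine.

```python
from typing import Dict, List, Tuple

# Assumed shapes: a literal is (name, negated); a conjunction is a list of
# literals; a DNF formula is a list of conjunctions.
Literal = Tuple[str, bool]
Conjunction = List[Literal]
DNF = List[Conjunction]


def evaluate_dnf(formula: DNF, bindings: Dict[str, bool]) -> bool:
    """Evaluate left to right, stopping as early as the form allows."""
    for conjunction in formula:
        # all() stops at the first false literal in this conjunction.
        if all(bindings[name] != negated for name, negated in conjunction):
            # The first true conjunction makes the whole formula true.
            return True
    return False


# (A ^ !B ^ C) v (!B ^ A)
rule = [[("A", False), ("B", True), ("C", False)], [("B", True), ("A", False)]]
result_true = evaluate_dnf(rule, {"A": True, "B": False, "C": False})
result_false = evaluate_dnf(rule, {"A": True, "B": True, "C": True})
```

In the first call the first conjunction fails at C and the second succeeds. In the second call both conjunctions fail at !B.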
The second useful feature of DNF is that it consists of only a few elements in a very regular structure, making it relatively easy to model.
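The third useful feature of DNF is that it can be simplified mechanically. The ‘resolution principle’ is one tool: (p ^ q) v (!p ^ r) -> q v r is a tautology. Note that it is an implication, not an equivalence (with p true, q false, and r true, the left side is false and the right side is true), so it serves inference rather than rewriting a rule before evaluation. There are also true equivalences, where we can replace the left side with the right before we evaluate the rule, that make evaluation more efficient (these are only two examples):
- p v (p ^ q) -> p
- (p ^ q) v (p ^ r) -> p ^ (q v r)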
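The first of those two simplifications, usually called absorption, is easy to mechanize. The sketch below assumes each conjunction is stored as a set of (name, negated) literals. Under that assumption, p v (p ^ q) -> p amounts to dropping any conjunction that strictly contains another. The function name `absorb` is hypothetical.

```python
from typing import FrozenSet, List, Tuple

Literal = Tuple[str, bool]  # (name, negated) - an assumed representation
Conjunction = FrozenSet[Literal]


def absorb(formula: List[Conjunction]) -> List[Conjunction]:
    """Apply p v (p ^ q) -> p: drop conjunctions that strictly contain another."""
    unique = list(dict.fromkeys(formula))  # remove exact duplicates, keep order
    return [c for c in unique if not any(other < c for other in unique)]


p, q, r = ("p", False), ("q", False), ("r", False)
# p v (p ^ q) v (q ^ r)
simplified = absorb([frozenset({p}), frozenset({p, q}), frozenset({q, r})])
```

Here (p ^ q) is absorbed by p, leaving p v (q ^ r).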
The fourth useful feature of DNF, and the most surprising, is that it has all the expressive power of the full propositional calculus. Any WFF of the propositional calculus can be mechanically translated into DNF. If we combine this with the substitution strategy we just visited, we come up with a proof strategy: convert to DNF, then substitute until you reach a tautology. In a rules engine we can follow the same path until we run out of substitutions, then we bind our literals to our domain and evaluate. (One caution - if you start with something in the inverse form, Conjunctive Normal, and convert it, you can end up with an extremely large - think exponentially large - DNF formula.)
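One way to sketch the mechanical translation, assuming formulas built from ‘and’, ‘or’, and ‘not’ and written as nested tuples: push each ‘not’ down to the literals with De Morgan’s laws, then distribute ‘and’ over ‘or’. The tuple encoding and the name `to_dnf` are assumptions for illustration. The sketch does not remove contradictory conjunctions such as p ^ !p.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

Literal = Tuple[str, bool]  # (name, negated)


def to_dnf(f, negate: bool = False) -> List[FrozenSet[Literal]]:
    """Return the formula as a list of conjunctions (each a set of literals).

    Formulas are nested tuples: ("var", name), ("not", f),
    ("and", f, g), ("or", f, g).
    """
    op = f[0]
    if op == "var":
        return [frozenset({(f[1], negate)})]
    if op == "not":
        return to_dnf(f[1], not negate)
    left, right = to_dnf(f[1], negate), to_dnf(f[2], negate)
    # De Morgan: a negated 'and' behaves as 'or', and vice versa.
    if (op == "or") != negate:
        return left + right
    # 'and' over 'or': pair every left conjunction with every right one.
    return [a | b for a, b in product(left, right)]


# A ^ (B v !C)  becomes  (A ^ B) v (A ^ !C)
example = ("and", ("var", "A"), ("or", ("var", "B"), ("not", ("var", "C"))))
dnf = to_dnf(example)

# The caution about Conjunctive Normal form: three two-literal clauses
# already expand to 2 * 2 * 2 = 8 conjunctions.
cnf = ("and", ("or", ("var", "A"), ("var", "B")),
       ("and", ("or", ("var", "C"), ("var", "D")),
        ("or", ("var", "E"), ("var", "F"))))
blowup = len(to_dnf(cnf))
```

The pairing step is where the exponential growth comes from: each additional clause multiplies the number of conjunctions.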
So rules (or just their antecedents if our model is logical antecedent->consequent action) can be stored and evaluated efficiently in DNF. Translating from the full PC can be a challenge. In practice, business users don’t use material implication in business rules, so constraining the UI to DNF is not particularly onerous. UI designers and builders will appreciate the simplicity and regularity of the form.
So let’s say you’ve followed my advice, and have a database for storing DNF, with relational expressions (like schema.table.column >= scalar value) as the logical literals. (I will flesh this out sometime with a pseudocode example.)
How do you query a rule from the database, when you don’t know how deeply nested it is?
The answer is to use Oracle’s ability to do hierarchical queries. Hierarchical queries appeared in Oracle 8i (don’t quote me) and have been improved in subsequent versions. Hierarchical queries permit querying n-deep self joins efficiently in a single SELECT, where parent->child rows (and ->grandchild rows, etc., to whatever level of nesting is present) are returned as consecutive rows.
Getting our rules out of the database is reduced to executing a hierarchical query, then iterating over the result set to build up the rule from its parts.
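Here is a runnable sketch of that query-then-rebuild loop. It is a stand-in, not Oracle: it uses Python’s built-in sqlite3 module and a recursive common table expression where Oracle would use its hierarchical query syntax. The `rule_node` table, its columns, and the sample rule are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rule_node (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES rule_node(id),
        kind TEXT NOT NULL,      -- 'or', 'and', or 'literal'
        body TEXT                -- literal text; NULL for 'or' and 'and'
    );
    INSERT INTO rule_node VALUES
        (1, NULL, 'or', NULL),
        (2, 1, 'and', NULL),
        (3, 2, 'literal', 'orders.total >= 100'),
        (4, 2, 'literal', 'customer.tier = ''gold'''),
        (5, 1, 'and', NULL),
        (6, 5, 'literal', 'orders.coupon IS NOT NULL');
""")

# One SELECT walks from the root row to every descendant, however deep.
rows = conn.execute("""
    WITH RECURSIVE tree(id, parent_id, kind, body) AS (
        SELECT id, parent_id, kind, body FROM rule_node WHERE id = ?
        UNION ALL
        SELECT n.id, n.parent_id, n.kind, n.body
        FROM rule_node n JOIN tree t ON n.parent_id = t.id
    )
    SELECT id, parent_id, kind, body FROM tree ORDER BY id
""", (1,)).fetchall()

# Iterate over the result set: create every node, then attach it to its parent.
nodes = {row[0]: {"kind": row[2], "body": row[3], "children": []} for row in rows}
root = None
for node_id, parent_id, _kind, _body in rows:
    if parent_id in nodes:
        nodes[parent_id]["children"].append(nodes[node_id])
    else:
        root = nodes[node_id]  # the starting row has no parent in the result


def render(node) -> str:
    """Turn the rebuilt tree back into the v / ^ notation used above."""
    if node["kind"] == "literal":
        return node["body"]
    joiner = " v " if node["kind"] == "or" else " ^ "
    return "(" + joiner.join(render(child) for child in node["children"]) + ")"


rule_text = render(root)
```

Because DNF is only ever three levels deep (disjunction, conjunctions, literals), the rebuild step stays simple. The recursive query matters more if the stored form is allowed to nest further.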
Addressing the structural dependency problem requires bridging the modeling level gap. One effective approach to managing this is to have the schema name, table name, column name, data type and size part of the record of the literal. Then it is a simple matter to query the data dictionary to determine if the schema->table->column->type->size is valid and raise an exception if a database structure change has broken rules.
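A minimal sketch of that check, assuming the literal record carries the five fields named above. The `dictionary` argument is a plain Python stand-in for rows read from the database’s data dictionary (in Oracle, a view such as ALL_TAB_COLUMNS). The class and function names and the sample columns are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LiteralRef:
    """The structural part of a stored literal."""
    schema: str
    table: str
    column: str
    data_type: str
    size: int


class BrokenRuleError(Exception):
    """Raised when a literal no longer matches the database structure."""


def check_literal(ref: LiteralRef,
                  dictionary: Dict[Tuple[str, str, str], Tuple[str, int]]) -> None:
    """Compare the literal's recorded structure with the current dictionary."""
    actual = dictionary.get((ref.schema, ref.table, ref.column))
    if actual is None:
        raise BrokenRuleError(f"{ref.schema}.{ref.table}.{ref.column} no longer exists")
    if actual != (ref.data_type, ref.size):
        raise BrokenRuleError(
            f"{ref.schema}.{ref.table}.{ref.column} is now {actual}, "
            f"rule expects {(ref.data_type, ref.size)}"
        )


# Stand-in for the data dictionary: (schema, table, column) -> (type, size).
current = {("SALES", "ORDERS", "TOTAL"): ("NUMBER", 22)}

check_literal(LiteralRef("SALES", "ORDERS", "TOTAL", "NUMBER", 22), current)  # passes
```

Running the check for every stored literal after a schema change gives a list of broken rules before any of them is evaluated.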
This will all be a lot more clear with some explicit examples. I’ll try to add them soon.
On software as art
March 12, 2008
In CatB Eric S Raymond memorably described software - pre Linux - as something “carefully crafted by individual wizards or small bands of mages working in splendid isolation.”
The other frequent metaphor, besides D&D-flavored magic, is art. I used to believe that was going too far. I followed the lead of my friend Ric Gagliardi, who has described the creation of software as artisanship, akin to throwing pots - you apply a skill, and end up with something useful, which may also be beautiful.
I also used to believe in the old software adage ‘make it work, then make it fast, then make it pretty.’ I no longer believe that. Now I believe ‘make it pretty, keep it pretty, and if it uglies up stop and figure out where you went wrong.’
And along the same lines I have also come to believe lately that art may be the better metaphor. I came to this conclusion while studying arguably the most important painting of the 20th century: Picasso’s Les Demoiselles D’Avignon.

This is the seminal work of Cubism. It was completed in 1907. It is a large canvas - eight feet square - combining African themes with a revolutionary use of form.
Reactions to it in private viewings - by Henri Matisse among others - ranged from consternation to outright laughter. So Picasso stored it face-to-the-wall in his studio. It was not displayed to the public until a show at Salon d’Antin in Paris in 1916.
Before studying this painting, and its history, I had this vague idea that art sprang fully-formed from the artist like whats-her-name from Zeus’s forehead. Turns out nothing could be further from the truth - and remember, we are talking about one of the greatest paintings of all time by one of the greatest painters.
‘Les Demoiselles D’Avignon’ (it wasn’t actually named that until the writer Andre Salmon christened it for its first public display in 1916 - until then Picasso called it ‘The Philosophical Brothel’) was started in the winter of 1906/7 and finished in the summer of 1907.
Picasso did literally hundreds of sketches and studies for this work. Dozens of studies examined different approaches to form.

Others looked at the approach to the overall composition.

The composition is a classic theme. Much has been made of its compositional similarity to Cezanne’s ‘Les Grandes Baigneuses’:

But both paintings have their compositional roots in the works of Titian and Rubens.
Picasso’s composition evolved. The medical student at left in the compositional study, and in the figure study above, was removed. The sailor in the middle went away.
Picasso also studied technique. The painting is not only notable for its content, but for how it was painted. One technique is called impasto, where the painter draws a figure in dark outline then fills it in with horizontal strokes of a heavily loaded brush. Using another technique called ‘ground in reserve’ Picasso allowed the primer layer to show through in places, giving the painting remarkable depth and transparency.
When he began the painting, he was confident in both the overall composition and the technical approach. That did not change substantially. There were a number of smaller changes. Even very close to its completion the figure at upper right was substantially reworked.
I don’t think I need to connect the dots for you. The parallels between this kind of art, how art is really produced, and software are many and obvious, including the use of a code name for the work by the developer which is replaced by a marketing name when the product ships :).
The metaphor has legs.
On events
March 12, 2008
With the emerging adoption of event-driven architecture, arguably more important to the improvement of the corporate software infrastructure than service-oriented architecture, I thought we might take a fresh look at events per se. After a review, I’ll make an argument for extending the standard taxonomy of events to include a new event type, ‘observe,’ and an argument for extending the standard event model to include the volitional entity who caused the state change. My working understanding of events comes primarily from studying the work of James J. Odell. Mr. Odell published a couple of articles on events in the Journal of Object-Oriented Programming back in the mid-90s. They were collected, along with a number of his other articles, into a book called ‘Advanced Object-Oriented Analysis and Design Using UML’ in the late 90s, and also eventually turned into chapters in the books Odell did with James Martin on OOAD.
The state of an object is the current value of all its attributes, including its relationships to other objects.
Odell defines an event as ‘a noteworthy change in state [of an object].’ I don’t agree, but we end up in the same place. I define an event as ‘a change in the state of an object.’ I believe the ‘noteworthy’ filter applies to a set of events; Odell believes it applies to a set of state changes. I think the ‘noteworthy’ filter is a practical one, not a logical one. But we end up with a set of noteworthy state changes called events.
Odell holds that there are only two fundamental types of events (‘type’ is used here in a common sense sort of way - we will bring more meta-model rigor later): ‘add’ and ‘remove’. Add brings something new into existence - a new attribute, a new value of an attribute, a new relationship. Remove destroys something - an existing attribute, the previous value of an attribute.
Odell suggests that these definitions are not ‘user-friendly’ - that it is useful both to name event types distinctly according to their use, and to combine events that happen in sequence into single event types. While I largely agree with the former, I don’t think the latter is a function of convenience, but one that can be addressed in our model.
In Odell’s naming events according to their use he is drawing a distinction between an entity’s attributes and its relationships, and giving events related to each different names: ‘classification’ for attributes, ‘connection’ for relationships. Underlying this is the more fundamental notion that entities have identity, that is, that an instance of an entity has an identity, but that an instance of an attribute does not. The color blue has no discrete identity - we do not track it through time in its actions and relationships. More about identity in another rumination. For now, it is enough to know that there is a useful distinction between an entity’s attributes and its relationships, and that Odell’s naming convention maps to that distinction.
With respect to combining Odell’s fundamental events into compound events, I think it is most useful to simply refine our definition of event to ‘an instantaneous change in the state of an object.’ This is the logical form. The implementation reality is that there may be some time involved. This is like the distinction between functions and procedures. Functions are timeless, procedures take time.
(On a complete tangent, I wonder about how we define events for continuous processes - those which we would have to capture with differentials. Is it the same?)
Odell ends up with this base taxonomy:
- Create: bring an object into existence
- Terminate: sends an object into non-existence
- Classify: an existing object is added to a new set - it gets a new attribute
- Declassify: an existing object is removed from a set - it loses an attribute
- Connect: add a new relationship between two (or more - my addition) objects
- Disconnect: remove a relationship between two (or more) objects
Odell goes on to list ‘popular’ compound events he believes are useful:
- Reclassification: to replace one classification with another
- Reconnection: to replace one relationship with another
And three others he is less sure of:
- Coalesce: combining two (or more - my change) objects into a single object
- Decoalesce: splitting an object into two (or more) objects
- Compound termination: termination of an object and all its ‘component’ objects
Odell points out that reclassification and reconnection are useful in cases where the object is not in a valid overall state ‘in between’. This brings another idea into our fundamental definition. A ‘prestate’ is the necessary state an object must be in for a given event to occur. A ‘poststate’ is the necessary state an object must be in after a given event occurs.
The entire set of valid transitions for an entity - the full connected prestates->poststates map - is its lifecycle or finite state machine.
So we again refine our definition of event: ‘an instantaneous change in the state of an object from a valid prestate to a valid poststate.’
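To make the prestate and poststate idea concrete, here is a small sketch of a lifecycle as a lookup table. The ‘Order’ class, its states, and its events are hypothetical examples. The only idea taken from the text is that an event may occur only from a valid prestate and must leave the object in a valid poststate.

```python
from typing import Dict, Set, Tuple

# Hypothetical lifecycle for an 'Order' class:
# event name -> (valid prestates, resulting poststate)
ORDER_LIFECYCLE: Dict[str, Tuple[Set[str], str]] = {
    "submit": ({"draft"}, "submitted"),
    "ship": ({"submitted"}, "shipped"),
    "cancel": ({"draft", "submitted"}, "cancelled"),
}


def apply_event(state: str, event: str) -> str:
    """Return the poststate, or refuse if the object is not in a valid prestate."""
    prestates, poststate = ORDER_LIFECYCLE[event]
    if state not in prestates:
        raise ValueError(f"'{event}' cannot occur while the order is '{state}'")
    return poststate


state = apply_event("draft", "submit")  # draft -> submitted
state = apply_event(state, "ship")      # submitted -> shipped
```

The table as a whole is the finite state machine. Any single object’s life is one path through it.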
With our extended definition, it is clear that reclassify, reconnect, coalesce, and decoalesce are first-tier events. (Implementing coalesce and decoalesce is a formidable implementation design challenge.)
Compound termination is not so clear. The example Odell gives for compound termination is that of a sailboat: if we destroy a sailboat, the hull and the motor are also destroyed.
If compound termination exists as an event, I think it may be only for those strong aggregation relationships we call ‘composition.’
And my sense of composition is that it somehow works the other direction - akin to the distinction between centripetal and centrifugal force. In the sailboat example, the sailboat and hull are in a composition relationship (I think Odell actually uses this example elsewhere in ‘Advanced Object-Oriented Analysis and Design Using UML’), but the sailboat and motor are not. If the hull is destroyed, the sailboat ceases to exist - new hull, new boat. But if the motor is destroyed, we just get a new motor. But going the other way it is less clear. Obviously if we atomize it (plasmacize it?) with a phaser, they are all destroyed. But that seems more of a simultaneous terminate event happening, not one that somehow logically (and instantly) cascades (‘instantly cascades’ sounds wrong, doesn’t it?). But if the sailboat is destroyed in the sense of being separated into its constituent parts, hull, motor, sails, and we rebuild the boat around the same hull, is it the same boat? This goes back to our question about identity and the continuity of identity. We’ll explore this in more detail in that coming rumination. We may find some clarity later here when we talk about agency. In the meantime, we’ll leave compound termination off of our list of fundamental events. (In the real world, when a boat is stripped down and rebuilt it is generally rechristened. I spent a couple of days on the Edna, a 90′ schooner that was in port in Portimao, Portugal. It was originally a sailboat. But it had been de-masted, motorized, rechristened, and employed as a cargo carrier in the North Sea. The new owners took the boat to Porto, Portugal, where craftsmen skilled in work on large sailing ships re-masted it and converted it back into a sailing vessel. And they rechristened it. But I digress :))
Though in common use we call both an event and an event occurrence ‘event,’ we need to make a more clear distinction here. What we have been calling ‘events’ are really a supertype of events. We’ll borrow from UML (for now) and call those ‘Event Stereotypes.’ An event is a type which itself is the instance of an event stereotype. An event is associated with a particular class or set of classes, not an object. An ‘event occurrence’ is an instance of that event type that is associated with a particular instance or set of instances of a class or set of classes, i.e. an object or set of objects.
So our working definition of event occurrence is this: ‘an instance of an event at a particular time for an object.’
We can capture the complete history of an object, from creation to termination, by recording information about all of its event occurrences: object(s) (which are of particular types), event (which is a reference to the ‘event’ type, whose definition includes the prestates and poststates of the objects involved in the event), and time. What is missing from this list, of course, is agency. The state change is the effect - what is the cause? Event occurrences frequently do not occur spontaneously. While there may be some that do - think chrysalis to butterfly - most do not. Our event history, to be complete, should capture the agent of change. We can call these agents ‘volitional entities’ - that will be the subject of another rumination. For now, we simply recognize that our event history needs to capture the causal agent involved to be complete. If the event is the effect, we must have the cause, otherwise our history is incomplete, disconnected, and of little ultimate utility.
Our event history gets unity by being viewed from the perspective of the finite state machine of the object’s class - each object life history is the path it traverses through that FSM.
The other thing missing besides agency we can get at by examining the well-known acronym for an object lifecycle, ‘CRUD.’ CRUD may be defined as CReate, Update, Destroy. But it is more frequently used to mean Create, Read, Update, Destroy. There is a practical distinction between read only access to an entity, and access which writes to it.
But if we look at our taxonomy of events, there is a gap. We have no event whose occurrences will allow us to see the history of reads. In practice, this is a very useful thing to know.
So I suggest we add ‘Observed’ as a first class event. There is no prestate and poststate with respect to the object alone, but with respect to a given observer or class of observers, there is. If an agent has observed an entity they have knowledge of it.
I will try to expand this idea with real world examples as they come up.
We are left with our definitions and our taxonomy, with a massive debt to James J Odell:
Event stereotype: one of (create, terminate, classify, reclassify, declassify, connect, reconnect, disconnect, observe, coalesce, decoalesce) each of which describes possible instantaneous changes of state for one or more objects.
Event: a type which is an instance of an event stereotype associated with a particular class or set of classes and which describes the valid prestates and poststates for the event.
Event occurrence: an instance of an event at a particular time caused by a particular agent, which may be the object itself.
Event history: the complete set of event occurrences for a given object.
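These four definitions map fairly directly onto types. The sketch below is one possible reading, not Odell’s model: the field names, the invoice example, and the agents are assumptions made for illustration. It records an ‘observe’ occurrence next to a ‘create’ so the history of reads sits in the same log as the history of writes.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import FrozenSet, List


class EventStereotype(Enum):
    CREATE = "create"
    TERMINATE = "terminate"
    CLASSIFY = "classify"
    RECLASSIFY = "reclassify"
    DECLASSIFY = "declassify"
    CONNECT = "connect"
    RECONNECT = "reconnect"
    DISCONNECT = "disconnect"
    OBSERVE = "observe"
    COALESCE = "coalesce"
    DECOALESCE = "decoalesce"


@dataclass(frozen=True)
class Event:
    """A type: an instance of a stereotype, tied to a class."""
    name: str
    stereotype: EventStereotype
    class_name: str
    prestates: FrozenSet[str]
    poststates: FrozenSet[str]


@dataclass(frozen=True)
class EventOccurrence:
    """An instance of an event for one object, at a time, caused by an agent."""
    event: Event
    object_id: str
    agent: str
    at: datetime


def event_history(log: List[EventOccurrence], object_id: str) -> List[EventOccurrence]:
    """All occurrences for one object, oldest first."""
    return sorted((o for o in log if o.object_id == object_id), key=lambda o: o.at)


# Hypothetical events for an 'Invoice' class.
issue = Event("issue invoice", EventStereotype.CREATE, "Invoice",
              frozenset(), frozenset({"issued"}))
view = Event("view invoice", EventStereotype.OBSERVE, "Invoice",
             frozenset({"issued"}), frozenset({"issued"}))

log = [
    EventOccurrence(view, "inv-1", "auditor", datetime(2008, 3, 13, 9, 0)),
    EventOccurrence(issue, "inv-1", "billing clerk", datetime(2008, 3, 12, 17, 30)),
    EventOccurrence(issue, "inv-2", "billing clerk", datetime(2008, 3, 12, 18, 0)),
]
history = event_history(log, "inv-1")
```

Note that the ‘observe’ event leaves the invoice’s own state unchanged. What changes is what the observing agent knows, which is why the agent has to be part of the occurrence.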
I will augment this with some pictures at some point - the explanation would benefit greatly from them.
But I think we have established some useful clarity around events. This clarity will be necessary as events inevitably emerge as a fundamental feature of the enterprise computing environment.
On the true nature of business intelligence
March 11, 2008
The expression ‘business intelligence’ is in common use. What does it mean, exactly? The definitions vary, seeming to cluster around the central concept of the use of advanced databases and reporting tools to support business decision making.
But there seems to be a lot more going on in the term. I think it will be illuminating to drill into it to try to get at the true nature of business intelligence.
On reflection there are two key ideas woven together in the expression. The first is the idea of intelligence itself. The second is the idea that a business can be intelligent. So we’ll proceed by defining intelligence. Then we’ll look at how a business could be intelligent. We’ll project our definitions of intelligence into our understanding of how a business could be intelligent. And last we’ll compare that to the commonly understood meaning, and see if we’ve discovered anything interesting on the way about how a business might pursue increasing its intelligence.
Steven Pinker has a useful working definition of intelligence in his book ‘How the Mind Works.’ (I have loaned my copy to my daughter Zooey. When I get it back from her, I’ll fix this up with more specific detail.)
After an interesting discussion featuring a definition of ‘what makes a good alien’ from science fiction, and a nice quote from Shakespeare about Romeo’s pursuit of Juliet, Pinker ends up with a working definition: “Intelligence, then, is the ability to attain goals in the face of obstacles by means of decisions based on rational (truth-obeying) rules.” Intelligence is the ability of an entity to continue to strive toward a goal in the presence of obstacles by the logical creation and pursuit of new subgoals.
There is another useful definition of intelligence we can infer from Daniel C Dennett’s ‘Consciousness Explained’: intelligence is the honed-by-evolution ability to produce future. The better we are at foretelling the immediate future, the better our chance of survival. Intelligence is the innate ability to foretell the future not by some kind of precognition, but by an understanding of the present and the rules the world obeys.
When we say ‘business intelligence’ do we mean real intelligence, or are we simply using an anthropomorphic metaphor? I believe the former.
In the best-practice common data model that we software architects and engineers use, a person and a business are the same base entity, usually called ‘party’ as captured by Martin Fowler. A party can enter into agreements, legal and otherwise. A party is accountable for its actions because it could choose to do otherwise. In this model, persons and businesses are subtypes of party.
From another perspective, I believe that identity is best understood as a narrative. This is an idea I first came across in the philosopher Paul Ricoeur’s ‘Time and Narrative,’ and, later, again in Dennett op cit (with no citation of Ricoeur) where he describes identity as ‘the center of narrative gravity.’ I believe this idea of narrative identity has a lot of weight and explanatory power for software, which I’ll go into in another rumination. For now, consider that a business, like a person, has an identity: a beginning, a story with a theme and a plot, an end.
So I think when we speak of a business as being intelligent, it is more than a metaphor. Then the real question of business intelligence is this one: how does a business become smarter?
Our first working definition of intelligence is about logically changing plans in the face of obstacles in order to meet our ultimate goals. In business terms, this means having the ability to quickly change our processes in the face of obstacles in order to continue to meet our objectives. Companies that empower their people to change their processes dynamically when faced with an obstacle flourish; they are more intelligent. Giving front line, customer facing people the latitude to solve problems creatively is well understood to be a powerful enabler of business success. Now we know why - it’s intelligent.
The risk of rapid process change is that we might violate some constraint. Our constraints may be understood with respect to the speed with which they change: the laws of physics, laws, contracts, and regulations are all constraints that change at different speeds. (The laws of physics maybe right after the big bang - though we are starting to look bang-less.)
So to foster and support rapid process change in order to enable us to work around obstacles, we need an environment where the arena of process change is defined by our constraints.
This might be an adaptation of contract modeling software, which might be generalized to model all constraints, used in conjunction with a BPEL engine and process modeler. The business uses the graphical BPMN modeler to modify their processes. They are prevented by the tool from doing so in a way that would violate a law of physics (too much stuff in one truck, for example), law, contract, or other constraint.
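A minimal sketch of that guard, in Python, may make the idea concrete. It does not use any real BPEL or BPMN tooling; `ProcessStep`, `Constraint`, `propose_change`, the `load_kg` attribute, and the 10,000 kg truck capacity are all hypothetical names and numbers chosen for illustration. The point is only that a proposed process is checked against every modeled constraint before it is allowed to replace the current one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ProcessStep:
    name: str
    load_kg: float  # hypothetical attribute: weight put on one truck in this step


@dataclass(frozen=True)
class Constraint:
    name: str
    kind: str  # e.g. "physics", "law", "contract", "regulation"
    check: Callable[[List[ProcessStep]], bool]  # True when the process is allowed


def violated(process, constraints):
    """Names of the constraints the given process would break."""
    return [c.name for c in constraints if not c.check(process)]


def propose_change(current, proposed, constraints):
    """Return (process in force, broken constraints).

    The proposed process replaces the current one only if nothing is violated.
    """
    broken = violated(proposed, constraints)
    if broken:
        return current, broken
    return proposed, []


TRUCK_CAPACITY_KG = 10_000  # assumed physical limit for the example

constraints = [
    Constraint(
        name="truck capacity",
        kind="physics",
        check=lambda process: all(s.load_kg <= TRUCK_CAPACITY_KG for s in process),
    ),
]

current = [ProcessStep("load truck", 8_000)]
too_heavy = [ProcessStep("load truck", 12_000)]
workaround = [ProcessStep("load truck A", 6_000), ProcessStep("load truck B", 6_000)]

in_force, broken = propose_change(current, too_heavy, constraints)   # rejected
in_force2, broken2 = propose_change(current, workaround, constraints)  # accepted
```

A real tool would attach such checks to the graphical modeler so that an invalid process cannot be saved; here the rejection is just a return value.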
Our other definition of intelligence, future producing, seems more what underlies the conventional definition of BI. We analyze data to understand the present, and use it to make forecasts about the future. In my previous post ‘On Reporting’ I described what tools a business needs to do this well.
Our richer definition of business intelligence has shown us how a business can become smarter: first, by creating an environment to foster the rapid, safe change of processes in the face of obstacles; second, by polishing their crystal balls.
On reporting
March 11, 2008
This was originally posted as a reply to a thread in Paul Mace’s blog. I reproduce it here partly because it has themes I want to expand on and I think it would be better to have the root with the branches, and partly because, while prompted by the thread, it is a fully formed rumination and I wanted to call it out distinctly. The copy paste feels wrong, somehow, perhaps in its violation of the leave-it-where-created-and-link-to-it nature of the web.
A business succeeds by dint of its crystal balls: the primary driver of business reporting is predicting the future. That is the impetus behind the emergence of ‘business intelligence.’ (Of course, producing future is only one aspect of intelligence. But that is another discussion.)
We need a model for that forecasting: we need a subject whose future behavior we are predicting; we need historical behavior data about that subject; we need a forecasting algorithm.
The most interesting practical subjects are generally not individuals, but populations with common attributes. Even for targeted ads and dynamically customized web sites, the segments of which the individual is a member are used for the forecasting.
Our data stores hold type hierarchies - levels of increasingly specialized information in hierarchically arranged partitions: ‘person’ partitioned on ‘gender’ to ‘male’ and ‘female’ subtypes. (Models by their abstracting nature lose some fidelity with the real world: even our simple example has no slot for hermaphrodites. But that is another discussion.)
Our type hierarchies are in complex relationships with one another. Those relationships may be from any level of the hierarchy to any level of another hierarchy (or multiple other hierarchies for n-ary relations.)
Those relationships in a given domain have to be modeled as a lattice, not a tree. Even a simple data model will quickly have multiple parent entities for the same child entity. (Look at the example of XML. For real world use, the XML tree structure must frequently be augmented by XPath, which enables us to dynamically navigate multiple parent relationships, and transforms, which are frequently used simply to rearrange the data around the ‘pole’ of a different parent.)
So the business goal is to identify a segment of the set of related entities about which to predict the future, where such segment is describable by a rule whose literal values are attributes of the entities and their relationships. (Data mining is simply the identification of interesting segments by inference from behaviors of larger populations which are randomly segmented until we find a segment for which there is a predictive correlation with the behavior.)
The practical challenge with doing this is that there are a number of conventions for physicalizing these related type hierarchies. These conventions are frequently used in an ad hoc fashion. Those creating the database may not even be aware of what they are actually doing, and so may not do it in a consistent fashion. Different conventions may be followed at different levels. Different conventions may be used to relate different entities.
On any given level, our type partition may be physicalized using a descriptor attribute in a single table, such as ‘gender’, with values ‘male’ and ‘female’, with the space concession of having null values for non-gender appropriate attributes (dress size), or it might be physicalized as separate tables, Person Male Female, with the performance concession of creating joins (and in some circumstances outer joins) to get at our data, or it might be physicalized as a powertype, with Gender a table with rows Male and Female…
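The three conventions can be put side by side in a small sketch using Python's standard `sqlite3` module. The table and column names (`person_single`, `person`, `female`, `male`, `gender`, `person_pt`) and the two sample rows are assumptions made up for the example. The same logical question, "who is in the female segment?", needs a different physical query under each convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Convention 1: descriptor attribute in a single table.
-- dress_size is NULL where it does not apply (the space concession).
CREATE TABLE person_single (
    id INTEGER PRIMARY KEY, name TEXT, gender TEXT, dress_size INTEGER);
INSERT INTO person_single VALUES (1, 'Ann', 'female', 8), (2, 'Bob', 'male', NULL);

-- Convention 2: separate subtype tables (the join concession).
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE female (
    person_id INTEGER PRIMARY KEY REFERENCES person(id), dress_size INTEGER);
CREATE TABLE male (person_id INTEGER PRIMARY KEY REFERENCES person(id));
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO female VALUES (1, 8);
INSERT INTO male VALUES (2);

-- Convention 3: powertype. Gender is a table whose rows are the subtypes.
CREATE TABLE gender (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE person_pt (
    id INTEGER PRIMARY KEY, name TEXT,
    gender_id INTEGER REFERENCES gender(id), dress_size INTEGER);
INSERT INTO gender VALUES (1, 'male'), (2, 'female');
INSERT INTO person_pt VALUES (1, 'Ann', 2, 8), (2, 'Bob', 1, NULL);
""")

# One logical rule ("the female segment"), three physical queries.
QUERIES = {
    "single_table":
        "SELECT name FROM person_single WHERE gender = 'female' ORDER BY name",
    "subtype_tables":
        "SELECT p.name FROM person p JOIN female f ON f.person_id = p.id "
        "ORDER BY p.name",
    "powertype":
        "SELECT p.name FROM person_pt p JOIN gender g ON g.id = p.gender_id "
        "WHERE g.label = 'female' ORDER BY p.name",
}

results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in QUERIES.items()}
```

All three queries return the same segment, but nothing in the physical schemas says so; the report writer has to know which convention is in play at each level.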
The point is, in order to create the rule to capture the logical segment of our data we need to work at the logical level. Reporting tools do not allow us to readily bridge that gap.
The next challenge is time sequencing the data. Most data warehousing efforts are really about articulating the temporal nature of data that was not originally well structured. If we look at the fundamental data types in a database, we have integers, floats, strings and time. But time is not like the other fundamental domains. Finding everywhere the number 3 occurs in our data is not very interesting. Finding everywhere ‘Bob’ occurs may be interesting. But finding everywhere January 3rd, 2006 occurs is definitely interesting.
Time in the database is best understood as a function of event occurrences, where an event of course is an interesting state change for an entity in one of our type hierarchies. Each time an attribute of one of our entities changes, or an attribute of one of our relationships, which themselves may be entities in their own right - marriage, for example - we need to capture the history of that event occurrence. That history includes the timestamp and the agent of change.
An additional level of complexity is that entities and their relationships may participate in life cycles (or more rigorously governed state machines) where the set of prestates and poststates for a given state exist as metadata. But frequently that metadata is not physicalized, or if it is, it is done awkwardly.
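As a rough sketch of what physicalizing that metadata could look like, here is a lifecycle held as a plain transition table in Python. The order lifecycle and its state names are hypothetical; `poststates`, `prestates`, and `is_valid_history` are illustrative helper names, not an established API.

```python
# Hypothetical lifecycle for an order: each state maps to its valid poststates.
TRANSITIONS = {
    "draft": {"submitted", "cancelled"},
    "submitted": {"approved", "rejected"},
    "approved": {"shipped"},
    "rejected": set(),
    "cancelled": set(),
    "shipped": set(),
}


def poststates(state):
    """States that may directly follow the given state."""
    return TRANSITIONS[state]


def prestates(state):
    """States that may directly precede the given state."""
    return {source for source, targets in TRANSITIONS.items() if state in targets}


def is_valid_history(states):
    """True if every consecutive pair in the sequence is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(states, states[1:]))
```

With the lifecycle stored as data, a recorded history such as `["draft", "approved"]` can be flagged as invalid instead of being silently accepted into a report.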
The problem with respect to reporting is that we frequently don’t have all the event history we need, at the grain we need it, to feed into our algorithm. When we restructure our data to obtain it, the data is frequently gappy.
Selecting the appropriate algorithm as part of our model is another challenge. Many managers can generate a trend line in Excel. Few of them know what it means.
Predictions based on gappy data need algorithms beyond the skillset and toolset of most mid-level managers.
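To make the trend line less of a black box, here is a minimal sketch of what one is: an ordinary least-squares line through (period, value) pairs. The function names `fit_trend` and `forecast` and the sample numbers are assumptions for illustration. Gaps are handled in the simplest possible way, by fitting on the periods we actually have rather than pretending the series is contiguous; that is far short of the algorithms gappy data really calls for.

```python
def fit_trend(points):
    """Ordinary least-squares line through (period, value) pairs.

    Missing periods are simply absent from `points`; the fit uses the real
    period numbers, so a gap does not compress the time axis.
    Returns (slope, intercept).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all observations fall in the same period")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(points, period):
    """Extend the fitted line to a future period."""
    slope, intercept = fit_trend(points)
    return slope * period + intercept


# Hypothetical monthly figures with periods 3 and 4 missing.
observed = [(1, 10.0), (2, 12.0), (5, 18.0), (6, 20.0)]
slope, intercept = fit_trend(observed)
next_value = forecast(observed, 7)
```

The line says nothing about how much to trust the forecast. A fit through four points with a two-period hole deserves much wider error bars than the tidy chart suggests, which is the part the spreadsheet does not show.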
As a result of all this, what has emerged in the reporting space - the current toolset - is at the high end a set of conventional models whose use requires the data to be rigorously structured in advance (think star schema), and at the low end, graphical tools that enable us to navigate and work with the physical database, whose logical structure we must infer when we are designing our reports.
What is needed is a tool that dynamically reverse engineers the logical model out of a database, laying bare what is really there. The logical model may have to be finished by domain experts. Not all information needed to create the model may be present. The user examines entity behaviors, whose history has been captured in detail and is described by lifecycles and state machines, and selects a behavior of interest. The user examines the related type hierarchies, and selects a segment. (A data cube is a simple tool for doing this.) Then they can choose to predict the segment’s future behavior, and are guided through the selection of a model and algorithm by the tool, or they can select to mine that behavior for interesting sub-segment correlations, and are guided by the tool to select the attributes and algorithm (k-means clustering, etc) to use for the mining.
That is the tool that would revolutionize reporting and BI.