A carpenter is not usually aware of the hammer in their hand.   They are immersed in their environment.  They are engaged in the task at hand.   They only notice the hammer if it is too heavy, or if the head flies off, or if the handle is awkward in their hand.

When a tool is right for a job, the tool disappears from the user’s active experience.

Our goal in IT is to equip our business users with disappearing tools.

Our goal as IT architects is to anticipate the needs of the business accurately, and with enough lead time, so that we can buy – or build – and integrate tools into the business work experience…so those tools, too, can disappear.

This is the very definition of ‘aligned.’

A technology roadmap is the enterprise architect’s forecast of the information tools needed at certain times in order to support the business’s anticipated tasks.

Today’s rumination is about how to make such a roadmap.

I am a chess player.  Somewhere in the family store of dusty stuff there may still be the trophy I won for finishing second in the Indiana Junior (under 18) Chess Championship in 1977 (I think it was ‘77.)  Which I mostly only say to give some weight to what follows.  (And for what it’s worth I should have finished worse…in the five round Swiss system tournament I drew my fourth round game from a won position – connected passed pawns – and so did not have to play The Prodigy in the final round.   So I won my last round game, and instead of finishing 4-1 – The Prodigy would have eaten my lunch – I finished 4 1/2 – 1/2, good for second.  If I had known the outcome in advance I might have thrown the fourth round win to get the draw I ended up with.  I was saved from the moral quandary by ignorance of the overall picture.  And fwiw the memory of drawing with connected passers grates more than not winning.)

Aaaaanyway, one of the best chess instruction books ever is Jeremy Silman’s ‘How to Reassess Your Chess.’  In it, the International Master (the best teachers are not the best players – Silman is not a Grandmaster) outlines the Silman Thinking Technique.

Paraphrasing broadly, here ’tis. In your mind’s eye, take all of the remaining pieces off the board, leaving only the pawns.  Now arbitrarily place your pieces anywhere you want.  Can you find a way of placing your pieces that gives you a marked advantage?  Can you create a checkmate?  Can you create a preponderance of force against a key point?  Understanding the optimal placement requires not only an understanding of the rules of the game, it also requires an understanding of tactical patterns.   Is there a last rank mate by a rook?  A windmill?  A discovered check?

Having placed your pieces in an optimal configuration, you then need to figure out if you can achieve it…how long will it take you to get there from where you are now?  And remember, your opponent will be working against you all the while.

Finally, given the constraints in your position – the slow to change pawn structure – and given your opponent’s best efforts to foil your plan, can you achieve your objective?  If so, rock on.  If not, go back to the beginning and make a less ambitious objective.

So what is your best plan?  And given that, what is the best first move to effect that plan?

Roadmapping is like that.

Keeping our chess analogy in mind, let’s look at roadmaps.

Remember we have said that a technology roadmap is the enterprise architect’s forecast of the information tools needed at certain times in order to support the business’s (an awkward possessive) anticipated tasks.

The business’s anticipated tasks obviously must be inferred from their goals and objectives.  Not so obvious, perhaps, is that they must also be inferred from the career expertise of the business users and from their training plans.  The tool an expert finds ready-to-hand and that of a beginner are not the same tool.  Tiger Woods plays with forged irons which permit precise control, but are not forgiving of error.  The amateur on the other hand plays with cavity back irons which assume off center hits, and minimize their impact.

So this is where our roadmap starts. We must work with the business to capture their goals and objectives.  There may be more or less work to do in this regard depending on the maturity of the business’s planning approach.  And we must work with the business to understand the nature of their expertise, their professional development plans, and their future expectations with respect to their tools.

From their goals and objectives we synthesize a picture of the business enterprise at some future date.  Providing the tools to support the business processes in that picture is our objective.  The full picture, including the tools, is where our roadmap leads.

We must pick a waypoint far enough in the future to make it possible for us to steer our ship toward that star.   Turning a supertanker involves holding the tiller hard over…and waiting, and waiting, and waiting for the ship to turn.  Turning an enterprise is even harder.  Project plans, with rare exception, will only take us out a year or eighteen months.   To make informed decisions about which projects to approve as aligned we need to go further out on the timeline.

If we get too far out, on the other hand, we obviously lose the ability to describe the expected future environment with any degree of accuracy.  In 1999 there was no MySpace, no Facebook, no Twitter.   But in 2004 all existed in some nascent form and a wise and careful observer might have projected their success with some confidence.

For the same reasons, existing corporate strategic plans are likely articulated over five years, or sometimes three years in rapidly changing industries.

So our first task is to take the business’s strategic plan (or if needed to help the business to create that plan), turn it into a to-be process picture, and figure out what the technology picture needs to be at that time to support those processes.

High level business use cases suggest themselves for the to-be process picture, following the familiar goal-driven approach espoused by Alistair Cockburn.  Which naturally leads us to describing our tools in the form of the ‘4’ in Philippe Kruchten’s 4+1 architecture approach, where the ‘1’ is the use cases.

Having done so (ok, some handwaving here), we are ready to turn our attention to the roadmap.

We are not starting with a blank slate.  Not only do we have an existing IT infrastructure that represents our starting point, there are a number of constraints of which we must be cognizant and to which our roadmap, and our destination, must conform.

Nothing in the corporate existing or ‘as-is’ environment is fixed.  All is malleable.

But the cost and impact of changing a given feature varies.

Those things that are slower to change, and less likely to be changed, act as constraints on our changing other, more malleable features which are dependent on them.

Many of these are features that are managed in lifecycles, where there are changes punctuated with some rhythm or frequency:

  • Hardware->Depreciation.  Lease terms.  End of life of support.
  • Software->Licensing. Compatibility. End of life of support.
  • Contracts->Terms and conditions
  • Laws->Elections.

And of course Trends, Buzz, Fashions…those things with mind share in our decision makers.

We should model our constraints in terms of their speed of change, and the punctuation of their change, in order to determine…their contribution to the viscosity of our environment, and their projected influence at a given waypoint.  This approach to constraints should be part of the requirements assessment on every project.
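As a rough illustration of what modeling constraints this way might look like, here is a minimal Python sketch.  The Constraint class, its fields, and every value in it are hypothetical, invented only to show the idea: each constraint carries a cycle length (its punctuation) and the next date at which it can cheaply change, and we ask which constraints are still fixed at a chosen waypoint.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Constraint:
    """One slow-to-change feature of the as-is environment (hypothetical model)."""
    name: str
    cycle_years: float   # punctuation: roughly how often a change window opens
    next_change: date    # the next point at which changing it is cheap


def is_fixed_at(constraint: Constraint, waypoint: date) -> bool:
    """True if no change window opens on or before the waypoint."""
    return constraint.next_change > waypoint


def constraints_by_viscosity(constraints):
    """Slowest-changing constraints first."""
    return sorted(constraints, key=lambda c: c.cycle_years, reverse=True)


# Illustrative values only; none of these come from a real environment.
constraints = [
    Constraint("server lease", 3, date(2027, 6, 30)),
    Constraint("ERP license", 5, date(2030, 1, 1)),
    Constraint("hosting contract", 1, date(2026, 12, 31)),
]

waypoint = date(2028, 1, 1)
fixed = [c.name for c in constraints if is_fixed_at(c, waypoint)]
print(fixed)  # ['ERP license']
print([c.name for c in constraints_by_viscosity(constraints)])
```

Anything still fixed at the waypoint has to be treated as part of the pawn structure: the destination picture must conform to it rather than assume it away.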

In reality, of course, there is a continuous capability technology provides the business.

That technology evolves as new capabilities are brought on line, old ones are retired, problems are fixed, functions refined.

But as we have described, we know that our budget cycles are generally yearly.

So rather than a continuous roadmap, we should consider a series of one year roadmaps, each simply articulating the sub-goals, the sub-objectives, that add up to our overall goal.  We can then speak to the alignment of the slate of potential projects fielded for the next year.

This is arguably the path to an intelligent enterprise.  As we have seen in our earlier rumination on BI, the ability to create and work toward new sub-goals when faced with obstacles – including our evolving constraints -  is the very definition of business intelligence.

My wife and daughter and I like to play word games that we can do when we are out to dinner, or in the car on vacation. Howard Stern’s F.M.K. is a favorite, where one player names three people, and the active player has to say which one they would…make love to, which one they would Marry, and which one they would Kill. Fun game – I can’t think of another context in which I could not only get away with saying (about Cloris Leachman, as a member of some nightmare troika posited by my wife,) ‘well, I guess I’ll have to tap that granny ass,’ but have it be met with rolling laughter.

My daughter Zoe also likes to make up games – ever since she learned as a little girl that she who controls the rules controls the game.

In that spirit I contributed a new game to the family library last year. One person names three to five things, and the other players compete to see who can first name a category to which all the things named belong. For example, someone might say ‘vichyssoise, revenge, and beer’ and the category would be ‘things that are best served cold.’ A lame example, but then again, I usually lose. Trust me when I tell you there are wonderfully subtle and challenging lists.

Zoe asked me what we should call the game…and I thought of the seminal work on categorization by Berkeley prof George Lakoff, with one of the best titles ever:  ‘Women, Fire and Dangerous Things.’

And that’s what we call the game now…‘Women, Fire and Dangerous Things.’

So I was at a Barnes & Noble in San Antonio, Texas a few weeks ago killing time on my way to the airport. I had to check out of my hotel at noon, but my flight back to Medford via Denver wasn’t until 3, so I had some time to kill. I still had a B&N gift certificate in my wallet from Christmas, so I decided to stop there on the way and find a book for the trip back home.

Maybe because I was away from home and missing my family I thought of Lakoff.  So I asked the bookseller if they had a copy of ‘Women, Fire and Dangerous Things’…not really expecting them to have a twenty-year-old cognitive science book.  But there it was.  Not only that, they also had Lakoff’s book on metaphor he did with Mark Johnson – subject of a future rumination related to Folk IT.  And both are now in my current ‘inner library’ – that floating set of a dozen or so books that never quite get shelved.

And today I was thinking about my previous posting, ‘On Intension, Extension, Wittgenstein and Exceptional Software’, in which I argued that we should always make our type partitions incomplete so that we could accommodate Wittgenstein’s family resemblance categories, and I realized that Lakoff provides even more direction as to the nature of non-classical categorization that would be very useful to incorporate into our approach to modeling:

“The idea that categories are defined by common properties is not only our everyday folk theory of what a category is, it is also the principal technical theory – one that has been with us for more than two thousand years. The classical view that categories are based on shared properties is not entirely wrong. We do often categorize things on that basis. But that is only a small part of the story. In recent years it has become clear that categorization is more complex than that.”

“From the time of Aristotle to the later work of Wittgenstein, categories were thought to be well understood and unproblematic. They were assumed to be abstract containers, with things either inside or outside the category. Things were assumed to be in the same category if and only if they had certain properties in common. And the properties they had in common were seen as defining the category.”

But as a result of the seed planted by the later Wittgenstein about family resemblance categories, as well as later work by Eleanor Rosch, our understanding of categorization has changed.

Rosch looked at two fundamental problems with classical theory:

  • If categories are only defined by properties, then no members should be better examples of the category than any other.
  • If categories are only defined by properties of their members, then the categorizers’ capabilities or attributes should have no bearing on the resulting category

Of course, as soon as one starts looking into it, Ms. Eleanor is obviously onto something profound.

What has emerged is something called ‘prototype theory.’  Thanks in large part to Lakoff, it has gotten all bollixed up in the Anglo vs. European philosophy,  objectivism vs. subjectivism debate spaces.   Which has unfortunately tended to marginalize it.

But for our purposes here, we should simply note this:

Our classical understanding of what a ‘Class’ is is simply wrong in many cases. There are classes which are constellations of other classes, not all the attributes of which overlap.  And there are members of a class that are best examples – not all members of a class are created equal.  As Rosch points out, a robin is a much better example of ‘Bird’ than an ostrich…or a chicken.

Our classical object-oriented approach to classes starts to fail as the users’ perspective is brought in.  One of the practical challenges of use case realization is in the data mapping…and we are starting to realize why.

A revision of object-oriented design tenets that accommodates prototype theory will help software more closely meet users’ needs in the real world.
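To make the contrast with classical classes concrete, here is a small Python sketch of graded membership.  It is a deliberately crude stand-in for prototype theory, not an implementation of Rosch’s or Lakoff’s account: the Prototype class, the typicality function, and the feature lists are all assumptions made up for illustration.  Instead of asking ‘is it a Bird, yes or no,’ it asks how close an example sits to the best example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prototype:
    """The 'best example' of a category, described by a set of features."""
    name: str
    features: frozenset


def typicality(prototype: Prototype, observed: set) -> float:
    """Fraction of the prototype's features that the example shares (0.0 to 1.0)."""
    if not prototype.features:
        return 0.0
    return len(prototype.features & observed) / len(prototype.features)


# Illustrative feature lists only.
bird = Prototype("Bird", frozenset({"feathers", "lays eggs", "flies", "sings", "small"}))

robin = {"feathers", "lays eggs", "flies", "sings", "small"}
ostrich = {"feathers", "lays eggs", "runs fast", "large"}

print(typicality(bird, robin))    # 1.0
print(typicality(bird, ostrich))  # 0.4
```

Both are still birds; the point is only that the model can now say the robin is the better example, which a yes-or-no class membership test cannot.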

On Folk I.T.

March 21, 2009

‘Folk psychology’ and ‘folk physics’ are two conceptual frameworks in common use in psychology and physics and related fields.  They refer to the ‘unscientific’ sets of everyday concepts that people use to explain and understand their interactions with other people and with the world.

There must also be a ‘folk computer science’ – a ‘folk I.T.’ – the conceptual framework that end users use to explain and understand their interactions with computers.  Understanding the nature of Folk I.T. might revolutionize the ability of IT shops to effectively capture and understand requirements, to promote new technology solutions, and to succeed in creating true business alignment.

Snapshots from Wikipedia (the links will take you to the cited passage):

Folk psychology (also known as common sense psychology, naïve psychology or vernacular psychology) is the set of assumptions, constructs, and convictions that makes up the everyday language in which people discuss human psychology. Folk psychology embraces everyday concepts like “beliefs”, “desires”, “fear”, and “hope”.

Naïve physics or folk physics is the untrained human perception of basic physical phenomena. In the field of artificial intelligence the study of naïve physics is a part of the effort to formalize the common knowledge of human beings.

Many ideas of folk physics are simplifications, misunderstandings, or misperceptions of well understood phenomena, incapable of giving useful predictions of detailed experiments, or simply are contradicted by more thorough observations. They may sometimes be true, be true in certain limited cases, be true as a good first approximation to a more complex effect, or predict the same effect but misunderstand the underlying mechanism.

Even people who start to become educated in the underlying sciences persist in keeping their folk systems of understanding.

In Steven Pinker’s ‘The Stuff of Thought’ he relates that “[m]any psychological experiments have shown that when people have a pet theory of how things work (such as that damp weather causes arthritis pain) they will swear that they can see those correlations in the world, even when the numbers show that the correlations don’t exist and never did…”

In ‘The New Cognitive Neurosciences’ Michael S. Gazzaniga and Emilio Bizzi summarize the folk sciences:

Folk psychology is our everyday ability to understand and predict an agent’s behavior in terms of intentional states such as goals, beliefs and desires. … Folk physics is our everyday ability to understand and predict the behavior of inanimate objects in terms of principles related to physical causality.

Folk IT is our everyday ability to understand and predict the behavior of software applications in terms of …. ?

I don’t have the answer.

But it is clear that rather than trying, point by point, to correct and educate end users about the true nature of software and computer systems, we should consider instead systematically capturing the world of their existing understanding…to capture the Folk I.T. framework by which, in which, they understand how software and computer systems behave.

Then and only then would we be in a position to successfully add to, to augment, to extend, their framework by integrating a new or changed component on their terms.

The use case actor remains either undefined or ill-defined.

As is well known, Jacobson’s use cases went mainstream in ‘92 without benefit of a rigorous definition.  While that choice was arguably key to their successful initial adoption, as their use has matured a consensus formalism (well, formal enough – call it a ‘soft’ formalism) has emerged, driven first by Alistair Cockburn’s ‘Structuring Use Cases with Goals’ and later by the adoption of use cases in UML.

But that consensus has a significant gap that is still in my experience the source of endless debate and the cause of practical problems: the actor.

Closing that gap – getting at the true nature of use case actors – is the goal of today’s rumination.

There is simply not a common working understanding of what an actor is.  I have had very senior systems analysts tell me they really only did use cases for human actors – anything else was an integration problem.  As a partial result, that shop’s requirements docs confusingly include both SSA-style context diagrams as well as use case diagrams.  (Of course SSA’s ‘is it an internal or external system’ is simply the same issue in a different guise.)

It’s so bad that in a recent (2006) book on UML 2.0 (‘Learning UML 2.0’ by Russ Miles and Kim Hamilton) the authors start to throw in the towel and say you’ll know an actor when you see one:

“Deciding what is and what isn’t an actor is tricky and is something best learned by experience.”

Then in some sort of pole shift they throw the reader a bone in the form of an actor-identification algorithm:

“Identify a Thing from your requirements.  Is it an actual person?  If yes, it is _probably_ an actor – but be careful, some people are actually part of the system.  If the thing is not an actual person, then is it something you can change in the system’s design?  If yes, then it is probably _not_ an actor.  If no, then it probably is an actor.”

Sorta maybe kinda.

And a little further on…in a section called ‘Tricky Actors’ (you can’t make this stuff up):

“Not all actors are obvious external systems or people that interact with your system.  An example of a common tricky actor is the system clock.  The name alone implies the clock is part of the system.  But is it really?  The system clock comes into play when it invokes some behavior within your system.  It is hard to determine if the system clock is an actor because the clock is not clearly outside of your system. As it turns out, the system clock _is_ often best described as an actor because it is something you can’t influence.”

Come again? (It’s a red flag for me when what I’m reading starts to sound like an Abbott and Costello routine.)

And from Doug Rosenberg and Matt Stephens’s 2007 ‘Use Case Driven Object Modeling with UML’:

“An actor is represented on the diagram by a stick figure and is analogous to a ‘role’ that users can play.”

“Actors can represent nonhuman external systems as well as people. Sometimes people are confused by this notion; we’ve found that drawing a ‘robot stick figure icon’ seems to clear this up.”

Evil Robots!  Cool.

In fairness to the authors – who I don’t mean to impugn, I’m just having a little fun at their expense – the Schadenfreude is yours, gentle reader – earlier, more authoritative, less florid works did not much improve on Jacobson.

From the OMG’s UML standard we take the following:

An actor in the Unified Modeling Language (UML) “specifies a role played by a user or any other system that interacts with the subject.”

“An Actor models a type of role played by an entity that interacts with the subject (e.g., by exchanging signals and data), but which is external to the subject.”

“Actors may represent roles played by human users, external hardware, or other subjects. Note that an actor does not necessarily represent a specific physical entity but merely a particular facet (i.e., “role”) of some entity that is relevant to the specification of its associated use cases. Thus, a single physical instance may play the role of several different actors and, conversely, a given actor may be played by multiple different instances.”

That last one was pretty circular, wasn’t it.  Craig Larman, hedging his bet carefully, gives us an ‘informal definition’ in his Applying UML and Patterns from 2002:

“First for some informal definitions.  An actor is something with behavior, such as a person (identified by role), computer system, or organization; for example, a cashier.”

And in a work citing Jacobson himself as an author, ‘Use Case Modeling,’ the 2003 book by Kurt Bittner, Ian Spence, and Ivar Jacobson, we find the prosaic “[a]ctors represent the people or things that interact with the system; by definition, they are outside the system. We focus on the actors to ensure the system does something useful.”

Cockburn – if Jacobson is the father of use cases Cockburn is the oldest son – says in his seminal ‘Structuring Use Cases with Goals’ that

“[a] ‘primary’ actor is one having a goal requiring the assistance of the system. A ‘secondary’ actor is one from which the system needs assistance to satisfy its goal. One of the actors is designated as the system under discussion.

Each actor has a set of ‘responsibilities’. To carry out that responsibility, it sets some goals. To reach a goal, it performs some actions. An ‘action’ is the triggering of an interaction with another actor, calling upon one of the responsibilities of the other actor (see also Figure 3)[not shown here - Ed.]. If the other actor delivers, then the primary actor is closer to reaching its goal. The entity-relationship diagram corresponding to this is shown in Figure 2.

Figure 2 says that there are internal and external actors. An external actor can be a person, a group of people or a system of any kind. The internal actor may be the system in design, a subsystem or an object. The system in design consists of subsystems, which consist of objects. Actors have behavior(s). The top-level behavior is a responsibility. It contains goals, which contain actions. An action triggers an interaction. The interaction is one actor’s goal calling upon another actor’s (or its own) responsibility.”

This is much better…although a goal calling on a responsibility doesn’t really work, does it?  The previous authors have attempted to identify an actor by listing its attributes – a dog is furry, has four legs, and barks.  But we need to understand something like an actor by placing it in a context, such as Cockburn has done.  But it has to be a logical context, one that makes sense.   And as we’ll see, Cockburn’s is both incomplete and inaccurate.   And we are completely missing an understanding of an actor’s essence.  For better or worse, our Folk IT understanding (the subject of the next rumination!) is based on essences.

So what do we have so far?   A collective definition synthesized from the aforementioned sources would be something like this:

“An actor is a person, system, or clock external to the in-scope system with goals which require them to engage in behaviors interacting with the in-scope system to achieve.”

Let’s analyze this.

First, can a system really have a goal?  Of course not.  Systems are tools…no matter how sophisticated (and leaving the Strong AI debate for a different day), they have purposes, not goals.  A cup is designed and built to achieve the user’s goal of having a drink.

So if a system can’t have a goal, how can a system be an actor?  It can’t, at least as described by Cockburn’s framework.  So the framework fails.

Who or what has goals, and what is their relationship to the system in question?

Simply put, humans, both singly and in groups, have goals.   The pursuit of a goal is an act of will.   The ability to achieve goals in the face of obstacles is intelligence.  The pursuit is a moral act – the agent is responsible for the outcome.

This is so fundamental it is reflected in language itself.  In Pinker’s ‘The Stuff of Thought’ we find this:

“Linguists sort verbs into classes, each called an ‘Aktionsart’, German for “action type,” based on their temporal contour….Each of the four major action classes, state, activity, culmination, and accomplishment, smuggles in a concept of will….Activities and accomplishments are voluntary,  state and culmination involuntary….Indeed, with an accomplishment it is the actor’s goal that determines the exact event that consummates it.  Once again, this is not just a fine point of grammar but a key to our moral sense.”

So the fundamental actors are those humans and groups of humans who can exercise will.  We shall call them ‘volitional entities.’  And the fundamental relationship of volitional entities to our systems is a moral one – they are responsible for the actions they take with the system to achieve their ends.

The practical relationship of an actor to an in-scope system is one of communication.  There must be some direct chain of events from a physical action – clicking a mouse, pushing a button, inserting a card, speaking a word – that triggers a state change in the in scope system.

When a volitional entity is a person, they may interact directly with the in scope system.  But when a volitional entity is a group, they have no direct ability to interact…a company doesn’t have a voice, or fingers.  So they have to communicate indirectly.

So what are system actors?   When a system is an actor, it is only so because it in turn has been driven by a volitional entity who has triggered its actions – or by another system which in turn was put in motion by a volitional entity, etc.    So the volitional entity is interacting indirectly with the in scope system – so in a sense the in-scope system is simply acting as a component of a larger system – and the volitional entity, the primum movens, is still responsible for the actions of the in-scope system.

A clock actor simply refers to a system actor whose purpose is narrowly to express the deferred will of the volitional entity as to when something occurs.  For actions whose period is arbitrary, such as daily batch jobs, the will may not be just the timing, but that it happens at all, that is, the clock is a way to express the will of the group.

A rule actor would be the more general case, where the volitional entity has codified their will to have particular things happen in particular circumstances.  Unlike the clock actor, whose triggering domain is just the clock, the rule actor’s domain includes a larger set of states, usually including some business subdomain.

Ok.  We seem to have made some progress.  Let’s recap.

A volitional entity is a human or group of humans – Fowler’s ‘Party’ – which can attempt to achieve goals by interacting with systems.  They are morally responsible for the outcomes.

A volitional entity’s will can be exercised in pursuit of a goal with respect to a given system either directly or indirectly.

When it is expressed directly, the volitional entity is a human engaged in physical communication with the system.

When it is expressed indirectly, some other system is engaged in physical communication with the in scope system.  That system in turn has either been directly or indirectly engaged by a volitional entity.  The primum movens, the ‘first mover,’ will always be either a human or an event-triggering system that has been programmed by a human or group to initiate an action according to a set of environmental rules – at a given time, or at a given barometric pressure, etc.

So a use case actor is an object which represents a direct or indirect volitional entity’s communication with an in-scope system.  A complete use case picture would have at its periphery only volitional entities, which are both the primum movens and moral owners of any system behaviors.
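The recap above can be sketched in a few lines of Python.  This is only one possible reading of the argument, and the class and function names (VolitionalEntity, SystemActor, primum_movens) and the example actors are hypothetical: a system actor, clock and rule actors included, always records who or what drives it, and following that chain back always ends at a human or group.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VolitionalEntity:
    """A human or group of humans; the only kind of thing that holds a goal."""
    name: str


@dataclass(frozen=True)
class SystemActor:
    """A system (including a clock or rule actor) expressing someone else's will."""
    name: str
    driven_by: VolitionalEntity | SystemActor


def primum_movens(actor) -> VolitionalEntity:
    """Follow the 'driven_by' links back to the responsible human or group."""
    while isinstance(actor, SystemActor):
        actor = actor.driven_by
    return actor


# Illustrative chain: a department schedules a clock, which drives a feed.
ops = VolitionalEntity("Operations department")
nightly_clock = SystemActor("nightly batch clock", driven_by=ops)
billing_feed = SystemActor("billing feed", driven_by=nightly_clock)

print(primum_movens(billing_feed).name)  # Operations department
```

Because SystemActor cannot be constructed without a driver, the model has no way to express a system with a goal of its own, which is the claim being made.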

Models are useful abstractions, but the regularity and perfection of these idealizations doesn’t exist in nature. But there are laws of nature that things obey, and models are our entrée to understanding and using those rules. At least at some coarse-grained level we can predict the future of real world events based on rules we have derived from a model-based understanding of real world things. There are no perfect circles in nature, but we can still calculate the circumference of a circle-like-thing with useful accuracy.

Software architecture is building and using models of software to help us understand the reality of software.

Software itself is based on models. But insofar as one of the primary types of data we deal with is indicative, data that indexes and describes something in the world, it is important for us to remember, as Rene Magritte famously reminds us, a picture of a pipe is not a pipe. And a record of a person is not a person.

What I’d like to explore in this rumination is the gap.

Underneath both the relational model and the object-oriented model, the two dominant models in software today, we find Zermelo-Fraenkel, or ZF, set theory.

There are two deterministic ways of describing the members of a set, intension and extension. Intension applies rules to a population to carve out set membership. Red-headed step children is an intensional set. Extension simply lists the members: the world’s greatest guitar players are Paco de Lucia, Pat Metheny, John McLaughlin, and Eric Clapton. (Ok, maybe not the same as your list.)
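In code the two look like this.  A minimal Python sketch; the little population of people is made up for illustration, and the guitarist list is the one given above.

```python
# Hypothetical population; the names and attributes are invented for illustration.
people = [
    {"name": "Ann", "hair": "red", "stepchild": True},
    {"name": "Bob", "hair": "red", "stepchild": False},
    {"name": "Cy", "hair": "brown", "stepchild": True},
]


# Intension: membership is decided by a rule applied to the population.
def is_red_headed_stepchild(person):
    return person["hair"] == "red" and person["stepchild"]


red_headed_stepchildren = {p["name"] for p in people if is_red_headed_stepchild(p)}

# Extension: membership is decided by listing the members, with no rule at all.
greatest_guitarists = {"Paco de Lucia", "Pat Metheny", "John McLaughlin", "Eric Clapton"}

print(red_headed_stepchildren)               # {'Ann'}
print("Pat Metheny" in greatest_guitarists)  # True
```

The first set changes whenever the population changes; the second changes only when someone edits the list.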

But, as Wittgenstein pointed out, humans have the ability to categorize, to make sets, based not on explicit rules but on family resemblances. We can include or exclude things in sets without applying exact rules. These conceptual categories are how we interact with the world.

So when we design and build our model-based, rule-based, logic-based software, we leave out stuff our users _know should be there_, things they _know should be included_.

How do we deal with the gap?

I don’t have a fully formed idea. But I think it may be to always build extension into our model of set descriptions to complement intension…to always have a way that users can say, ‘I know your software says birds fly, but a penguin is a bird, so just deal with it.’

This means no complete partitions. If you partition people on gender, and your subtypes are ‘male’ and ‘female’, build your software to enable users to extend it so it will accommodate the real world where we have hermaphrodites.

These aren’t exceptions. We should not have to deal with them as such. It is our software that is imperfect, not the world.
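One way this might look in practice, as a hedged Python sketch rather than a worked-out design: a category carries its rule (intension) plus two user-editable lists (extension) that win over the rule.  The Category class, its field names, and the bird rule are assumptions invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Category:
    """A category defined by a rule (intension) plus user overrides (extension)."""
    name: str
    rule: Callable[[dict], bool]
    included: set = field(default_factory=set)  # members users say belong, whatever the rule says
    excluded: set = field(default_factory=set)  # members users say do not belong

    def contains(self, key: str, attributes: dict) -> bool:
        # The explicit lists take precedence; the rule is only the default.
        if key in self.excluded:
            return False
        if key in self.included:
            return True
        return bool(self.rule(attributes))


# "Your software says birds fly": a deliberately naive rule.
birds = Category("Bird", rule=lambda a: a.get("feathers", False) and a.get("flies", False))
penguin = {"feathers": True, "flies": False}

print(birds.contains("penguin", penguin))  # False
birds.included.add("penguin")              # the user says: just deal with it
print(birds.contains("penguin", penguin))  # True
```

The penguin is not handled by an exception path; it is simply a member, recorded the same way any other member could be.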

Chris Date is one of the most prolific and influential writers on database topics. His ‘Introduction to Database Systems,’ now in its eighth edition, is the gold standard. Having a dog-eared copy of an early edition is a badge of honor among db types.

Chris Date is also an opinionated cuss. His introduction of new terminology (quick, what’s a RELVAR?), and his sharp opinions presented in his instantly recognizable, slightly supercilious style tend to polarize his audience. So taking pot shots at Date’s opinions is a common sport in the industry. But Date is smart as a whip, and his acolytes are legion, so if you are going at him you’d better bring your A game.

My turn :) .

Date first described something he called ‘The First Great Blunder’ in a paper titled ‘Introducing…The Third Manifesto’ he and his frequent co-author Hugh Darwen wrote back in ‘95. The ‘Blunder’ struck me as wrong at first reading. But it was wrong in some low level conceptual way that was, that has been, difficult to articulate. And Date, who seems to be the most fond of his opinions when they skewer some conventional praxis, has put references to said Blunder in many of his writings since then – not only the full book version of ‘Third Manifesto,’ but into updates of ‘Intro to Database Systems’ as well – so that it is In My Face All The Damn Time.

So the time has finally come to drive a stake through Date’s Blunder, and show that it is really a case of the First Great Blinder. And the refutation will take us through the pretty interesting territory that lies under databases and data models – the land of set theory, logic and identity – and I think we’ll end up not only with a refutation of Date, but also an interesting conceptual perspective on the world of object-relational mapping.

Ok, so what is The Blunder? The Blunder comes at the intersection of The Relational World and The Object World. Date suggests the key question in object-relational mapping is this: ‘What concept is it in the relational world that is the counterpart to the concept object class in the object world’? And he characterizes the most common answer, mapping a table (in Date-Speak a relational variable or ‘RELVAR’) to a class, as The First Great Blunder.

Date understands that that mapping is a very natural one to make. If we look at the DDL for a Person table

CREATE TABLE PERSON (
FirstName VARCHAR2(20),
LastName VARCHAR2(20),
Gender CHAR(1),
Birthdate DATE
);

and the (Java) code for a class definition

import java.util.Date;

class Person {
String FirstName;
String LastName;
char Gender;
Date Birthdate;
}

it looks like a no-brainer. What could be more obvious? So what problem could Date have with this?

The answer requires a longer explanation of the relational world. Date’s vocabulary is full of ‘domains’ and ‘attributes’ and ‘tuples’ and ‘relations’ and ‘relvars’ instead of ‘data types’ and ‘columns’ and ‘rows’ and ‘resultsets’ and ‘tables.’ I’ll try to use the latter vocabulary so that those who haven’t had the KoolAid can follow without a dictionary, and those of us who found the KoolAid too bitter to swallow can follow without doing the internal simultaneous translation of the non-native speaker.

Date believes the correct mapping is not from class to table, but from class to data type.

Sounds like another no-brainer. A class is a data type.

The natural answer given our two no-brainers is that a class can map to a data type or a table. So the key question to understand Date is why does he think a class can’t map to a table?

Because he believes a table is a variable, and a class is a type. And that a class has methods and no public instance variables, whereas a table has public instance variables and only optionally methods.

Because he believes the fundamental value of an instance of a table is the set of all rows – a resultset or Date’s ‘relation’ – and the fundamental value of an instance of a class is a single object.

A couple of threads are intertwined here that we need to tease apart. The first is the database side distinction between data types and tables – between ‘domains’ and ‘relvars.’ The second is the single instance to set question.

What Date has missed on the database side is that the distinction he is drawing between data types and tables is an implementation distinction, not a conceptual one. And the important conceptual distinction he needs to visit and does not is that between entities – those instances that have identity – and non-entities. His mistake – the First Great Blinder – comes from not opening up his mind wide enough. No no no no he’s ou-out side, looking in…

Taking our inspiration from Jorge Luis Borges’ short story ‘The Library of Babel,’ let’s imagine in place of each of the data types we have a table holding all possible values of that data type. So for integers, we have a table ‘Integer’ with two columns, ‘SurrogateKey’ and ‘Value’, and an infinite number of rows, one row for each integer.

“A moment” says the perceptive gentle reader. “What data type is ‘SurrogateKey?’ And more importantly, what data type is ‘Value?’”

The type of SurrogateKey we shall leave undefined except to say it is a unique identifier with aleph-null possible values. And the type of ‘Value?’ Why, integer, of course, says the White Rabbit. In infinite regress. But that is in the nature of the definition of number. And for each possible width of VARCHAR, such as 1 or 223, we have a table such as VARCHAR_1 or VARCHAR_223 whose columns are ‘SurrogateKey’ and ‘Value’ where the datatype of Value is ‘VARCHAR(1)’ or ‘VARCHAR(223).’ And the rows contain all the possible permutations of all the valid characters for VARCHARs. So that our table VARCHAR_1’s rows might be

KEY VALUE

1 a

2 b

etc.

And EVERYDATE is a table with two columns, ‘SurrogateKey’ and ‘Value,’ and its rows contain all possible date values. (Those of you who have designed data warehouses may have just had an ‘A Ha’ moment.)

So now, at our database design level, our tables consist only of sets of foreign keys with constraints binding them to other tables.

And our Person table definition becomes

CREATE TABLE PERSON (
FirstName mystical_surrogate_key,
LastName mystical_surrogate_key,
Gender mystical_surrogate_key,
Birthdate mystical_surrogate_key,
CONSTRAINT fkFirstName FOREIGN KEY (FirstName) REFERENCES Varchar_20(SurrogateKey),
CONSTRAINT fkLastName FOREIGN KEY (LastName) REFERENCES Varchar_20(SurrogateKey),
CONSTRAINT fkGender FOREIGN KEY (Gender) REFERENCES Varchar_1(SurrogateKey),
CONSTRAINT fkBirthdate FOREIGN KEY (Birthdate) REFERENCES EveryDate(SurrogateKey)
);

Ok. So now, in place of the surrogate keys just imagine that each actual value of a datatype, each varchar, or each integer, is not a literal value, but a foreign key to a table holding all possible values of that datatype…but because those tables have no other data, only the single columns, we never instantiate them explicitly. We can make direct references from one key value, such as the integer 23, to the same key value, 23, anywhere they occur.

The long-winded point here is that the distinction between data types and tables is not a necessary one, not a base conceptual distinction. There is no difference. The distinction is a practical one, one of convenience. All there are at the bottom are sets and sets of sets. It’s turtles all the way down.

The First Great Blinder is Date’s attempt to map from one implementation to another without reference to an underlying conceptual model. This is the most common error made in integration projects. We map one tool’s export file directly to another tool’s import file, and, rather than creating a common logical domain model to guide the mapping, we let Junior Programmer do it in their cube. And inevitably we end up with mapping errors.

The more important distinction we should be making in both realms is between entities and non-entities. An entity has identity. It is unique in its space. We describe an entity with a set of attributes (which as we’ve seen are sets of sets of sets of …), just like we describe something without identity. So what is the difference? An entity has continuity over time. It has a continuous existence. This notion is fundamental to OR-Mapping, not the distinction between data type and tables.
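To make the distinction concrete, here is a small sketch under my own assumptions: a non-entity is compared by its attributes, while an entity is compared by an identity that persists as its attributes change. The class names and the numeric id are hypothetical, chosen only to illustrate the point.

```java
import java.util.Objects;

// A non-entity: two instances with the same attributes are interchangeable.
final class PersonName {
    final String first;
    final String last;

    PersonName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PersonName)) return false;
        PersonName other = (PersonName) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, last);
    }
}

// An entity: equality follows identity, so it survives attribute changes.
final class PersonEntity {
    final long id;     // hypothetical surrogate identity
    PersonName name;   // attributes may change over time

    PersonEntity(long id, PersonName name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PersonEntity && ((PersonEntity) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}
```

Two PersonName values built from the same strings are equal; two PersonEntity instances are equal only when they share an id, even if every other attribute differs.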

I’ll have a lot more to say about the nature of identity in another long-promised rumination.

The second mistake Date makes is to use the single table and single class as his canonical example for the mapping. As anyone who has ever worked in the trenches of OR mapping, or model-driven architecture tools, knows, single tables and single classes are just simple cases of the much more useful, and _more fundamental in most important ways_, relations – not Date’s relations but queries written to access joined tables, and – and there does not seem to be a good word here – graphs of related classes.

What I’d like to point out here – I am wrapping this up for now, but will try to flesh this out more soon – is the important mapping is not happening at the individual class or table level. Date makes an error in characterizing the mapping of the relational world to the object world with the mapping from a single class to a relation. That’s obviously not a rich enough example. And the people who have actually designed and used the tools of OR mapping know what’s right.

So, Mr. Date, here we are. The First Great Blunder is not a blunder at all. It is a partial truth. And your preferred mapping is but another piece of the same truth. You asked one of the key questions, but you were simply wearing the First Great Blinders and couldn’t see the arena where the full truth plays out. To a hammer everything is a nail.

A class can map to a data type. A class can map to a table. Your tuple – a row of a table – can map to an object. An instance of table data – a resultset, your relation – can map to a set of objects, generally in some container managing access to them.

But in the larger arena, these are the fundamental mappings:

A resultset (Date’s relation) maps to an object graph.
A query definition (Date’s relational variable) maps to the metadata description of an object graph – let’s coin an expression and call it a class graph.
A row of a resultset (Date’s tuple) maps to a set of individual objects in an object graph connected by pointers.
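As a rough sketch of the first and third mappings, assume a query joining a customer table to an order table. The tables, columns, and class names are all invented for illustration, and this is not how any particular OR mapping tool does it. Each flat row contributes a pair of linked objects, and the whole resultset folds into a graph.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One row of a joined resultset: customer columns plus order columns.
final class JoinedRow {
    final int customerId;
    final String customerName;
    final int orderId;
    final String item;

    JoinedRow(int customerId, String customerName, int orderId, String item) {
        this.customerId = customerId;
        this.customerName = customerName;
        this.orderId = orderId;
        this.item = item;
    }
}

final class Customer {
    final int id;
    final String name;
    final List<Order> orders = new ArrayList<>(); // pointers to child objects

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class Order {
    final int id;
    final String item;
    final Customer customer; // pointer back to the parent object

    Order(int id, String item, Customer customer) {
        this.id = id;
        this.item = item;
        this.customer = customer;
    }
}

final class GraphMapper {
    // Folds flat rows into a graph: one Customer per distinct customerId,
    // with each row contributing an Order linked to its Customer.
    static List<Customer> toGraph(List<JoinedRow> rows) {
        Map<Integer, Customer> byId = new LinkedHashMap<>();
        for (JoinedRow row : rows) {
            Customer c = byId.computeIfAbsent(
                    row.customerId, id -> new Customer(id, row.customerName));
            c.orders.add(new Order(row.orderId, row.item, c));
        }
        return new ArrayList<>(byId.values());
    }
}
```

Note that the customer columns repeat on every row of the resultset, but the graph holds each customer once. Collapsing that repetition by identity is where the entity question from earlier comes back in.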

The fundamental importance of the object graph has become more and more clear over time. It has emerged at the heart of Service Data Objects, which are object graphs that can be disconnected from the mothership, modified, and phoned home.

More on object graphs soon.

Use cases are primarily used to make software for knowledge workers.   Sure, they are abstract enough and flexible enough to describe users’ interactions with a variety of systems.  You could – and perhaps people do – write use cases for driving a car. But most of the time when you see a use case, there is software in the offing.

And what exactly do knowledge workers do?  That’s easy.  They make decisions in the area of their expertise.  They are immersed in a situation made intelligible by their experience and training, and apply their knowledge and reason to make decisions which result in actions.

A user sitting in front of a software screen can only select things.  (We can take care of edge cases by describing writing text as selecting one text out of all possible texts, and by describing drawing with a paint program as selecting one picture out of all possible pix, etc.)

Each selection represents a decision – check the box, set that radio button, select this item from a combo box – and the little decisions are rolled up into bigger ones: save, start, do.

There is a range of complexity and sophistication of decision making, and a wide range of the type and depth of expertise a user has to bring to the arena.   At the low end, there are binary decisions in simple spaces – open a file? ok.  At the high end, there are sophisticated tools that can be used to help analyze complex situations using queuing theory, game theory, probability, simulation, and other formal techniques.  More about the nature of decisions in a future rumination.

So what is the true nature of a use case?  A use case is a decision context.  It describes an expert engaged in a decision scenario.  The overall goal of the use case is reached by a series of such decisions.  The expert needs to be immersed in the environment where the decision is possible.  They have to be provided with the information, the research, on which to base their decision.  And they need to be presented clear and correct selections that correspond to qualifying and making the decision.

As always, more to come.

Requirements are frequently uncategorized or poorly categorized. Unfortunately it is not unusual to have requirements captured as an undifferentiated list in a spreadsheet, or a shallow bulleted list in a word doc.

There are a number of useful ways of structuring requirements. We’ll touch on one initially, then augment this over time.

One very useful categorization is by the frequency of change or possible change.

The laws of physics, our understanding of how the physical world works, are fundamental requirements for all projects, even if frequently not articulated.   If there was a big bang, there may have been some change near the bang, but it is safe to assume the laws of physics will be stable for the lifetime of one of our software projects.  Sometimes our understanding of them can change, though  – if x number of objects of a given size fit in a particular box, that is unlikely to change, but the invention of a new lattice packing algorithm might do the trick.  Or making all the chips the same size and shape so they stack in a tennis ball can.

Federal and state laws are generally slow to change.   But they do, and if our project is potentially impacted by such changes during its lifespan, we should take the possibility of change into account.

Contracts with our business partners change at the end of the contract period – or sometimes sooner if there is a dispute.

Our technical infrastructure changes.  New technologies are introduced and adopted.

Business objectives change much more frequently, driven by changes in the market or changes in leadership.

Categorize the requirements by their frequency of possible change.  Qualify that with the period of the change and its nature.  Find logical gaps in the resulting picture and fill them.  Then account for the temporal volatility of requirements – you are designing for the lifespan of your system, not just for the snapshot of it when it rolls out.
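A minimal sketch of what that categorization could look like as data, assuming each requirement is tagged with a source of change and a rough estimate of its change period. The enum values follow the categories above. The class names, the field names, and the idea of estimating the period in years are my own hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sources of requirements, roughly from least to most likely to change.
enum Volatility {
    PHYSICAL_LAW, LEGISLATION, CONTRACT, TECHNICAL_INFRASTRUCTURE, BUSINESS_OBJECTIVE
}

final class Requirement {
    final String text;
    final Volatility volatility;
    final int expectedYearsBetweenChanges; // hypothetical estimate of the change period

    Requirement(String text, Volatility volatility, int expectedYearsBetweenChanges) {
        this.text = text;
        this.volatility = volatility;
        this.expectedYearsBetweenChanges = expectedYearsBetweenChanges;
    }
}

final class RequirementReview {
    // Returns the requirements likely to change within the system's planned
    // lifespan, most volatile (shortest change period) first.
    static List<Requirement> atRisk(List<Requirement> all, int lifespanYears) {
        List<Requirement> result = new ArrayList<>();
        for (Requirement r : all) {
            if (r.expectedYearsBetweenChanges <= lifespanYears) {
                result.add(r);
            }
        }
        result.sort(Comparator.comparingInt(
                (Requirement r) -> r.expectedYearsBetweenChanges));
        return result;
    }
}
```

Anything this returns is a requirement the design should expect to revisit before the system retires, which is a different list from the one you get by looking only at rollout day.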

Why have HTML and XML succeeded? I believe it is a combination of four things:

Accessibility

There is almost no barrier to entry. Notepad and IE and you’re in business.

Resilience

Tolerance for error. Most mistakes you can make in HTML won’t prevent your page from displaying. Unlike most programming languages, HTML will let you be and remain imperfect and continue to make progress. XML is tolerant in a different way. The syntax is more rigorous, but also very, very simple and regular, so XML is tolerant of conceptual errors, which, again, don’t prevent you from making progress.

Broad tiers of utility at different skill levels

You can do a lot of HTML and know nothing of cascading style sheets. But once you know them, there is a rich set of new things you can do. You can learn a small set of things, and get a lot of mileage from the knowledge. With each tier, each quantum of skill, there is a broad new plateau that opens.

You can go deep

This is a corollary of the previous point. From HTML there are cascading style sheets, and JavaScript, and you can move to the server and find Linux Apache MySQL and PHP, and there is almost no end to how deep. From basic XML there are Document Type Definitions and then Schemas, there are XSLTs and XPath and again, a long path with almost no end. And again, for each step, there is a broad range of new function that is available.

So what is the lesson? Keys to success for a new development technology:

  • Give users a low barrier to entry using tools with which they are already skilled.
  • Give them a broad range of utility with the first step in skill.
  • Give them visible success despite their errors.
  • Give them a new plateau of function with each stepwise gain in skill.
  • Give them a path to go deep.

I replenished my popcorn supply today, and my interaction with the checker at the grocery store set me to thinking about how frequently the software requirements that we capture miss the point. And she only asked if I was going to use a hot air popper.

I regularly buy Western Family white popcorn in the two pound bag. The original driver was economy. I am a weekly commuter. I maintain two residences: my permanent family home is in the Rogue Valley in Southern Oregon, my pied-a-terre is a studio apartment over Phil Cusick’s frame shop in downtown Newberg, Oregon, about a half hour drive southwest of Portland. I try to keep the bills for the work side as small as possible, and to favor the home and family side. The financial Stoicism makes me feel virtuous – old school virtus with a modern gender twist – during the week as I knead dough and bake bread, or pick pintos and soak them overnight to have them ready for the crockpot the next morning.

The path of fiscal conservatism led to popcorn. I like salty snacks. But chips – potato, Frito, Dorito, kettle – are expensive. Pretzels are ok, but I tire of them quickly. I found microwave popcorn at WalMart for less than three bucks for a box of fifteen individually wrapped Pop Weaver’s.

And it was ok. But one week even WalMart was too dear. Western Family (northwest store brand) white popcorn to the rescue. $1.65 for a two pound bag, which I find lasts for about as long as a box of Pop Weaver’s.

And the thing is, it is way better. I don’t have a hot air popper in Newberg. I’ve used hot air poppers with bulk popcorn, and they pop great, but the popcorn isn’t very good – it comes out kind of tasteless. I’ve microwaved a lot of popcorn, and it is just ok at best. Popcorn lung and the diacetyl-whatever-is-now-gone notwithstanding, it has a faint chemical aftertaste from the fake butter and whatnot.

Movie theater popcorn, when it is fresh, is pretty good. And the Jiffy Pop I had as a kid was yummy if it was cooked well. If you are as old as me or nearly, you may remember Jiffy Pop: a foil pan filled with popcorn and oil under a cleverly spiral-folded foil cover. You (my first instinct is to use ‘one’ instead of ‘you’ but Donna Meek in high school accused me of never using slang and I’m still seeking her approval decades later) cooked it on top of the stove, and the popping popcorn swelled the foil cover until you had a foil globe full of popcorn. It was easy to burn, and took some technique, so, like the barbeque, it was Dad’s job.

The Western Family white popcorn was a revelation. I don’t have many pans in my bachelor pad in Newberg. I used an aluminum Dutch oven that we bought at a church rummage sale from the pastor’s wife who lives two doors down from us. I followed the directions on the bag: put a little oil in the pan, put in a few kernels of corn, and heat. When the first kernel pops, add 1/2 cup of corn, cover, and shake regularly until popping subsides. Take it off the heat, season and eat.

Revelation. Roasting the kernels in the hot oil gives the hulls and in turn the whole popped kernel a wonderful toasted nutty flavor. And the Western Family popcorn kernels pop up big and fluffy – the ratio of fluffy white stuff to hull is very good. As opposed to the national brand Jolly Time, both white and yellow varieties, which I’ve tried since, which always seems to pop up meanly.

Once popped, you can flavor the popcorn with the classic butter and salt, or my daughter’s favorite garlic and dill, or whatever – it is a wonderful flavor medium.

Ok, so what does this have to do with software requirements?

Money and time (which equals money) are the low-hanging fruit of requirements. They are easy to quantify. Every company has accountants. Every company has a bottom line.

In the world of popcorn, time and money get you hot air poppers and microwave popcorn.

And I’m sure the engineers at Orville Redenbacher and the like enjoyed rising to the save time and money challenge.

To pop popcorn, you have to get it hot – above a certain minimum heat, below which it won’t pop, but below a certain maximum heat, at which it will burn, and keep it there for a certain length of time. It doesn’t all pop at the same time, which makes the temperature window even smaller for ideal results. When it pops it expands in volume fifty-fold (or somesuch). When it pops it jumps. How can we contain the expansion, or leverage the motion? You can just hear the gears in the popcorn engineering geeks whirring.

But the fundamental requirement for popcorn is that it tastes good, and that it is pleasurable to consume. I am willing to spend ten minutes popping bagged popcorn on the stovetop in a Dutch oven because it tastes good. (The careful reader will point out that I only discovered it because it was cheaper. True, but I don’t believe that fact detracts from the overall power of the popcorn point, which is upon us.)

Here’s the point. When we have a list of requirements we’ve gathered from the business, their prioritization and resolution should not be driven solely or even primarily by short-term cost and technology.

Taking a business process and finding a new automation boundary where we can deploy a gizmo that will save enough money to get the sponsor a tee time with the CFO is the lazy way out.

Taking that same business process and finding a new satisfaction boundary is the more rewarding path, for our employees, customers, and ourselves. The ROI is much harder to measure – but it is there.

Many software projects will deliver microwave or hot air popcorn to the business. And the business just doesn’t know they would really prefer Western Family white popcorn cooked in hot oil and flavored with chipotle, chives, and garlic until they’ve had a taste.