When asked about the meaning of life, a colleague of mine once commented: “I don’t know, but I know that nobody else knows either.” The same approach, I suppose, could be taken with respect to data-driven design: it isn’t clearly defined (good luck with the Wikipedia article), and it means different things to different people. Consequently, the best I can do is describe what it means to me, thereby making this a kind of ‘Dmitri’s data-driven design’. Still, given the relative scarcity of information on this paradigm (that Wikipedia article is a good indication), I think every little helps.

The idea of data-driven design, for me, is simple: it is a paradigm in which some nebulous data (which is, predictably, stored somewhere) drives all, or a large portion, of an application. In other words, the application infrastructure is malleable and aligns itself to the needs of the data. Here, we can distinguish two patterns:

Static data-driven design is akin to code generation: you have some data (an abstract model), and this data is transformed before compilation to produce an application, or a significant part of it. For example, abstract data definitions in XML can be transformed via XSLT/XQuery/T4/whatever to produce the necessary stored procedures, entity objects/mappings, and user interface. In this respect, data-driven design appears to be synonymous with either code generation or metaprogramming, depending on how you look at it.
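To make the static case a bit more concrete, here is a minimal sketch of the generation step, written in Python rather than XSLT or T4. The XML vocabulary, the `MODEL_XML` model, the type mappings and the functions `parse_entity`, `generate_class` and `generate_table` are all hypothetical; the point is only that one abstract definition yields several artifacts before anything is compiled.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract model: this XML vocabulary is invented for illustration.
MODEL_XML = """
<entity name="Customer">
  <field name="id" type="int"/>
  <field name="name" type="string"/>
</entity>
"""

# Assumed mapping from abstract types to each target's types.
PY_TYPES = {"int": "int", "string": "str"}
SQL_TYPES = {"int": "INTEGER", "string": "TEXT"}


def parse_entity(xml_text):
    """Read the abstract model into (entity name, [(field, type), ...])."""
    root = ET.fromstring(xml_text.strip())
    fields = [(f.get("name"), f.get("type")) for f in root.findall("field")]
    return root.get("name"), fields


def generate_class(name, fields):
    """Emit Python source for an entity class (the entity-object artifact)."""
    params = ", ".join(f"{n}: {PY_TYPES[t]}" for n, t in fields)
    lines = [f"class {name}:", f"    def __init__(self, {params}):"]
    lines += [f"        self.{n} = {n}" for n, _ in fields]
    return "\n".join(lines) + "\n"


def generate_table(name, fields):
    """Emit SQL DDL for the same entity (the persistence artifact)."""
    cols = ", ".join(f"{n} {SQL_TYPES[t]}" for n, t in fields)
    return f"CREATE TABLE {name} ({cols});"


if __name__ == "__main__":
    # In a real build this would run before compilation and write files to disk.
    entity, entity_fields = parse_entity(MODEL_XML)
    print(generate_class(entity, entity_fields))
    print(generate_table(entity, entity_fields))
```

In practice the generated text would be saved as source files and handed to the compiler, so a change to the model means regenerating and rebuilding.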

Dynamic data-driven design is when data continuously defines how your application looks. For example, if a UI is generated from a database at run-time, you’ve got dynamic data-driven design. Admittedly, the dynamic approach is more powerful, since changes to the data are reflected immediately rather than only after the application is recompiled. However, this approach is also a lot more complicated, and it requires your data to carry more metadata.
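By contrast, here is a rough sketch of the dynamic case, assuming a hypothetical `form_fields` table in an in-memory SQLite database that describes a form. Nothing is generated ahead of time: `render_form` (also hypothetical) reads the metadata on every call, so inserting a row changes the UI on the very next render. The `position` and `kind` columns are exactly the sort of extra metadata the dynamic approach demands.

```python
import sqlite3


def make_db():
    """Hypothetical metadata store: each row describes one field on a form."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE form_fields (form TEXT, position INTEGER, label TEXT, kind TEXT)"
    )
    db.executemany(
        "INSERT INTO form_fields VALUES (?, ?, ?, ?)",
        [("customer", 1, "Name", "text"), ("customer", 2, "Email", "text")],
    )
    return db


def render_form(db, form):
    """Build the UI at run-time from whatever the metadata says right now."""
    rows = db.execute(
        "SELECT label, kind FROM form_fields WHERE form = ? ORDER BY position",
        (form,),
    ).fetchall()
    # A real UI would create widgets; text placeholders keep the sketch small.
    return [f"{label}: [{kind}]" for label, kind in rows]


if __name__ == "__main__":
    db = make_db()
    print(render_form(db, "customer"))
    # Changing the data changes the UI immediately, with no rebuild.
    db.execute("INSERT INTO form_fields VALUES ('customer', 3, 'Subscribed', 'checkbox')")
    print(render_form(db, "customer"))
```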

I am an advocate of static data-driven design (DaDD), because generation offers a higher degree of malleability. In a dynamic scenario, for example, you corner yourself into a particular persistence mechanism, and it is very difficult to change it mid-project. On the other hand, if your application is a compile-time derivative of some abstract data specification, you gain the ability to retarget the application to whatever language or platform you see fit. In a comparable dynamic setting, you would need to re-engineer the persistence layer, unless, of course, you designed for persistence ignorance from the start.
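Here is a minimal sketch of what retargeting might look like, assuming a hypothetical spec format and two illustrative targets: SQL DDL for a relational store and a JSON Schema for a document store. The names `SPEC`, `GENERATORS`, `to_sql_ddl` and `to_json_schema` are invented for the example; the design point is that switching persistence means choosing another generator while the specification stays untouched.

```python
import json

# Hypothetical abstract spec that says nothing about how data is stored.
SPEC = {"entity": "Invoice", "fields": {"id": "int", "total": "decimal"}}


def to_sql_ddl(spec):
    """Target 1: a relational table."""
    types = {"int": "INTEGER", "decimal": "NUMERIC(10, 2)"}
    cols = ", ".join(f"{n} {types[t]}" for n, t in spec["fields"].items())
    return f"CREATE TABLE {spec['entity']} ({cols});"


def to_json_schema(spec):
    """Target 2: a JSON Schema, e.g. for a document store."""
    types = {"int": "integer", "decimal": "number"}
    props = {n: {"type": types[t]} for n, t in spec["fields"].items()}
    return json.dumps({"title": spec["entity"], "type": "object", "properties": props})


# Retargeting means picking a different generator; SPEC itself is untouched.
GENERATORS = {"relational": to_sql_ddl, "document": to_json_schema}


def generate(spec, target):
    return GENERATORS[target](spec)


if __name__ == "__main__":
    print(generate(SPEC, "relational"))
    print(generate(SPEC, "document"))
```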

Dynamic DaDD is infinitely more complex, because various actors in the system can also use their own data to drive your software, and those inputs interact. For example, an engineer writes a formula that gets fed as input data to your system. Great, right? Except it doesn’t work that well: a much better approach is to translate the formula into a programming notation in advance, so that any bugs or inconsistencies are discovered before the whole thing ships to customers. After all, how do you prove that mathematical data will drive your system correctly in 100% of cases? That’s right: you cannot.
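One way to do that pre-translation, sketched in Python under my own assumptions: parse the formula with the standard `ast` module, reject anything beyond plain arithmetic over known variables, and compile it once. `precompile_formula` and the whitelist are hypothetical. This catches malformed formulae and misspelled variable names before shipping; it does not, of course, prove that the formula itself is correct.

```python
import ast

# Constructs a formula may use: plain arithmetic over named numeric inputs.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub,
)


def precompile_formula(text, variables):
    """Translate a formula ahead of time, failing on anything suspicious."""
    tree = ast.parse(text, mode="eval")  # malformed input raises SyntaxError here
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"unsupported construct: {type(node).__name__}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError(f"non-numeric constant: {node.value!r}")
        if isinstance(node, ast.Name) and node.id not in variables:
            raise ValueError(f"unknown variable: {node.id}")
    code = compile(tree, "<formula>", "eval")

    def evaluate(**values):
        # The AST whitelist above, not the empty builtins, restricts this to arithmetic.
        return eval(code, {"__builtins__": {}}, values)

    return evaluate


if __name__ == "__main__":
    kinetic_energy = precompile_formula("m * v ** 2 / 2", {"m", "v"})
    print(kinetic_energy(m=2, v=3))  # 9.0
```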

Here’s an example of dynamic data-driven design: game scripting. The idea behind scripting is that it’s not the programmers’ business to create, say, in-game interactions; that’s the job of the story designers. So some of the tasks are offloaded to the storywriters, and the scripts they write are plugged into the system at either compile-time or run-time. Back when I was young, I fiddled with the scripting of Icewind Dale, which involved using a strange language to define the AI that drove my characters’ actions. For simple decisions it was fine, but for a very complicated logic system, its interpreted nature ground the game to a halt. Nowadays, this sort of data-driven approach still works, but if you look at language workbenches such as Microsoft DSL Tools, you’ll see that they are mainly geared towards generating compiled code rather than creating scripts and an embedded interpreter.
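For illustration, here is a toy rule interpreter in Python. The IF/THEN rule syntax is made up for this sketch and is not the actual Icewind Dale script format; `parse_script` and `decide` are hypothetical names. The rules are data a designer can edit, and `decide` walks them on every game tick, which is where the cost of interpretation comes from.

```python
import operator

# A made-up, minimal rule language; NOT the actual Icewind Dale script format.
SCRIPT = """
IF hp < 10 THEN flee
IF enemies > 2 THEN cast_fireball
IF enemies > 0 THEN attack
"""

OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def parse_script(text):
    """Turn each 'IF <stat> <op> <number> THEN <action>' line into a rule tuple."""
    rules = []
    for line in text.strip().splitlines():
        keyword, stat, op, value, then, action = line.split()
        if keyword != "IF" or then != "THEN":
            raise ValueError(f"bad rule: {line!r}")
        rules.append((stat, OPS[op], int(value), action))
    return rules


def decide(rules, state, default="idle"):
    """Interpreted on every game tick: the first matching rule wins."""
    for stat, compare, value, action in rules:
        if compare(state[stat], value):
            return action
    return default


if __name__ == "__main__":
    rules = parse_script(SCRIPT)  # the designer edits SCRIPT, not this code
    print(decide(rules, {"hp": 5, "enemies": 3}))   # flee
    print(decide(rules, {"hp": 50, "enemies": 1}))  # attack
```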

Here’s one example where a dynamic data-driven approach makes perfect sense. Suppose you’re writing an Excel integration package where you want end users (who are familiar with Excel) to keep editing their sheets as always, while interoperating with some heavier tech (e.g., BizTalk) to aggregate data. In this case, precompiling the formulae in the spreadsheets would be somewhat pointless, because the interaction of formulae on a spreadsheet presents a very complicated and inherently dynamic execution scenario.
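To give a feel for why this is inherently dynamic, here is a deliberately tiny recalculation engine in Python. It assumes formulae consist only of cell references and numbers joined by `+` (a big simplification of real Excel formulae), and `evaluate_sheet` is a hypothetical name. Each cell’s value depends on whatever the user most recently typed into other cells, so dependencies are resolved at run-time, with a check for circular references.

```python
def evaluate_sheet(cells):
    """Recalculate every cell; a formula is '=' plus '+'-separated terms."""
    results, visiting = {}, set()

    def value_of(name):
        if name in results:
            return results[name]
        if name in visiting:
            raise ValueError(f"circular reference involving {name}")
        raw = cells.get(name, 0)  # missing cells count as 0 in this sketch
        if isinstance(raw, str) and raw.startswith("="):
            visiting.add(name)
            total = 0
            for term in raw[1:].split("+"):
                term = term.strip()
                # A term is either a numeric literal or a reference to another cell.
                total += float(term) if term[0].isdigit() else value_of(term)
            visiting.discard(name)
            raw = total
        results[name] = raw
        return raw

    return {name: value_of(name) for name in cells}


if __name__ == "__main__":
    sheet = {"A1": 10, "B1": 5, "C1": "=A1+B1", "D1": "=C1+2"}
    print(evaluate_sheet(sheet)["D1"])  # 17.0
    sheet["A1"] = 20  # the user edits one cell...
    print(evaluate_sheet(sheet)["D1"])  # ...and the change ripples through: 27.0
```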

In my opinion, a dynamic, purely data-driven application only makes sense in cases where the variability of options and the frequency of changes are such that the application has to constantly adapt to the end users’ wishes. But if you need maximum performance or maximum support for the features of the chosen framework, then the precompilation approach is more advisable.