15 December 2005
I Always Screw Up Some Mundane Detail
Most medium- or large-sized projects are multi-layered if not multi-tiered. There is an interface tier, a logic tier and a data tier. In an MVC-based application, the interface equates to the various views, the data tier is one component of the model and the logic is most of the controller. Building all the basic objects, to do data retrieval, modeling and storage is very mundane, and repetitive both for entities within an application as well as the task across multiple applications. In building an Event Calendar, for instance, you need some kind of a model for events themselves, another model for the calendar, and probably some kind of user model for authentications sake. Beyond that, you probably have a set of functions or objects to transform those models between language-specific representations and database representations.
Why is it all necessary? Principally, because SQL supports only the slightest bit more than the basic primitive data types. Ultimately, both objects/structures and tables are implementation-level entities. They exist to represent more complex data. However, while similar, these entities are not directly comparable: SQL, for instance, does not have anything akin to an array. Where a table COULD BE considered a class, the records are OBJECTS, not ARRAY ELEMENTS. The table as a whole could be an array of objects, but then the developer must construct a whole other layer of logic to map the containing objects array to the proper elements in that other table. Likewise, while an object full of primitives is easy to visualize as a tables, an object with member objects becomes drastically more difficult.
To simplify this process and reduce the number of logical conversion and dependencies a developer needs to address, they may resort to always working within the SQL mindset. They may give variables in code the same name as their Tabled fields counterparts. There may be foreign keys scattered all about the code. Dependencies are simplified but not reduced. However, the developer is still redundantly developing: they are designing a logic data layer (a conventional data model), a physical data layer (the database itself), a programmatic representation of the physical data layer (a model), some dependencies between related records (foreign key members), and the actual IO layer (SQL Select/Update/Insert/Delete queries).
The IO layer can be (and frequently is) generated by means of sharing field names [how often have you seen function select ($arrFields, $arrTable, $strWhere) { .. } ?], but it still relies on implementing the data model twice: once in SQL and once in code. And it enforces non-intuitive requirements for relating objects in code. Plus, it fails to address logical dependencies at all. And it forces management to differentiate inserts from updates onto the developer. Basically, there is a very large breakdown between the database layer and the program’s logic.
Why? Programming is supposed to automate repetitive tasks, not enforce them.
This last weekend, while working on my XQuery objects, I was hit by a revelation: the database is the detail, not the data. Its representation is insignificant. Modern object-oriented programming offers far more robust ways to represent data than SQL with its 1:1, 1:M, and M:N relationships. Coding to a table is like coding backwards, to the lowest common denominator.
The only advantage SQL can claim is easier reporting or filtering of data. Its shortcomings are much more numerous. It requires learning a whole other language, fighting non-intuitive data constructs and giving up advanced data types.
No matter what, a logical data model must be completed. However, a data model need not be SQL-centric. So in realizing that I can already offload reporting to XQuery, I set out to build a completely abstract data layer. It only took a day to build an ObjectDataAdapter, which, through the magic of introspection and implicit evaluation, serializes an object to a database and allows retrieval later.
Some developers may, at this point, realize that is nothing new. mysql_query(“UPDATE table SET object = ‘†. mysql_escape_string (serialize ($object)) . “’ WHERE intID=†. $intObjectId); has long worked, but does not allow querying the object. It is no more effective than writing an object to disk in a structured file. My approach is more SQL-like: I am matching member data to fields. Take an object car:
class Car {
var $make;
var $model;
var $color;
var $year;
}
class Model {
var $doors;
var $cupholders;
var $seats;
var $engine;
}
Car’s member Model is an object of type Model. In SQL, this is a M:1 relationship, but in code, one object is simply nested in the other. My ObjectDataAdapter will map Car’s make, color and year to fields in a table, and Model’s doors, cupholdeers, seats and engine to fields in another. It will then establish a FK in Model’s table to point back to Car’s PK. Strictly speaking, it is a denormalized data model, however, we’re relying on SQL only as a repository, not as the true query source. By the time we’ve returned to the XQuery layer, we’re dealing with objects again and the denormalization has disappeared.
My ObjectDataAdapter supports scalar variables as well as objects. It will create tables around an object, handle arbitrarily nested objects, handle relating fields, basically, everything serialize/unserialize does with the added benefit that almost all data is still queriable in SQL if need be.
In its current form, it does cheat ever so slightly. As of yet, I have not yet been able to effectively store arrays. I have successfully stored arrays of scalars by wrapping a primitive type in an object and relying on automatic relationship mapping, but it is very inefficient and doesn’t work for objects. For expediency’s sake, arrays are still uniformly converted to simple strings through a call to serialize and the special “arr†term added to that fields name signals select to unserialize it before re-assigning to the container model. That fields, and anything contained within is therefore not conventionally queryable. The other significant current problem is that all fields are stored as text in SQL. PHP implicitly converts variables to an appropriate datatype whenever they aren’t of a common type, but that means any variable is stored internally as a string at some point. It’s not a problem in PHP, but any strongly-typed language would require this to be fixed before the concept is useful.
Both issues will still receive my attention, time permitting. But already, I no longer need to implement a physical data layer, or an IO layer. The IO layer handles dependencies and relationships for me. And I need not use any special terms or naming conventions to make all this happen, besides consistently using intId as the PK (And even that is only necessary so I can somehow retrieve objects later). The IO layer returns objects, operates on objects and only refers to the Id publicly in one function: select. Names are all predictably computed at runtime, though they don’t technically matter so long as the IO layer can map between fields and members.
The only downside I’m aware of is reduced performance. Evaluated code (even implicit) isn’t as fast as tuned code, and the layer of abstractness the DataAdapter operates at will never match hand-written queries for each table. However, the time I save in writing data layers will never be accumulated with every second of lost run-time performance. Barring any major upsets, Moore will make the performance point irrelevant in the near future anyway.

