Implicit Evaluation with PHP

1 January 2007

Recursive Data Structures in PHP

Welcome to the Jungle

Today, I was helping a friend implement a recursive category model. There is a top level category and each category contains items and sub-categories.

It’s easy to illustrate on the web:

Somehow, your web browser can take a piece of text and while parsing, separate the bulleted items into lists and sublists. And it does this by recursively processing the code between matching tags.

Recursion is much easier to use than to implement. In fact, properly implemented recursion takes no thought on the part of the user. But there are things which really do require that recursion be implemented. And today, I’d like to show you how that happens.

The most important part of recursion is to have well factored code. Think of development in terms of tasks, not functions. And once each task is identified, each should be its own function. Using the above sample, you might pick out the following as a list of tasks:

It should be immediately obvious that more code is entailed in doing this than just looping over each set of results and populating the nested queries with results obtained from outer queries. Recursion is simply more complicated than developing under tight constraints. It is sometimes simpler than coding under arbitrary constraints, but this point shall go unexplained for now.

Now that we have a set of functions which each compose a specific piece of data, we decide how they will be processed. This choice is based largely on how the data is used. Given PHP’s artificial cap on recursive calls (and this site’s slight PHP-bias) we’ll use recursion with a managed stack. The alternative is for each call to getCategories to recursively call itself and call getItems to build a full data structure as the data is processed. Each has its benefits and time will teach you how to know how to recurse.

$results = (array)NULL;
$queueFetch = getTopCategories();
$ixFetch = 0;
while ($id = getSubCategories ($queueFetch[$ixFetch++])) {
   $queueFetch[] = getSubCategories($id);
   $results[] = getItems ($id);
}
function getTopCategories () {
return getSubCategories(0); /* top-level categories are just sub categories without a parent id */

}
function getSubCategories($parentId) {
   $sql = “SELECT id FROM Categories WHERE parent_id = ” . (int)$parentId;
   $rs = mysql_query ($sql);
   $results = (array)NULL;
   while ($row = mysql_fetch_assoc ($rs)) {
      $results[] = (int)$row[”id”];
   }
   return $results;
}
function getItems ($category_id) {
   $sql = “SELECT * FROM Items WHERE category_id=” . (int)$category_id;
   $rs = mysql_query ($sql);
   $results = (array)NULL;
   while ($row = mysql_fetch_assoc ($rs)) {
      $results[] = $row;
   }
   return $results;
}

You should notice that this code is not recursive. The data structure is, but the code is not. That is the advantage of a managed stack. Of course, this is a simple data structure — it is rather akin to the file/folder paradigm on an operating system.

A truly recursive example would have to call itself. This decreases its flexibility but makes code simpler.

$results = getTopCategories();
}
function getTopCategories () {
return getSubCategories(0); /* top-level categories are just sub categories without a parent id */

}
function getSubCategories($parentId) {
   $sql = “SELECT id, name FROM Categories WHERE parent_id = ” . (int)$parentId;
   $rs = mysql_query ($sql);
   $results = (array)NULL;
   while ($row = mysql_fetch_assoc ($rs)) {
      $id = (int)$row[”id”];
      $items = getItems ($id);
      $subcategories = getSubCategories($id);
      $results[$row[”name”]] = (array)NULL;
      foreach (array ($items, $subcategories) as $resultset) {
         foreach ($resultset as $result) {
            $results[$row[”name”]][] = $result;
         }
      }
   }
   return $results;
}
function getItems ($category_id) {
   $sql = “SELECT * FROM Items WHERE category_id=” . (int)$category_id;
   $rs = mysql_query ($sql);
   $results = (array)NULL;
   while ($row = mysql_fetch_assoc ($rs)) {
      $results[] = $row;
   }
   return $results;
}

The recursive example does the same thing as the queue-based example, but it considerably less flexible. You cannot retrieve particulars like just the particular subcategories of 4.1 with the purely recursive version without writing more code. The same is true of the first example and knowing the category name, however that functionality can be gained more easily than the subcategory subset. Requirements determine which to use. For very complex structures, the smaller functions will be easier to collect but assembling them may be more difficult (depending on how the relations are set up). Small numbers of complex structures might be more easily designed with pure recursion. Large numbers of simpler structures may also be better under pure recursion. There is no definite rule for determining the best option. If there was, recursion may not seem as frightening as it does.

This example is of more than basic complexity. If a subcategory had items or more subcategories, the purely recursive code would be much simpler, while the queue-based recursion would be about the same.

No Comments currently posted.

Post a comment on this entry: