5 February 2007
What is Programming?
Programming is about specifying the operations necessary to complete a task. That’s really it. It does not matter whether you’re asked to compute the sum of 8 and 5 or asked to provide an artificially intelligent drop down menu suggesting cities based on a driving time of two hours or less from your current location obtained by a geographical IP lookup service. While one project is certainly more interesting than the other, they’re both just programming.
Different languages provide different fundamental building blocks. Scheme, for instance, gives you a core of maybe one dozen routines. PHP gives you closer to 3000. Each language is better suited to different tasks, but each is equally capable, since it is possible to implement PHP in Scheme and it is possible to implement Scheme in PHP.
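To make the idea of implementing one language in another concrete, here is a minimal sketch in Python (chosen only for brevity; it is neither PHP nor Scheme). The evaluator and its tiny set of primitives are hypothetical: they imitate the flavour of a Scheme-like core and are nowhere near a real Scheme.

```python
import operator

# Hypothetical mini-core: a few primitives, loosely in the spirit of Scheme.
PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul, "<": operator.lt}

def evaluate(expr, env):
    """Evaluate a nested-list expression such as ["+", 8, 5]."""
    if isinstance(expr, str):        # a symbol: look it up in the environment
        return env[expr]
    if not isinstance(expr, list):   # a literal such as a number
        return expr
    head = expr[0]
    if head == "if":                 # special form: only one branch is evaluated
        _, test, then_branch, else_branch = expr
        return evaluate(then_branch if evaluate(test, env) else else_branch, env)
    if head == "lambda":             # special form: build a new function
        _, params, body = expr
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)       # ordinary application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return func(*args)

total = evaluate(["+", 8, 5], dict(PRIMITIVES))
square = evaluate(["lambda", ["x"], ["*", "x", "x"]], dict(PRIMITIVES))
```

Everything beyond these few forms would have to be defined in terms of them, which is the trade-off a small core makes.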
PHP, as a language, gives you more features than Scheme. Ignoring run-time efficiency, Scheme could be more quickly implemented in PHP than PHP in Scheme. However, ignoring efficiency is a large concession. A very interesting comparison would be: is it easier to implement PHP under Scheme under PHP, or to implement Scheme under PHP under Scheme? I will readily confess that I do not know the answer, though I imagine using Scheme as the base language would prove easier.
This comparison is somewhat akin to the argument in CPU architecture about RISC vs. CISC. CISC implements more functions in hardware, at the expense of complexity (rather like PHP). RISC implements very few functions with much greater efficiency (like Scheme). Over time, CISC processors have largely become a layer above a RISC-like core. This reality is largely responsible for my prediction that Scheme as a base language would be the better foundation.
Let’s expand the language comparison. For nearly the entire existence of digital computers, programmers have hoped to create a way to express code in natural language. COBOL, AppleScript and, to a lesser extent, BASIC all aimed to let regular people program in English. Rather than thinking in bytes or strings, you think “naturally.” The success of any of these languages is debatable, but most would probably agree that none has achieved its goal.
Part of the problem inhibiting the success of these projects is that natural languages are rather imprecise. As I am a native English speaker, I shall assume English is the language that would be used. We have cases of the same word having different meanings (mine) and of different words having the same meaning (smile and grin). When reading or listening to an English piece, a large part of our understanding comes from context or non-verbal cues. Even in writing, things can be expressed in footnotes, parentheticals or contextually by introductions or pages earlier. Even then, we’re not always sure of the meaning, and in those cases we can research or ask for clarification.
Programming follows a backwards assumption. English starts with 100,000 words (source: my imagination). They exist, and over time an English user familiarizes themselves with a word as they need it. They learn the type of word, its definition and the implications of using it. Over time, one might learn 10,000 words. The other 90,000 account for so few uses that they can be adequately covered and defined by the 10,000 this person has already learned.
With programming, for instance with Scheme, there are twelve words. You must learn them all to be even partially useful with the language, and then define all other words using those twelve. If Scheme were to be implemented in terms of English, its core might be 100,000 subtly different terms. Since each term has a discrete meaning, knowing the difference between grin and smile becomes far more significant than it is in real life.
Using natural language as the basis of programming is bad. By starting with a finite base, you can drop the assumption that something exists and will work as expected, and instead force people to learn what to use by its use, not by its name. Consider the terms map and word. In programming, a word is a fixed-size unit of data (2 bytes on many platforms). In English, a word is a name for a defined concept. The two aren’t remotely related. Again with map: in English, it is a graphical document for locating things. In programming, map applies a function to each element of an array. If English is to be used as the language of programming, all existing concepts must be discarded.
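For readers who have not met the programming sense of map, here is a small illustration using Python's built-in `map`. The `double` function is a made-up helper for the example.

```python
def double(n):
    # Hypothetical helper: the function applied to every element.
    return n * 2

# map applies double to each element of the list, in order.
doubled = list(map(double, [1, 2, 3]))
```

Nothing in the English meaning of the word would lead you to guess this behaviour; it has to be learned from use.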
One reason I’ve used PHP and Scheme as contrasting examples so far is that both have made movements towards this already. Both are loosely typed languages of some functional variety. Scheme may be much more elegant than PHP when programming functionally, but PHP is still capable of it. A language like C simply isn’t.
We’ve seen small steps towards comprehending English without understanding it with the advent of spell checking in the last three decades. Grammar checking has advanced much more slowly as it must understand words to make any kind of decision about them. A truly good grammar checker would make vast leaps towards programming in natural language.
Other smaller moves towards enabling natural language programming are highlighted in Wikis and other Internet phenomena. Wikis recognize links as links. Instant messaging recognizes // around words as intent to italicize or ** as intent to strengthen. HTML has moved from tags named B or I (bold and italic) to more functional names like strong and em (emphasis). This kind of contextual understanding is necessary to achieve natural language programming.
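As a rough sketch of how little machinery this kind of recognition needs, the following Python function turns the chat conventions mentioned above into the functional HTML tags. The function name and the exact patterns are assumptions for illustration; real chat clients each have their own rules.

```python
import re

def render_chat_markup(text):
    """Hypothetical converter from chat-style markers to HTML tags."""
    # **word** is read as intent to strengthen.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    # //word// is read as intent to emphasize; the lookbehind
    # skips the // that follows the colon in addresses like http://
    text = re.sub(r"(?<!:)//(.+?)//", r"<em>\1</em>", text)
    return text

rendered = render_chat_markup("this is **very** //odd//")
```

The function understands nothing about the words themselves; it only recognizes the intent signalled around them.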
That’s not to say that natural language programming is completely absent from development today. Projects like CakePHP have automated some kinds of applications almost completely, so that the developer barely touches code. Programs like Microsoft Access have taken grand steps towards automating the DBA side of databases, letting the designer focus on design. But to date, we’ve seen no product that merges these two approaches. There is nothing that lets you define, in English, the data to be collected while generating both an optimized data model and efficient code.
This, I believe, will signal the beginning of the next programming revolution. It’s not far out. A proof-of-concept application could be delivered in one week. A friendly, usable application is probably two to five years away. But by redefining the concept of natural language programming, it becomes much more possible. Users are no longer set in front of a blank canvas and asked to draw the tower they envision. They’re asked to sit in front of a computer and describe it. As the computer reflects what it’s understood, the user can clarify. This eliminates a highly time-consuming portion of development: translating English specifications into programming code.
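The describe, reflect and clarify loop can be sketched in a few lines of Python. This is a toy under heavy assumptions, not a proof of concept: it accepts only one rigid sentence shape ("A thing has a field, a field and a field."), every function name here is invented, and every column is assumed to be text.

```python
import re

def parse_description(sentence):
    """Turn one constrained English sentence into a data specification."""
    match = re.fullmatch(r"(?:an?|each|every) (\w+) has (.+?)\.?",
                         sentence.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError("Sorry, I did not understand: " + sentence)
    entity, field_text = match.groups()
    fields = []
    # Split the field list on commas and on the word "and".
    for part in re.split(r",\s*(?:and\s+)?|\s+and\s+", field_text):
        name = re.sub(r"^(?:an?|the)\s+", "", part.strip(), flags=re.IGNORECASE)
        if name:
            fields.append(name.lower().replace(" ", "_"))
    return {"entity": entity.lower(), "fields": fields}

def reflect(spec):
    """Say back what was understood so the user can clarify."""
    return "I understood: one {} record stores {}.".format(
        spec["entity"], ", ".join(spec["fields"]))

def to_sql(spec):
    """Generate a table definition; TEXT columns are an assumption."""
    columns = ",\n".join("    {} TEXT".format(f) for f in spec["fields"])
    return "CREATE TABLE {} (\n    id INTEGER PRIMARY KEY,\n{}\n);".format(
        spec["entity"], columns)

spec = parse_description("A customer has a name, an email and a birth date.")
```

Calling `reflect(spec)` gives the user a chance to correct the understanding before `to_sql(spec)` produces anything. The hard parts the essay is concerned with, such as choosing types, relationships and an optimized model, are exactly what this sketch leaves out.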
When a programmer sits with vague specifications for weeks and delivers what the specifications define but under-delivers what the specification’s author imagined, it is the programmer’s fault. But when the specification’s author bypasses the programmer entirely, the author has no one to blame but himself. And so he will clarify initially instead of on delivery.
Programmers will not cease to exist. They afford flexibility to an otherwise strict setting. No matter how advanced a program becomes, programmers understand both code and natural language, and so will better comprehend what is being asked. Further, as these natural-language development platforms advance, they will be specialized to need.
Programming is about specifying the operations necessary to complete a task. By segregating the things the language can figure out from concepts regular people understand, and mapping regular people’s thoughts back into the code the language understands, programming becomes implementing specifications. It’s been that all along, but for the first time, people besides programmers will understand it. Today’s programmers will adapt or change careers.
We’re on the eve of the biggest revolution in programming since the advent of the first high-level language. It hasn’t happened yet because people have tried to make English the programming language. David Wheeler once postulated that any problem in computer science can be solved with another layer of indirection. The English interpreter simply needs to be another layer.

