Tuesday, January 16, 2007

Python and whitespace

I've introduced Python to many programmers, and most have had the same initial reaction: "What? Leading whitespace is significant? Oh, I ain't touching that." This has always struck me as odd. I have strong feelings about Python's use of block indentation, and this blog post by Paul Bissex last week reminded me of them. And so, here they are.

Many serious Python programmers will tell you using whitespace to delineate blocks is no big deal. I disagree. Lacking syntactically-significant whitespace is, in fact, a serious misfeature.

A bold statement, you say? Absolutely. And not one I would make without a strong argument to back it up. Here's the thing: every programmer I know uses indentation to indicate block structure. I know there are programmers out there who don't, but I feel comfortable stating that they are way, way in the minority.

If you accept it as given that the overwhelming majority of programmers use indentation to demarcate blocks, programming languages should treat initial whitespace as significant. Otherwise, it is possible to write code where the semantic meaning disagrees with the syntactic meaning, leading to bugs. Here's the canonical C/C++ example:
    if (something_is_true)
Many C/C++ programmers I know have made this mistake (myself included; in fact, I always wrap conditional blocks in { } nowadays for this very reason). Even if you are Joe Brilliant C++ Programmer and would never do anything this obvious and stupid, chances are good that the next guy who works on your code is not going to be as experienced or as smart as you. This mistake is not possible in Python.
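
For comparison, here is a minimal sketch of the same shape of code in Python. The function and action names are hypothetical, chosen only for illustration. The point it tries to show is that the indentation a reader sees is the block structure the interpreter uses, so the two cannot disagree.

```python
def run(something_is_true):
    """Return the list of actions taken (all names here are hypothetical)."""
    actions = []
    if something_is_true:
        actions.append("do_something")
        actions.append("do_something_else")  # indented: part of the if block
    actions.append("always")  # dedented: runs whether or not the test passed
    return actions


print(run(True))
print(run(False))
```

Moving the second `append` one level to the left would visibly and actually take it out of the conditional; there is no way for it to look guarded without being guarded.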

By contrast, I have never heard a compelling argument not to have syntactically significant whitespace; most of the ones I've heard equate with "it's yucky". (Oh yeah, there's the usual bitching about spaces vs. tabs, and the "horror stories" of mixing the two. I have been using Python actively for about four years now, and A) I've been bitten by this "problem" less than a dozen times, B) it's always quickly become obvious what the problem is, and C) any reasonable editor can fix the problem in a matter of seconds.)
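
As a rough illustration of why the tabs-versus-spaces problem tends to surface quickly, the sketch below compiles a snippet whose two body lines are indented with a tab and with eight spaces respectively. This assumes Python 3, which postdates this post and rejects such ambiguous mixing outright; the snippet and the helper name `check` are made up for the example.

```python
# Hypothetical snippet: the first body line is indented with a tab,
# the second with eight spaces. In many editors they look aligned.
MIXED = "if True:\n\tx = 1\n        y = 2\n"
SPACES_ONLY = "if True:\n    x = 1\n    y = 2\n"


def check(source):
    """Return the name of the error raised while compiling, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # TabError and IndentationError are subclasses
        return type(exc).__name__
    return None


print(check(MIXED))
print(check(SPACES_ONLY))
```

Under Python 3 the mixed snippet is refused at compile time with a `TabError`, so the problem is reported before any code runs.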

What it boils down to is a choice between distaste and correctness. And, frankly, having spent years maintaining code written by other people, I'll pick correctness every time.

At 7:40 PM, Blogger pbx said...

Thanks for the link!

In case you missed it, Steve Yegge in his most recent rant says: "We will never be able to make real progress in computing and language design in our industry until C syntax is wholly eradicated."

Of course, he doesn't really seem to like *any* concrete syntax...


At 8:55 PM, Blogger MrTact said...

I am a practical programmer by nature and by trade; I am more focused on creating software that can solve a particular tangible problem now than I am on advancing the state of the art in computing. So I'm not really the target audience of most of his discussion. However, this line really got my juices flowing: "The only way to make progress in the meantime is to separate the model (the AST) and the presentation (the syntax) in programming languages, allow skinnable syntax, and let the C-like-syntax lovers continue to use it until they're all dead."

At 11:33 PM, Blogger Mago_Ged said...

"A) I've been bitten by this "problem" less than a dozen times"

And I've never been bitten by:

if (something_is_true)

Leading spaces are _weak_. I constantly have problems with spaces lost in cut+paste, tabs+spaces fighting together, etc etc in my day to day development (non-python). I'll never rely on a program that can broke for one, single, little leading space.

In my opinion, indentation is, and must continue to be, code representation sugar, not a significant part of the language syntax.

At 2:54 PM, Blogger MrTact said...

"Leading spaces are _weak_."
I don't understand. It sounds like you are supporting my argument ("leading spaces are yucky"). Is there some computer science meaning of "weak" I'm not aware of here?

"I'll never rely on a program that can (sic) broke for one, single, little leading space."
Incorrect whitespace in Python rarely leads to the type of logic error that I describe above. Because leading whitespace is significant, invariably it causes syntax errors, which break your *compile*. I'll take fixing a broken build over the possibility of a bug that compiles and might silently slip into production.
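
A minimal sketch of that claim, using Python's built-in `compile()` on a few made-up snippets. The helper name `compiles` and the sample sources are assumptions for illustration only.

```python
def compiles(source):
    """Return True if the source compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


CONSISTENT = "if True:\n    x = 1\n    y = 2\n"
STRAY_SPACE = "if True:\n    x = 1\n     y = 2\n"   # one extra leading space
BAD_DEDENT = "if True:\n        x = 1\n    y = 2\n"  # matches no outer level

for name, src in [("consistent", CONSISTENT),
                  ("stray space", STRAY_SPACE),
                  ("bad dedent", BAD_DEDENT)]:
    print(name, "compiles" if compiles(src) else "is rejected")
```

Both damaged snippets are rejected before anything runs. The sketch only covers indentation that is inconsistent; a line moved to a different but still valid indentation level compiles and changes the block it belongs to.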

You say you have problems with whitespace in non WS-significant languages. I posit that if you were using something like Python that uses indentation for block delineation, you (and everyone working on your codebase) would quickly develop good habits with respect to this, just as (for example) someone coming from VB to C++ learns that case is significant. If you didn't, your code wouldn't compile! This is good behavior reinforced by the compiler; by not allowing you to get away with sloppy indentation, you decrease the possibility that sloppy indentation could hose you at runtime.

There, now you've made me say the exact same thing eight different ways :-)

At 9:04 AM, Blogger Ilya said...

I consider significant whitespace in python to be a major FEATURE.

But I think it's unreasonable to pretend that the negatives do not exist (or are somehow irrelevant).

The negative side is simple:
the moment you step out of your development environment, whitespace can easily get lost, changed or ignored, and lines get wrapped.

Email wraps lines. Wiki environments can squash leading whitespace. Proportional fonts make code look wrong. Cut-n-paste will break the code if indentation level is different. Even diff (if run with -w) can miss a significant difference in python code.
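
The diff point can be sketched as follows. The two hypothetical sources below differ only in the leading whitespace of one line, so a comparison that ignores all whitespace (a crude stand-in for what `diff -w` does, not a reimplementation of it) sees them as identical, yet they compute different results.

```python
INSIDE_LOOP = "for i in range(3):\n    total += i\n    count += 1\n"
AFTER_LOOP = "for i in range(3):\n    total += i\ncount += 1\n"


def ignore_whitespace(source):
    """Drop all whitespace from each line, roughly as diff -w compares lines."""
    return ["".join(line.split()) for line in source.splitlines()]


def run(source):
    """Execute the snippet and return the final (total, count)."""
    namespace = {"total": 0, "count": 0}
    exec(source, namespace)
    return namespace["total"], namespace["count"]


print(ignore_whitespace(INSIDE_LOOP) == ignore_whitespace(AFTER_LOOP))
print(run(INSIDE_LOOP), run(AFTER_LOOP))
```

The first snippet increments `count` on every pass, the second only once after the loop, which is exactly the kind of difference a whitespace-insensitive comparison would hide.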

Basically, whitespace handling in python is a tradeoff: you get superior code readability at the expense of some code brittleness.

Both are valid issues and ultimately, it boils down to what's more important.

IMHO, readability is more important, and thus python's whitespace handling is a feature rather than a problem.


At 10:23 AM, Blogger MrTact said...

Ilya, I think that's a fair assessment.

At 6:15 AM, Blogger Luka said...

Personally, the fact that not one of the posted examples is syntactically correct speaks for itself as to why it's a bad idea.

At 6:17 AM, Blogger Luka said...

Uh, read that as "would have been correct if written in Python instead of C".

