Archive for April, 2005

Under the Hood

Friday, April 22nd, 2005

Imagine that you work in a large and somewhat old-fashioned office building. If you want to send a message to your buddy Joe over at XYZ Corp, this is how it goes. You write out your letter on a piece of paper and put a sticky note on it saying, “Please send to Joe Smith at XYZ Corp,” and hand it to your secretary. She (I said this was an old-fashioned place) puts the letter in an envelope and puts Joe Smith’s name on it. Then she looks up the address for XYZ Corp and writes that on the envelope, along with your return address. Then she hands it off to the guys in the mail room.

What they do is interesting. They look at the address for XYZ Corp and say, “Hmm… that’s out of town. It needs to go to Central.” So they put your envelope inside another envelope and write “Central Post Office” on it.

When it gets to the central post office, they open the outer envelope and read the address on your envelope. They say, “Oh, this is going to Chicago,” or wherever. So they put your envelope inside another envelope and write “Central Post Office, Chicago” on it.

Then it gets to the Central Post Office in Chicago. They open up the envelope addressed to them and see the address for XYZ Corp. So they put it in another envelope that just has the 9-digit zip code for the XYZ Corp building on it.

It shows up at XYZ Corp, and the guys in their mail room open up that envelope and see that it’s addressed to Joe Smith. Somebody runs it upstairs to Joe’s secretary, and she opens the envelope and hands Joe your letter. When Joe sends a reply back, it works the same way.

This is how the Internet works.

It’s actually more complicated - there are more middlemen - but that’s fundamentally how it all works. It’s all these little letters (called “packets”) flying around a very, very fast postal system. This is a pretty clear match for email, but it’s also how everything from web pages to streaming video to Voice over IP works.

When you “go to” a web site, you’re really mailing out a request for a web page. It’s like writing off to a mail-order catalog company. There’s a standard form that defines how you ask for web pages. You fill it out and send it in. You’re sending this little form that says, “I want to see http://www.spuriouspundit.com/index.html”. That request goes out through this metaphorical postal system to the spuriouspundit.com server, and some little toiling minion there xeroxes off another copy of the index.html document and mails it back to you.
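The “standard form” here is an HTTP request. Here's a minimal sketch of what the filled-in form might look like for the page above, assuming HTTP/1.1 (which requires a Host line). The function name is made up for illustration, and it only builds the text of the form; it doesn't mail anything.

```python
def http_get_request(host: str, path: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request (the 'order form')."""
    lines = [
        f"GET {path} HTTP/1.1",  # what we want, and which version of the form we're using
        f"Host: {host}",         # which web site on that server we're writing to
        "Connection: close",     # ask the server to hang up once it has replied
        "",                      # an empty line marks the end of the headers...
        "",                      # ...so the request ends with a blank line
    ]
    return "\r\n".join(lines)    # HTTP lines end with carriage return + line feed


request = http_get_request("www.spuriouspundit.com", "/index.html")
print(request)
```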

Like I said, it’s more complicated than that. How do you keep email and web pages and FTP sites all running on the same machine without tripping over each other? Imagine your office building has a bunch of different departments in it, but they all share the same mail room. Instead of a billing department and a sales department, you’ve got an email department and a web department and an FTP department. The way it works is that they each have different post office box numbers. Whenever a letter comes into your building, the mail room guys just have to drop it in the right box. One of the rules of this postal system is that the addresses on letters have to have a box number. Furthermore, these box numbers are standardized, so that box 80 is the normal box number for the web department, box 25 is for email, etc. So when you send off your web page request, you know that it’s a web page request, and wherever it’s going, it should be going to box 80.
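The real name for these box numbers is “ports,” and 80 (web), 25 (email) and 21 (FTP) are genuine well-known ports. On a real machine the operating system's network code plays the mail room; the toy model below, with hypothetical function and department names, just shows the sorting step.

```python
# Well-known "box numbers" (ports) for a few standard departments (services).
WELL_KNOWN_PORTS = {80: "web", 25: "email", 21: "ftp"}


def deliver(dest_port: int, payload: str, departments: dict[str, list[str]]) -> None:
    """Toy mail room: drop an incoming letter into the box for its port number."""
    service = WELL_KNOWN_PORTS.get(dest_port)
    if service is None:
        # Nobody in the building uses that box; a real OS would refuse the connection.
        raise ValueError(f"no department listens on box {dest_port}")
    departments[service].append(payload)


departments = {"web": [], "email": [], "ftp": []}
deliver(80, "GET /index.html", departments)
deliver(25, "MAIL FROM:<you@example.com>", departments)
```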

This is also how you keep your responses straight. Even if you’re the only guy at your company, you could be downloading a couple of mp3s in the background while you’re popping up new browser windows right and left. If all that stuff is landing in the same inbox, you’ll never sort it out. So each time you ask for a web page, you set up a new post office box just for its responses. Your downloads go to boxes 5001 and 5002, and your web pages end up in 5003, 5004, and so on.
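In practice you rarely pick those reply-box numbers yourself: the operating system hands out a temporary (“ephemeral”) port, and the range it picks from varies by OS, so 5001-5004 above are just illustrative. A small sketch of asking for fresh boxes, assuming the machine lets you bind to the loopback address; binding to port 0 means “any free port, please”:

```python
import socket


def open_reply_boxes(count: int) -> list[int]:
    """Open `count` sockets at once and report which free ports the OS assigned."""
    socks = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            s.bind(("127.0.0.1", 0))  # port 0: let the OS choose an unused port
        return [s.getsockname()[1] for s in socks]  # the port numbers actually assigned
    finally:
        for s in socks:
            s.close()  # give the boxes back


print(open_reply_boxes(3))  # three different port numbers; exact values vary
```

Because all three sockets are open at the same time, each one gets its own box, which is exactly what keeps your downloads and browser windows from landing in the same inbox.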

Here’s the next wrinkle. Say you send off a request for some big, fat mp3 file that won’t all fit in one envelope. So it gets broken up into a whole bunch of separate letters. Now on top of that, like the real post office, stuff can get lost: Mail trucks get stolen; some yutz in New Jersey cuts through a long-distance fiber-optic cable with a backhoe. Even if nothing gets lost, there’s no guarantee that everything is going to show up in the order you sent it.

So what do you do? First, you send a letter that effectively says, “Hey, I want to send a whole bunch of letters back and forth with you. Here’s the address you should send all the replies to.” This is where your web browser says, “Connecting to …” in the status bar. Once you’ve got an OK back, you say, “Send me that mp3 file.” You get a whole flood of letters back, numbered “1 of 23”, “2 of 23” and so on. You count through them and realize that you’re missing number 17, so you send another message saying to re-send it. When you finally have all the letters, you can put them in the right order, open them up, and glom the mp3 file together.
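Here's a toy version of the counting-and-gluing step. It follows the simplified “1 of 23” scheme above; real TCP numbers bytes rather than whole letters and never announces a total up front, so treat this as a sketch of the idea, not of TCP itself. All the names are hypothetical.

```python
def missing_pieces(received: dict[int, bytes], total: int) -> list[int]:
    """Return the piece numbers (1-based, like "17 of 23") that haven't arrived yet."""
    return [n for n in range(1, total + 1) if n not in received]


def reassemble(received: dict[int, bytes], total: int) -> bytes:
    """Glue the pieces back together in numbered order, refusing if any are missing."""
    gaps = missing_pieces(received, total)
    if gaps:
        raise ValueError(f"still waiting on pieces {gaps}")
    return b"".join(received[n] for n in range(1, total + 1))


# The sender cuts the file into numbered pieces of at most 6 bytes each.
data = b"pretend this is an mp3 file"
SIZE = 6
pieces = {i // SIZE + 1: data[i:i + SIZE] for i in range(0, len(data), SIZE)}
total = len(pieces)

# Pieces arrive out of order, and piece 3 got lost along the way.
arrived = {n: pieces[n] for n in [2, 5, 1, 4]}
still_missing = missing_pieces(arrived, total)  # the receiver asks for these again

# The sender re-sends what was asked for, and now the file can be rebuilt.
for n in still_missing:
    arrived[n] = pieces[n]
rebuilt = reassemble(arrived, total)
```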

This little procedure we’re going through here is called a protocol. It’s not part of the postal system itself, but it’s a set of rules that people have agreed on for how to use the postal system. In this case, it’s a way of communicating reliably through a system that isn’t reliable. It’s a layer of communications on top of a layer of communications. It’s very meta.

The Internet is built up from layers of these protocols. Like the envelopes inside envelopes, at each stage, you’re only concerned with the outermost layer. You slip on or peel off your envelope, and everything else is just the stuff inside it, be it one layer or many. IP, the Internet Protocol, is the postal system - simple, but not entirely reliable. TCP, the Transmission Control Protocol, provides the reliable, ordered delivery on top of IP. Email (SMTP), the web (HTTP), and others are actually another layer on top of TCP. Essentially, they all define standard forms for different mail-order requests.
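To make the envelopes-inside-envelopes picture concrete, here's a toy model. The `Envelope` class and `peel` function are made up for illustration, 203.0.113.7 is a reserved documentation address standing in for the server, and the layers below IP (Ethernet and friends) are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Envelope:
    layer: str                        # which protocol put this envelope on
    address: str                      # the only thing that layer's handlers need to read
    contents: Union["Envelope", str]  # another envelope, or the innermost message


def peel(envelope: Union[Envelope, str]) -> tuple[list[str], str]:
    """Open envelopes one layer at a time, noting each address, until only the message is left."""
    addresses = []
    item = envelope
    while isinstance(item, Envelope):
        addresses.append(f"{item.layer}: {item.address}")
        item = item.contents
    return addresses, item


# The HTTP "order form" goes inside a TCP envelope, which goes inside an IP envelope.
request = "GET /index.html HTTP/1.1"
packet = Envelope("IP", "203.0.113.7", Envelope("TCP", "port 80", request))
addresses, message = peel(packet)
```

Each layer only looks at its own address and treats everything inside as opaque stuff, which is the whole point of layering.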

Again, it’s even more complicated than that. But for now, lots and lots of little letters zipping back and forth across the world. That’s the way to think of it.

Programming Mindset

Sunday, April 17th, 2005

So, a friend of mine wants to learn Perl. I figure I can teach her that - Perl is simple. It couldn’t take more than a half-hour to get her up to speed on the basics.

Remember that Picture Hanging essay, about how you aren’t aware of how much you take for granted once you’ve learned something? Yeah. I had no idea how much stuff you have to learn before you can even start programming. You have to understand the environment - what’s going on in a computer - at a much deeper level than most users need. (Here the tourist/business traveler/expatriate metaphor rears its head again.)

First, there’s the whole command-line thing. Fortunately, she had that down, ’cuz I have no idea where I’d even start. Your whole frame of reference has to shift to deal with that. You have to explain about opening and reading from files - that a file is really a sequence of characters (don’t get started on bytes vs. characters). And by the way, the end of a line is actually marked by a character, but you can’t see it. Oh, and pipes: They’re like files, but they don’t really live anywhere. Ummm… yeah.
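Both of those surprises fit in a few lines. She's learning Perl, but this sketch uses Python; the ideas carry over. It assumes Unix-style `\n` line endings and uses `os.pipe()`, which gives you a pipe inside a single program rather than between two programs.

```python
import io
import os

# A "file" is just a sequence of characters, and the line breaks are characters too.
text = "eggs\nmilk\n"
print(list(text))  # the '\n' entries are the invisible end-of-line markers

# Reading line by line just means splitting after each '\n' (which stays on the line).
lines = list(io.StringIO(text))

# A pipe behaves like a file you can write to and read from, but nothing is on disk.
read_end, write_end = os.pipe()
os.write(write_end, b"hello through a pipe\n")
os.close(write_end)  # closing the write end tells the reader there's no more data
with os.fdopen(read_end, "r") as reader:
    piped = reader.read()
```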

Then there’s the whole utility program thing: The idea that programs can be little tiny things, that they don’t have to have graphic interfaces - they “talk” to each other.

To most people, an application is something you see and interact with. Even when you do deal with things like the filesystem, it’s mediated by a graphic interface - a metaphor. It’s a good enough metaphor that people aren’t really aware that it is a metaphor, let alone what it’s a metaphor for. To someone whose idea of programs is Word and Excel, the idea of ‘piping’ the output from one program into another just doesn’t make any sense.
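The shell version of the idea is something like `cat log.txt | grep error | wc -l`: three tiny programs, each passing its output along to the next. Real pipes connect separate programs; the sketch below only mimics the shape with Python functions, and the function names are hypothetical stand-ins for those utilities.

```python
from typing import Iterable, Iterator


def cat(text: str) -> Iterator[str]:
    """Like `cat`: emit the input one line at a time."""
    yield from text.splitlines()


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Like `grep` (plain substring match, not a regex): pass through matching lines."""
    for line in lines:
        if pattern in line:
            yield line


def wc_l(lines: Iterable[str]) -> int:
    """Like `wc -l`: count the lines that come in."""
    return sum(1 for _ in lines)


log = "ok\nerror: disk full\nok\nerror: out of paper\n"
count = wc_l(grep("error", cat(log)))  # read right to left: cat, then grep, then wc -l
```

None of the three knows anything about the others; each just takes lines in and hands something out, which is what makes them easy to snap together.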

OK, so that’s all the environment. Those are the building blocks you have to work with, the walls you live within. Like I said, I got off fairly easy on most of that; I had someone who knew her way around the command line, and had even used grep a few times. The next step was explaining what a program is.

A program is a sequence of instructions, kinda like a recipe. But that doesn’t really capture the issue, because recipes are written for humans. Humans are smart. Computers are very, very stupid. Fast, but stupid.

A computer is like some sort of high-speed, idiot-savant imp. It works very quickly, and it can remember a huge amount of stuff, but it’s fundamentally dumb as a bucket of mud. It can’t think for itself at all. If you want it to do something, you have to explain in minute detail precisely how to do it. You also have to think of all the things that could go wrong and how to deal with them.

Let’s imagine you’ve got this sort of imp, and you want it to go get your groceries. It has to go to the store, find all the stuff on the list, pay for it, and come home. Simple, right? You’re underestimating how stupid your computer imp is. If it knows how to walk, open doors, follow directions, and cross streets without getting run over, it’s only because someone else programmed that much for you.

Let’s assume you’ve got an imp programming language that gives you that much. So you say, “Go to the grocery store, get everything on the list, pay for it, and come home.” First, it goes off and never comes back. You go to the grocery store to see what’s up, and your imp is standing in the freezer section. You told it to get pistachio ice cream. They’re out. The imp is waiting for them to restock.

So now you have to instruct it to try another store. You have to give the imp a list of stores, in addition to the list of groceries. If it gets to the last store without finding everything, it should just come home.

This seems to work, but it always takes your imp a really long time to get groceries. You spend a while following him around (debugging) and discover that he’s going to every store every time, even if he already has everything. Ooops, another fix to make.
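Written out as imp code, the instructions so far might look like the sketch below: try each store in order, stop as soon as the list is done, and come home after the last store no matter what. The store names and the `shop` function are hypothetical, and paying is ignored for now.

```python
def shop(shopping_list: list[str], stores: list[tuple[str, set[str]]]):
    """Visit stores in order; return (cart, items still missing, stores visited)."""
    still_needed = set(shopping_list)
    cart: list[str] = []
    visited: list[str] = []
    for name, inventory in stores:
        if not still_needed:  # the fix: stop once everything is bought
            break
        visited.append(name)
        found = still_needed & inventory
        cart.extend(sorted(found))
        still_needed -= found
    return cart, still_needed, visited  # come home, even if something is missing


stores = [
    ("Corner Market", {"milk", "bread"}),
    ("Big Grocery", {"milk", "pistachio ice cream", "spaghetti sauce"}),
    ("Fancy Foods", {"pistachio ice cream"}),
]
cart, missing, visited = shop(["milk", "pistachio ice cream"], stores)
```

Here the imp finds everything by the second store and skips the third. Asked for something nobody carries, it visits all three stores and comes home with that item still on its list, instead of waiting in the freezer aisle forever.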

You write a bunch more imp code, and send it off. It still never comes home. This time, it’s stuck at the cash register. Spaghetti sauce went up 5 cents from last time, and now your imp doesn’t have enough money, and is stuck in the act of paying. You can just give the imp some extra money, but you’ll always have to deal with the possibility that he won’t have enough. Even though it seems less than ideal, you tell him to just come home if he doesn’t have enough money for everything.
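The money rule is a check before paying rather than getting stuck in the middle of it. A sketch with hypothetical prices, kept in whole cents so the arithmetic stays exact:

```python
def checkout(cart: list[str], prices_cents: dict[str, int], money_cents: int):
    """Pay for everything if we can; otherwise buy nothing. Returns (bought, money left)."""
    total = sum(prices_cents[item] for item in cart)
    if total > money_cents:
        return [], money_cents  # not enough for everything: just come home
    return list(cart), money_cents - total


prices = {"spaghetti sauce": 255, "bread": 199}  # sauce just went up 5 cents
short = checkout(["spaghetti sauce", "bread"], prices, 450)
enough = checkout(["spaghetti sauce", "bread"], prices, 500)
```

Notice there's still a hole: an item with no price tag makes `prices_cents[item]` raise a `KeyError`, which is one more thing that could go wrong and one more case the imp would need instructions for.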

Everything works fine for a while, but eventually, you realize that you haven’t had vanilla ice cream for ages. It’s on the list. You check the stores, and sure enough, they’ve got tons of it. You double-check the list, and it’s on there as “Vannilla”. Okay, maybe you could come up with a system for correcting your typos - one that won’t result in your imp buying a gun when the store is out of gum - but that’s a lot of work. It’s easier for you to just type things right. But if you ever sell (or even give) your imp code to other people, you know there will be an unending stream of complaints about how your stupid imp won’t buy vanilla ice cream.
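Python's standard library happens to ship a fuzzy matcher, `difflib.get_close_matches`, which makes it easy to see both sides of the typo-correction idea. In this sketch (the `guess_item` wrapper is hypothetical), the default similarity cutoff of 0.6 rescues the vanilla, and that very same cutoff happily turns gum into gun when the store is out of gum.

```python
import difflib
from typing import Optional


def guess_item(typed: str, inventory: list[str]) -> Optional[str]:
    """Return the closest item the store carries, or None if nothing is close enough."""
    matches = difflib.get_close_matches(typed.lower(), inventory, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(guess_item("Vannilla", ["vanilla", "chocolate"]))  # vanilla
print(guess_item("gum", ["gun", "chocolate"]))           # gun: store is out of gum
```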

Programming is all about developing this mindset - learning to think through all of this stuff in advance, figuring out all the ways your imp could screw up anything you tell it to do.