Batch Runner

This is one of several stories about cool stuff I’ve done. See the Portfolio Intro post for more info.

This is about the alchemy of software. It’s about getting computers to do boring stuff we hate, and how that automation lets us do things differently and better, not just faster. It’s also about a project I’m proud of, where I think I earned merit badges for Playing Well With Others and Thinking Outside the Box.

I was working with a team of engineers who managed a data processing engine. Each of our customers had somewhat different data running through the engine, and the engineers had to come up with an optimal engine configuration for each customer. This involved setting up an initial configuration, running a bunch of data through the engine, looking at the results, tweaking the configuration, re-running the data, and so on until they got satisfactory results. Running the data took anywhere from two to six hours, so patience was a serious limiting factor. They came to me with a request to create some sort of web interface to help them manage this whole process, automating what I could.

The first step in automating this process was to figure out how they did it by hand. I quickly learned that each of the engineers had their own way of going about it. I started out with one of them, documenting his process and sketching out code to tie it together. Later, I needed clarification on one of the steps. He wasn’t around, so I went to one of the other guys, who said, “Oh, that’s not how I do it at all.” So I dragged them all into a conference room for an afternoon to hammer out what the process should be. We had to work out some trade-offs where the ideal manual process would have been difficult to automate, but at the end of it, we had a “best practice” they were pretty happy with. The experience was engaging for them because it gave them a chance to step back from the daily grind and think about their jobs in a more “meta” way. It was a fun and spirited debate, and everyone learned from it.

My first iteration was fairly straightforward. It provided an easy way to set up a new customer configuration and data set, and start a run; it sent email when the run was done; and it had some nice reporting of the results and a GUI for making configuration changes for the second run. That alone was a win. But now that we had it automated, I realized there were some interesting things we could do. We had a dedicated server for this; why not keep it maxed out around the clock? Instead of just doing one run, I set it up so it would try several in parallel, experimenting with different starting configurations. Then I went to the engineers and asked how they decided what configuration tweaks to make for the second run. They’d say, “Well, if this number looks bad, you’d weed out that factor,” and I’d ask, “What’s bad?”

“Probably under 0.5.”

“How ‘bout I weed it out automatically if it’s under 0.3?”

“Yeah, that’d work.”

So then I was able to do automated second runs for each experiment. They were often as good as if they’d been configured by hand, and at worst still gave useful information. And they ran while we slept.
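In code, the rule was about that simple. Here's a minimal sketch of the weed-out logic; the factor names, data structures, and cutoff handling are made up for illustration, not the actual engine's configuration format:

```python
WEED_CUTOFF = 0.3  # deliberately more conservative than the ~0.5 an engineer might call "bad" by eye

def tweak_for_second_run(config, factor_scores):
    """Weed out any factor that scored below the cutoff in the first run."""
    keep = [f for f in config["factors"] if factor_scores.get(f, 1.0) >= WEED_CUTOFF]
    weeded = [f for f in config["factors"] if f not in keep]
    return dict(config, factors=keep), weeded

# Made-up example: two factors looked bad in the first run, so they get dropped.
config = {"factors": ["region", "season", "channel", "device"], "threshold": 0.4}
scores = {"region": 0.82, "season": 0.27, "channel": 0.61, "device": 0.11}
second_run_config, weeded = tweak_for_second_run(config, scores)
print("Second run keeps:", second_run_config["factors"])  # ['region', 'channel']
print("Weeded out:", weeded)                              # ['season', 'device']
```

The point wasn't sophistication; it was that even a dumb, conservative rule applied automatically beat a smart rule applied six hours later by a distracted human.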

A lot of the value to the engineers was that their time was less fragmented. Instead of interrupting their regular work every couple hours to check on the process, they could “fire and forget” it. When all the processing was done, they’d have a solid block of interesting analysis work. If they wanted, they could kick off a whole set of new experiments at once, and wait for the email.
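For flavor, here's a rough sketch of that fire-and-forget loop, again with hypothetical names (`run_engine`, `make_starting_configs`) standing in for the real thing. The actual engine runs took hours; the fake one here scores a configuration instantly so the sketch is runnable:

```python
import itertools
from concurrent.futures import ProcessPoolExecutor

def run_engine(config):
    """Hypothetical stand-in for one multi-hour engine run: returns a score
    for the configuration it was given (instantly, so the sketch runs)."""
    score = 1.0 - abs(config["threshold"] - 0.42)  # pretend 0.42 is the sweet spot
    return {"config": config, "score": round(score, 3)}

def make_starting_configs():
    """Build several different starting points instead of hand-picking one."""
    thresholds = [0.2, 0.3, 0.4, 0.5]
    weightings = ["conservative", "aggressive"]
    return [{"threshold": t, "weighting": w}
            for t, w in itertools.product(thresholds, weightings)]

def main():
    configs = make_starting_configs()
    # Keep the dedicated server maxed out: run every experiment concurrently.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_engine, configs))
    best = max(results, key=lambda r: r["score"])
    # The real system sent email at this point; printing stands in for that.
    print(f"All {len(results)} experiments done. "
          f"Best score {best['score']} from {best['config']}")

if __name__ == "__main__":
    main()
```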

We also got much better results this way. Testing new configurations by hand had been so tedious that the engineers would stop once they got something good enough. Four or five runs was a lot. We were now doing eight for starters, with more variety in the configurations. “Good enough” got a lot better.

Over time, the logic for tweaking the second-run configuration got better and better. The results reporting became clearer and more informative. We eventually got to the point that we could turn most of this process over to one of our tech support staff. The engineers only got called in for interesting problems: the results weren’t good enough, or there was something strange in the analysis.

This is the alchemy of software. We used to have these highly skilled engineers doing this really tedious job. Then they got the interesting job of figuring out best practices, and I got the fun job of developing the software. Now they only have to handle the tricky cases that challenge them, a junior tech gets to do a bit of cool data analysis, and the company gets higher-quality results. Win all around.