When I think about the rich landscape of ETL (Extract, Transform & Load) tools that organizations can choose from in 2023 to reduce the time to access and manipulate data, I feel like a fossil. Back in the 90s, we didn’t have it that easy. We had to walk to and from school every day, uphill both ways, in five feet of snow… even in the summer. Our ETLs were bash scripts and DSNs, binary files and makefiles, and lots and lots of subdirectories. We spent a ton of time staring at schemas and ERDs and trying to get tables into third normal form, because there wasn’t enough storage space to be anything other than minimalist. Want to connect to a new data source? Better set aside at least four weeks so you can figure out how to pull data from it.
If time traveler Ethan Aaron, CEO of Portable, had magically appeared in my office and said “In 25 years, there will be libraries of connectors to any data source you can imagine, and if a source you need isn’t in the library we’ll code it for you for free,” I would have immediately started worrying about job security and what this portended for a future that was clearly more shocking than I could imagine. (Don’t even mention the whole magical cloud thing.) If he showed me the Top 100 ETL tools list, I’d think I was hallucinating. I also would have started thinking about Kermit.

(Not the frog… but a really, really yucky Kermit-related ETL problem I had in 1997.)
A coworker and I had to analyze some data, but one of the inputs we needed was on a hard drive we had retrieved from some instrumentation out “in the field.” (It was literally in a locked box in a field, next to the instrument.) We found some cables that fit into some ports, and were able to peek into the elaborate directory structure that the instrument had created to store the data. The files were right there… we could see them. But we couldn’t tell what format they were in, or how to open them.
For weeks, we tried everything. Nothing worked. Those files with the super valuable data were just impenetrable.
But then, we decided to ask around the office. Someone had to have collected this same kind of data at some point, or maybe had used a similar kind of data collection instrument. We poked our heads into people’s offices, but no one had any clues. A week later, a woman came to see us. “I can’t help you with your data,” she said, “but you might want to try Dave’s office, down the hall.” Dave had retired a few years earlier, but a couple of his cabinets were still in his old office, which had since become a storage room.
At the bottom of one of Retired Dave’s old cabinets, we found a dusty box with some diagrams of a data collection instrument a lot like the one we were trying to use, and we got excited. There was also a book called “Kermit” and a few pages of Unix commands that looked like they might be related to the book. I’ll spare you the gory details (mainly because I can’t remember all of them, this many years in the future…), but long story short, we were able to use the Kermit file transfer and management protocol to crack open our data… and do the analysis we needed.
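For the curious: I can’t reconstruct our exact session all these years later, but a C-Kermit transfer over a serial line in that era looked roughly like this. The port name, speed, and file name below are placeholders, not the real ones from 1997:

    $ kermit
    C-Kermit> set line /dev/ttya       ; serial port wired to the instrument (placeholder)
    C-Kermit> set speed 9600           ; match the baud rate the instrument expects (a guess)
    C-Kermit> set file type binary     ; keep the transfer from mangling the binary data
    C-Kermit> get survey001.dat        ; fetch a file from the Kermit server on the other end
    C-Kermit> quit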
Time-to-value in 1997 to get our data: About four elapsed months.
Time-to-value in 2023 with ETL tools like Portable: Days, I bet. I can’t express what a joy it is to not have to build frameworks from scratch anymore. And I’m thankful that companies like those on the Top ETL Tools list are making this joy possible… even if early-career programmers can’t feel the gratitude quite as intensely 🙂