ETL Antipattern: Ignoring the “Why?”

In this final post in my ETL Antipatterns series, I’ll talk about a common big-picture mistake when building ETL processes: ignoring the fundamental question, “why are we doing this?” When I first began my consulting career more than ten years ago, most of my work was transactional. I’d be assigned to a project and would…


ETL Antipattern: Ignore the Logging

In my last ETL Antipatterns post, I wrote about the unexciting but very necessary work of documenting ETL processes. The logging of ETL operations is just as (un)captivating as documentation, but is equally important in supporting data movement and transformation processes. In this post, I’ll discuss a common misstep in ETL process management: ignoring the logs. What…
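
To make the idea concrete, here is a minimal sketch of per-step run logging in Python. The step name and log file name are hypothetical, and a real process would more likely write to an audit table than a flat file:

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(
    filename="etl_run.log",  # hypothetical destination; an audit table is more typical
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def run_step(name, step_fn):
    """Run one ETL step, logging start, outcome, row count, and duration."""
    started = datetime.now(timezone.utc)
    logging.info("step=%s status=started", name)
    try:
        rows = step_fn()  # each step returns the number of rows it handled
        elapsed = datetime.now(timezone.utc) - started
        logging.info("step=%s status=succeeded rows=%s duration=%s", name, rows, elapsed)
        return rows
    except Exception:
        logging.exception("step=%s status=failed", name)
        raise

# Stand-in step for illustration only.
run_step("extract_orders", lambda: 42)
```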


ETL Antipattern: Skipping The Documentation

Documentation is an asset that is both loathed and loved. Creating technical and business documentation is often looked upon as a tedious chore, something that really ought to be done for every project but is an easy candidate to push until later (or skip entirely). On the other hand, good documentation – particularly around data movement and ETL processes…


ETL Antipattern: Load Processes that Don’t Scale

One of the most significant design considerations in ETL process development is the volume of data to be processed. Most ETL processes have time constraints that require them to complete their load operations within a given window, and the time required to process data will often dictate the design of the load. One of the more common mistakes I’ve seen…
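
The full post goes into design options; as a rough illustration of one scaling technique, here is a sketch of loading in fixed-size batches rather than materializing an entire source table at once. The table and column names are hypothetical, and SQLite connections stand in for whatever source and target a real load would use:

```python
import sqlite3

BATCH_SIZE = 10_000  # tune to the load window and the target's capacity

def load_in_batches(source_conn, target_conn):
    """Stream rows in fixed-size batches so memory use and transaction
    size stay flat as data volume grows."""
    cursor = source_conn.execute("SELECT id, amount FROM sales")
    while True:
        batch = cursor.fetchmany(BATCH_SIZE)
        if not batch:
            break
        target_conn.executemany(
            "INSERT INTO sales_stage (id, amount) VALUES (?, ?)", batch
        )
        target_conn.commit()  # committing per batch keeps transactions small
```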


ETL Antipattern: Failure to Test and Validate

“If it compiles, it works.” – An unemployed developer

Building ETL processes is quite easy. Building ETL processes that deliver accurate results as quickly as possible is substantially more difficult. Modern ETL tools (including my personal favorite, SQL Server Integration Services) make it deceptively easy to create simple load processes. That’s a good thing, because an easy-to-understand front end shortens…
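
One simple validation technique is reconciling source and target after a load completes. A minimal sketch, with hypothetical table names and SQLite-style connections standing in for the real systems:

```python
def validate_load(source_conn, target_conn):
    """Check that what arrived matches what was sent; 'it ran without
    error' is not the same as 'it loaded the right data'."""
    source_totals = source_conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM sales"
    ).fetchone()
    target_totals = target_conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM sales_stage"
    ).fetchone()
    if source_totals != target_totals:
        raise ValueError(
            f"Validation failed: source={source_totals} target={target_totals}"
        )
```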


ETL Antipattern: Failing to Treat ETL Logic as Source Code

In most data projects, building the extract-transform-load (ETL) logic takes a significant amount of time. Enterprise ETL processes must do several things well: retrieve enough data to satisfy the business needs, apply any needed transformations to that data, and load it to the destination(s) without interruption to any other business processes. The work that goes into building and validating that…


ETL Antipattern: No Error Handling Logic

I usually avoid talking about technology in absolutes, but here’s one that I can share without reservation: On a long enough timeline, every single ETL process will eventually fail. Is your ETL design built to handle a failure? I see far too many SSIS packages, ADF data factories, and other data movement applications built with the assumption of 100% success,…
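
What failure-aware design looks like varies by tool; as a language-agnostic sketch (the retry counts and the step function are illustrative assumptions, not a prescription), here is the general shape in Python:

```python
import logging
import time

def run_with_retry(step_fn, attempts=3, backoff_seconds=30):
    """Assume the step *will* eventually fail: retry transient errors,
    then surface the failure loudly instead of assuming 100% success."""
    for attempt in range(1, attempts + 1):
        try:
            return step_fn()
        except Exception:
            logging.exception("attempt %d of %d failed", attempt, attempts)
            if attempt == attempts:
                raise  # the final failure should alert, not disappear
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
```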


ETL Antipattern: Lazy Metadata

If data is a train, then metadata is the track on which it travels. A good metadata definition in ETL processes will help ensure that the flow of data is predictable, robust, and properly constrained to avoid errors. However, many ETL processes take a hands-off approach when it comes to metadata. In some cases, this laissez-faire design…
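
In SSIS terms this is the explicit column metadata in a data flow; in code-based pipelines, the equivalent is declaring a schema instead of letting the tool infer one. A minimal sketch using pandas, with hypothetical file and column names:

```python
import pandas as pd

# Declare the expected columns and types up front rather than letting
# the reader infer them from whatever happens to be in today's file.
SALES_SCHEMA = {
    "order_id": "int64",
    "customer_code": "string",
    "amount": "float64",
}

df = pd.read_csv(
    "sales.csv",                 # hypothetical input file
    usecols=list(SALES_SCHEMA),  # reject-by-omission: only known columns
    dtype=SALES_SCHEMA,          # fail fast on type mismatches
)
```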


ETL Antipattern: Performing Full Loads Instead of Incremental Loads

In my last post in the ETL Antipatterns series, I wrote about the common antipattern of ingesting or loading more data than necessary. This brief post covers one specific case of loading more data than necessary: performing a full data load rather than a smaller incremental load. Earlier this…
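
The most common incremental pattern is a high-water mark: load only the rows changed since the last successful run. A minimal sketch, with hypothetical table and column names (etl_watermark, sales, modified_date):

```python
def incremental_load(source_conn, target_conn):
    """Load only rows modified since the last successful run, tracked
    by a high-water mark, instead of reloading the full table."""
    (last_mark,) = target_conn.execute(
        "SELECT last_modified FROM etl_watermark WHERE table_name = 'sales'"
    ).fetchone()
    rows = source_conn.execute(
        "SELECT id, amount, modified_date FROM sales WHERE modified_date > ?",
        (last_mark,),
    ).fetchall()
    target_conn.executemany(
        "INSERT INTO sales_stage (id, amount, modified_date) VALUES (?, ?, ?)",
        rows,
    )
    if rows:
        # Advance the watermark only after the rows have been staged.
        new_mark = max(row[2] for row in rows)
        target_conn.execute(
            "UPDATE etl_watermark SET last_modified = ? WHERE table_name = 'sales'",
            (new_mark,),
        )
    target_conn.commit()
```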


ETL Antipattern: Processing Too Much Data

In my continuing series on ETL Antipatterns, I’ll discuss the problem of loading too much data in extract-transform-load processes. A common design flaw in enterprise ETL processes is that they process too much data. Having access to a great breadth and depth of data opens up lots of options for historical reporting and…
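
The usual fix is to push filtering and column pruning down to the source rather than hauling everything across and discarding most of it. A quick before-and-after sketch, where the database, table, and column names are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")  # hypothetical source
conn.row_factory = sqlite3.Row

# Antipattern: pull every row and column, then filter in the pipeline.
all_rows = conn.execute("SELECT * FROM sales").fetchall()
recent = [r for r in all_rows if r["sale_date"] >= "2024-01-01"]

# Better: let the source engine do the filtering and column pruning.
recent = conn.execute(
    "SELECT id, amount, sale_date FROM sales WHERE sale_date >= ?",
    ("2024-01-01",),
).fetchall()
```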