ETL Antipattern: Start With Writing Code

In this first post in my series on ETL Antipatterns, I’m going to discuss one of the most common missteps when building an extract-transform-load (ETL) process: jumping straight into writing code as a first step. ETL Antipattern: start with writing code Most data architects and developers are intensely curious folks. When we see a set of data, we want to…


The Eleven Days of Festivus 2020

We’re rounding the corner to the second half of December, which means it’s time for my favorite holiday: Festivus! Like many of you, I enjoy gathering around the Festivus pole and sharing the time-honored traditions such as the Feats Of Strength and the Airing Of Grievances. But my favorite Festivus tradition takes place right here on this blog: the Eleven…


Creating Your First Azure Data Factory

Azure Data Factory has grown in both popularity and utility in the past several years. It has evolved beyond its significant limitations in its initial version, and is quickly rising as a strong enterprise-capable ETL tool. In my last post on this topic, I shared my comparison between SQL Server Integration Services and ADF. In this post, I’ll walk through…


My New Favorite Demo Dataset: Dunder Mifflin Data

Those of us who write technical articles and deliver technical presentations are always on the lookout for the perfect data set for demonstration and testing. Microsoft has done a good job of putting together sample databases including Wide World Importers, AdventureWorks, and Northwind Traders. Personally, I’ve found each of these useful, but I had no particular attachment to any of…


Reusing a Recordset in an SSIS Object Variable

A few years back, I wrote a blog post about using an SSIS object variable as a data flow source. In that post, I described how you could load a set of query results into an object-typed variable in SQL Server Integration Services and then use that in-memory data as a source within a data flow. In the comments and…


The What, Why, When, and How of Incremental Loads

When moving data in an extraction, transformation, and loading (ETL) process, the most efficient design pattern is to touch only the data you must, copying just the data that was newly added or modified since the last load was run. This pattern of incremental loads usually presents the least amount of risk, takes less time to run, and preserves the…


Comparing SSIS and Azure Data Factory

For the better part of 15 years, SQL Server Integration Services (SSIS) has been the go-to enterprise extract-transform-load (ETL) tool for shops running on Microsoft SQL Server. More recently, Microsoft added Azure Data Factory (ADF) to its stable of enterprise ETL tools. In this post, I’ll be comparing SSIS and Azure Data Factory to share how they are alike and…


The SSIS Catalog: Install, Manage, Secure, and Monitor your Enterprise ETL Infrastructure

I’m happy to announce the publication of my latest book. The SSIS Catalog: Install, Manage, Secure, and Monitor your Enterprise ETL Infrastructure is now available on Amazon in both paperback and Kindle formats. In this book, I introduce the reader to the SSIS catalog and describe how it fits into an enterprise ETL architecture. This book is, by design, narrowly…


SSIS Parameters

SSIS parameters help to ease the process of passing runtime values to SSIS packages. For values that can change over time, using parameters in SSIS is an ideal means of externalizing those runtime values. In this post, I’ll demonstrate how to get started using SSIS parameters. What Problem Are We Solving? To understand the need for ETL parameters, let’s first…


Secure Your Data Prep Area

I’ve been building ETL processes for many years, and I’ve learned two universal truths: data preparation is messy, and you must always secure your data prep area. A data prep area is very similar to a commercial kitchen, and in the same way that customers aren’t allowed in the kitchen, so should data consumers be kept from accessing in-process data…