When working in the SSIS data flow, you’ll notice that many sources and transformations and some destinations have a built-in output to handle errors. The error output allows the SSIS developer to create a separate path through which error rows can be directed.
In this SSIS Basics post, we’ll briefly discuss the essentials and design patterns for using SSIS error outputs on data sources and transformations.
SSIS Error Output Overview
SSIS error outputs are a secondary path through which the data flow can send rows that do not conform to data type, length, or transformation standards defined by the ETL developer. That’s a lengthy way to say that it’s where you can send your junk data. In the SSIS designer, clicking on a source or transformation will often show not one but two possible outputs: the primary output (the “good” data, indicated by the blue line) and the error output (identified by the red line). As shown on the flat file source below, when selecting a source or transformation, those that have an available error output will appear with both output connectors ready for selection.
Using this design will allow you to handle errors such as:
- Incorrectly typed data
- String data longer than its specification
- Invalid transformation logic
- Relational constraint errors at the destination
SSIS error outputs provide a great way to handle two things at once: providing a secondary path through which bad data can be either sent to a triage table or cleaned up, and preventing the package from failing based on just a few bad rows of data.
When connecting an SSIS error output to a downstream component, you’ll be prompted with a configuration window in which you can select the error output options. The most important thing to remember here is that default behavior is to fail the component upon any error, even when you connect the source or transformation’s error output. The default options are shown below.
As mentioned, you must explicitly set the behavior to Redirect row for the column(s) that will be handled by the error output. You can do this on a column-by-column basis, if you want to handle error conditions on that level, but in most cases I see that all columns are set to either fail or redirect as a group. In the screenshot below, I’ve set all columns to redirect upon error or truncation.
Practical use
So what happens when we connect a component’s error output? In most cases, any rows in error would be redirected to that output, which we can use as a secondary path for cleanup or triage. Assuming we change the error and truncation behavior to Redirect row, errors that are captured by the error output would not result in a failure of that source.
There are several design patterns surrounding the use of error outputs on sources or transformations. Among the most common:
- Cleanse inline. If possible, the data can be sanitized inline using other SSIS components. If the ETL can be built to handle most of the known data deficiencies, this is usually the cleanest way to handle error rows. Very often, data quality tools such as SQL Server Data Quality Services can be useful for inline cleansing.
- Send to triage. If the package does not have cleansing logic, or there are rows that still cannot be cleansed using that logic, the bad data can be sent to triage for manual review or cleansing. Though this pattern does require manual intervention, it also allows us to audit and review the reasons for those failures by preserving the data.
- Ignore. It is possible to redirect error rows into the bit bucket, resulting in a loss of data. In rare cases, this is acceptable; however, in most cases, there should be some measure of data capture to improve the quality of the ETL process.
Which is the best pattern to use? As always, it depends. Factors such as the criticality of the data, the types of errors expected, the frequency of the ETL process, and many others must be used to decide how, if at all, to use the error paths in SSIS.
Caveats
As with any ETL function, there are a few caveats and cautions around using error outputs. First of all, I don’t recommend adding an error output path to every component that supports it simply because it’s available. Redirecting error rows is not always appropriate – sometimes it makes sense to fail the package rather than trying to programmatically clean up or triage error data.
Additionally, you’ll need to familiarize yourself with the different types of error outputs generated by each component. For example, if you use the error output of the flat file source, it will not break apart the data into individual columns (even for those elements not in error for that row) – because the error output is a relatively low-level function, what you get is a single string with all of the data in it, and if you have processes that consume this data downstream of the error output, you’ll need to do some transformation on the suspect rows before you can do any cleanup.
Finally, it is important to note that not all components support error outputs. Because of the differences in the types of operations performed by each source, destination, or transformation component, some of them do not include an error output path.
Conclusion
In this post, we’ve briefly covered the SSIS error output settings, along with a few design patterns for implementing these in a package. Using SSIS error outputs is a critical part of an overall error management strategy in any robust ETL infrastructure.
Using a data flow task with a ole db source and destination, setting truncation to fail component, I still do not get an error when running the data flow task and the data get truncated with out me knowing. On the source in the advance editor if I then set the column in the output columns to the size of the column of the destination table I get an error and the task errors. Why does the output column not automatically recognize the size of the destination table. Any thoughts greatly appreciated.
Lesley, if you’re looking at the OLE DB source, it makes sense that it’s not throwing an error if the length of the value exceeds the length of the column in the final destination. The OLE DB source is aware of the metadata for the source from which the data was retrieved, but doesn’t validate directly against the destination – that’s the job of the OLE DB destination (or whichever destination you’re using).