Temp tables are very handy when you need to store and manipulate an interim result set during ETL or other data processing operations. However, if you use SQL Server Integration Services as your ETL tool, you may run into some challenges when working with temp tables in SSIS packages, especially in the SSIS data flow.
In this post, I will discuss why using temp tables in an SSIS load is usually a bad idea. I’ll also share a few alternative design patterns to help avoid using temp tables in SSIS packages.
Let’s talk temp tables
Temporary tables (or simply, temp tables) are structures available in nearly every relational database system that allow for temporary storage of a result set as a physically instantiated table. In SQL Server, temp tables are created in the tempdb system database. They are typically scoped to a single user session (local temp tables, prefixed with #), though they may also be created with global scope (prefixed with ##) to allow interaction from more than one connection.
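A quick sketch of the two scopes (the table and column names here are hypothetical):

-- Local temp table: visible only to the session that created it
CREATE TABLE #order_staging (OrderID INT, OrderTotal DECIMAL(18, 2));

-- Global temp table: visible to all sessions until the creating session
-- disconnects and no other session is still referencing the table
CREATE TABLE ##order_staging (OrderID INT, OrderTotal DECIMAL(18, 2));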
Temp tables are great for interim data processing. If you need to retrieve a subset of data and manipulate or reshape it before sending it elsewhere, temp tables provide an easy way to do that. Since temp tables are actual physical structures, you can do almost anything to them that you can do to a permanent table, including adding indexes and constraints for better performance and data consistency.
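For example, a typical interim-processing pattern might look like the following (the source table and columns are hypothetical):

-- Stage a subset of the data into a temp table
SELECT OrderID, CustomerID, OrderTotal
INTO #recent_orders
FROM dbo.Orders  -- hypothetical source table
WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());

-- Index the interim result set for faster downstream operations
CREATE CLUSTERED INDEX IX_recent_orders ON #recent_orders (OrderID);

-- Reshape the data before sending it elsewhere
UPDATE #recent_orders SET OrderTotal = 0 WHERE OrderTotal IS NULL;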
These capabilities make temp tables an attractive choice for ETL operations. Because a good bit of work in ETL requires interim transformation of data, using temp tables is often a go-to pattern for performing ETL.
Using temp tables in SSIS
While temp tables are a good option for in-flight data transformation, there are some unique challenges that arise when using temp tables in SSIS.
SQL Server Integration Services uses tight metadata binding for data flow operations. This means that when you connect to a relational database, flat file, or other structure in an SSIS data flow, the SSIS design-time and runtime tools will check those data connections to validate that they exist and that the metadata has not changed. This tight binding is by design, to avoid potential runtime issues arising from unexpected changes to the source or destination metadata.
Because of this metadata validation process, temp tables present a challenge to the SSIS data flow. Since temp tables exist only for the duration of the session(s) using them, a temp table created in a previous step of an SSIS package may not be present when validation occurs. During the design of the package (or even worse, when the deployed package runs in a scheduled process), you could find yourself staring at an “object not found” error message.
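The underlying behavior is easy to reproduce in plain T-SQL: a local temp table created on one connection is simply invisible to another (the table name below is hypothetical):

-- Connection 1 (for example, an Execute SQL Task)
CREATE TABLE #interim (id INT, amount DECIMAL(18, 2));

-- Connection 2 (for example, a data flow source on a separate connection)
SELECT id, amount FROM #interim;
-- Fails with: Invalid object name '#interim'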
With respect to using temp tables in SSIS, the advice I give is usually this: Avoid temp tables in SSIS, especially in SSIS data flows. Although there are some cases where they work well, the risk of metadata validation failure usually outweighs the benefits of using a temp table.
So what do we do?
Fortunately, there are alternatives. Because interim data processing is a powerful design pattern in any ETL tool, there are several different ways you can store and process temporary result sets in SSIS packages without resorting to using temp tables. Below are a few alternatives that reduce the risk of unexpected metadata errors.
Use a CTE (common table expression). CTEs were introduced in SQL Server 2005 as a means to define a query that can be used as a temporary result set for another query. The CTE is not physically written to disk, but is processed in memory during the execution of the query. The CTE only exists for the life of a single query, so it is best used where you will not need to reuse the same temporary result set more than once. When I need to perform quick operations such as creating an employee hierarchy, attaching row numbers to a result set, or creating other calculated columns, the CTE is my go-to tool. Keep in mind that you can’t index a CTE or persist it from one query to the next, so this option is best used for reasonably small sets of data that will only be used once.
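For example, attaching row numbers with a CTE might look like this (the table and column names here are hypothetical):

WITH numbered_orders AS
(
    SELECT OrderID,
           CustomerID,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS rn
    FROM dbo.Orders  -- hypothetical source table
)
SELECT OrderID, CustomerID
FROM numbered_orders
WHERE rn = 1;  -- keep only the most recent order per customer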
Use permanent staging tables. If you really do need to write an interim result set to disk, using a permanent staging table is a great option. This works similarly to using temp tables, except that the tables are not dropped automatically like temp tables; these permanent tables are created once in a staging area and reused again and again by truncating and reloading with each operation. Staging tables allow you to create indexes, build statistics, and persist the result sets beyond the scope of a single query if required by your load process. Be careful, though, if you may run multiple simultaneous processes that use the same staging tables, since these permanent tables are accessible (and therefore could be modified or truncated) by any account with access to them.
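A minimal sketch of the truncate-and-reload pattern, using a hypothetical staging table:

-- Created once, up front
CREATE TABLE dbo.stg_Orders
(
    OrderID    INT            NOT NULL,
    CustomerID INT            NOT NULL,
    OrderTotal DECIMAL(18, 2) NULL
);
CREATE CLUSTERED INDEX IX_stg_Orders ON dbo.stg_Orders (OrderID);

-- Run at the start of each load
TRUNCATE TABLE dbo.stg_Orders;

-- ... the SSIS data flow then loads dbo.stg_Orders,
-- and downstream steps read from it ...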
Use SSIS in-memory processing. Remember that SSIS isn’t just an endpoint to connect to sources and destinations. In the control flow, and even more so in the data flow, you will find a rich collection of tools for data cleansing and deduplication. In the control flow, the Execute SQL Task allows you to return a result set to an SSIS variable, which can then be used as an iterator for a Foreach Loop container or as a source for a Script Component. In the data flow, you can use the built-in transformations to sort, deduplicate, pivot, or create derived values. And if you need ultimate flexibility, you can use the Script Component as a transformation to read an ADO recordset object variable and do just about anything imaginable to the data, entirely in SSIS memory.
Use delayed validation. If you absolutely must use temp tables in your SSIS package, set the DelayValidation property to true on the data flow tasks that use temp tables. Doing so defers the validation process until later in the execution cycle, which may provide enough lag to allow those tempdb objects to be created before the required metadata validation takes place. This still isn’t a silver bullet for solving temp table-related issues, but it will help to reduce the possibility of a validation failure.
Conclusion
Temp tables and SSIS often don’t play well together. Because SSIS requires up-front metadata validation, temporary objects created during the SSIS package execution may not be recognized, thus throwing validation errors. My advice is to use one of the other methods described above – CTEs, permanent staging tables, or SSIS in-memory processing. If you still decide that temp tables are the way to go, at least turn on Delay Validation to reduce the risk of failure.
This problem has happened to me in the past, with the added restriction that the data was coming from a stored procedure that couldn’t be altered to remove the temp tables. The workaround I used was to insert the output of the stored procedure into a table variable right in the OLE DB Source, using this kind of syntax:
SET NOCOUNT ON;  -- suppress rowcount messages so the OLE DB Source sees only the result set

DECLARE @tempdatatable TABLE (datacolumn VARCHAR(255), rn INT);

-- Capture the stored procedure's output in the table variable
INSERT INTO @tempdatatable (datacolumn, rn)
EXEC [dbo].[SomeStoredProc];

SELECT datacolumn, rn FROM @tempdatatable ORDER BY rn;
I get around the validation issue by setting “DelayValidation” to false and creating permanent tables, only to drop them soon thereafter. My loop container then looks like this:
1. SQL Task – create permanent table
2. Data Flow – load data into permanent table
3. SQL Task – transform data in permanent table
4. Data Flow – move data out of permanent table
5. SQL Task – drop permanent table
6. End of loop
When creating the data flow task, the permanent table needs to be present, so I just disable the drop table step during development. Wham-o, clean database once all is done.
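For reference, the create and drop steps (1 and 5) of this pattern might look something like the following; the table name is hypothetical:

-- Step 1: SQL Task – create the permanent work table
IF OBJECT_ID('dbo.work_LoopStage', 'U') IS NOT NULL
    DROP TABLE dbo.work_LoopStage;
CREATE TABLE dbo.work_LoopStage (datacolumn VARCHAR(255), rn INT);

-- Step 5: SQL Task – drop it once the loop iteration completes
DROP TABLE dbo.work_LoopStage;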
“The CTE is not physically written to disk, but is processed in memory during the execution of the query” – a CTE is like a subquery: the server executes the combined query (outer query + CTE) and builds one plan (it is not always processed in memory; the server may spill data to tempdb). CTEs also have many restrictions. If you need temporary storage, a CTE is not an alternative; a CTE only simplifies a query for readability.
Yes, under the hood a CTE is very different from a temp table. However, if you are doing fairly simple tasks such as creating row numbers or building small hierarchies, a CTE is a great option and doesn’t suffer from the same runtime risk as using a temp table as a source.
In SQL Server 2012 and above you can use EXECUTE ... WITH RESULT SETS to define the metadata. https://docs.microsoft.com/en-us/sql/t-sql/language-elements/execute-transact-sql?view=sql-server-2017
Hi Chris, that’s a useful addition to the T-SQL syntax, and I’ve used it a few times before. The challenge is that you have to explicitly declare the data structure in your query, which adds some extra overhead to development and maintenance. I’ve found that it’s usually less difficult to use permanent staging tables.
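For anyone curious, the syntax looks something like this (reusing the hypothetical stored procedure from the earlier comment):

EXEC [dbo].[SomeStoredProc]
WITH RESULT SETS
(
    (
        datacolumn VARCHAR(255),
        rn         INT
    )
);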