This is the first in a series of technical posts on using parent-child SSIS architecture. In this post, I will provide an overview of the architecture and describe the benefits of implementing a parent-child design pattern in SQL Server Integration Services.
Parent-Child SSIS Architecture
The simplest definition of SSIS parent-child architecture is that it consists of packages executing other packages. In SSIS, the package is the base executable; it is the most granular component that can be executed independently1. Every version of SSIS includes the ability for one package to execute another package through the use of one of the following:
- The Execute Package Task
- T-SQL commands
- The Execute Process Task (using the dtexec.exe utility)
Though the reference to parent-child architecture implies that there are exactly two layers to this design, that does not have to be the case. You may have a design in which a package executes a package, which executes a package, and so forth. Although there may be a hard limit to how deeply nested a parent-child architecture can go, I have never encountered such a limitation. I have found it useful on a few occasions to go deeper than two levels in this type of architecture, particularly when designing a formal ETL framework (to be discussed further in a future post in this series). In cases where more than two levels exist, finding the right terminology for those layers is important. You can refer to them by patriarchy (grandparent/parent/child) or by cardinality (level 1, level 2, level n), as long as you remain consistent – especially in your documentation – with those references.
Conceptually, a parent-child architecture is a form of code abstraction. By encapsulating ETL actions into discrete units of work (packages), we’re creating a network of moving parts that can be developed, tested, and executed independently or as part of a larger collection.
Benefits
As I mentioned in my introductory post, there are several benefits to using parent-child SSIS architecture.
Reusability. In any ETL environment of significant size or complexity, it’s quite normal to discover common ETL behaviors that are reusable across different implementations. For a concrete example of this: In my spare time, I’m working on an ETL application that downloads XML files from a Major League Baseball web service. There are files of various formats, and each file format is processed a different way, but with respect to the download of the files, I always perform the same set of operations: create a log entry for the file; attempt to download the file to the local server; log the result (success or failure) of the download operation; if the download has failed, set the HasErrors variable on the main package. If I were to load this behavior into a group of tasks in the package for each XML format, I’d have five different copies of the same logic. However, by building a parameterized child package that performs all of these core functions, I only have to build the file download/logging logic once, and execute the resulting package with the appropriate parameters each time I need to download a file.
Easier development. Working with large and complex SSIS packages can be a pain. The larger the SSIS packages, the longer it takes for the BIDS or SSDT environment to do its validation checks when the package is opened or modified. Further, when multiple ETL developers are working on the same project, it is much easier to break apart the project into discrete units of work when using numerous smaller SSIS packages.
Easier testing and debugging. When working through the test and debug cycles during and after initial development, it’s almost always easier to test and debug smaller packages. To test a single task that resides in a large SSIS package would require either running the task by itself manually in the Visual Studio designer, or disabling all of the other tasks and redeploying the package. When working with packages that each perform one unit of work, one can often simply execute the package to be tested through the normal scheduling/execution mechanism.
Clarity of purpose. An architecture that uses small, single-operation packages lends itself to clarity of purpose by virtue of naming. When browsing a list of deployed packages, it is much more clear to see package names such as “Load Customers Table”, “Merge Product Table”, and “Remove Duplicates in Vehicle Table” than to find do-everything packages with names like “Load Production DB”, “Update DW”, etc.
Performance. In some cases, breaking apart a multi-step SSIS package can bring performance gains. One distinct case that comes to mind is a distributed architecture, where packages within a single execution group are executed on multiple servers. By distributing packages across different SQL Server machines (either physical or virtual), it may be possible to improve performance in cases where the processing load on a single SSIS server has become a bottleneck. I want to emphasize that using a parent-child architecture does not automatically improve performance, so it should not be treated as a silver bullet for fixing a poorly performing group of packages.
The Tools
As I mentioned earlier, there are three tools that can be used to execute a package from within another package.
The execute package task. This is the easiest and most common means of executing a package from within another. This task can trigger the execution of a package stored on the file system, deployed to MSDB or the SSIS catalog, or residing in the same project. If using SSIS 2012 with catalog deployment mode, you can also use the execute package task to pass parameter values from the parent package to the child package. It is important to note that the execute package task behaves differently in SSIS 2012 than it does in older versions.
T-SQL commands (via the execute SQL task). For SSIS projects using the project deployment model in SSIS 2012, the built-in stored procedures in the SSIS catalog can be used to execute packages. Like the execute package task, this method allows you to specify runtime parameters via T-SQL code. One significant advantage of using T-SQL commands to execute packages is that, unlike the execute package task, you can use expressions to set the name of the package to be executed at runtime. This is useful when iterating over a list of packages that is not known until runtime, a pattern commonly found in ETL frameworks.
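To make this concrete, here is a minimal sketch of the T-SQL pattern using the SSIS catalog stored procedures. The folder, project, and package names are hypothetical, and the package name could just as easily come from a variable or a metadata table:

```sql
DECLARE @execution_id BIGINT;
-- The package name can be set dynamically, e.g. from a metadata table
DECLARE @package_name NVARCHAR(260) = N'RefreshDB.dtsx';

-- Create an execution instance for a catalog-deployed package
EXEC SSISDB.catalog.create_execution
      @folder_name     = N'ETL'              -- hypothetical folder
    , @project_name    = N'DatabaseRefresh'  -- hypothetical project
    , @package_name    = @package_name
    , @use32bitruntime = 0
    , @execution_id    = @execution_id OUTPUT;

-- SYNCHRONIZED = 1 (object_type 50 = system parameter) makes the call
-- wait for the child package to complete instead of returning immediately
EXEC SSISDB.catalog.set_execution_parameter_value
      @execution_id
    , @object_type     = 50
    , @parameter_name  = N'SYNCHRONIZED'
    , @parameter_value = 1;

EXEC SSISDB.catalog.start_execution @execution_id;
```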
dtexec.exe (via the execute process task). This method triggers package execution through the command-line application dtexec.exe. Although dtexec is typically used to execute packages in a standalone environment – for example, when using third-party scheduling tools to orchestrate package execution – it can also be used within SSIS by way of the execute process task. As an aside, I rarely use dtexec to execute child packages – in most cases, it’s easier to use either the execute package task or T-SQL commands to execute one package from within another.
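For illustration, the arguments configured in the execute process task might look something like the following for a catalog-deployed package. This is only a sketch – the server, folder, project, and parameter names are hypothetical:

```
dtexec.exe /ISServer "\SSISDB\ETL\DatabaseRefresh\RefreshDB.dtsx" ^
           /Server "MySsisServer" ^
           /Par "$Package::EnvironmentName";Training
```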
I’ll also briefly mention dtexecui.exe. This is a graphical tool that serves the same purpose as dtexec.exe, except that it exposes that functionality through a graphical user interface rather than command-line parameters. Beyond this brief mention, I’ll not cover dtexecui.exe in this discussion of parent-child architecture; it is intended for interactive (manual) execution of packages and is not a suitable tool for executing one package from within another.
Parent-Child SSIS Architecture in the real world
To illustrate how this can work, let’s model out a realistic example. Imagine that we are tasked with developing a sizable healthcare database. In addition to our production data, we’ve got multiple environments – test, development, and training – to support the development life cycle and education needs. As is typical for these types of environments, these databases need to be refreshed from the production database from time to time.
The refresh processes for each of these environments will look similar to the others. In each of them, we will extract any necessary data for that environment, retrieve and restore the backup from production, and import the previously extracted data back into that environment. Since we are dealing with sensitive healthcare data, the information in the training database needs to be sufficiently anonymized to avoid an inappropriate disclosure of data. In addition, our test database needs to be loaded with some test cases to facilitate testing for potential vulnerabilities. Even though there are some differences in the way each environment is refreshed, there are several points of shared – and duplicate – behavior, as shown below (with the duplicates in blue).
Instead of duplicating those static elements, we can eliminate some code redundancy and maintenance overhead by encapsulating the shared behaviors into their own container – specifically, a parameterized package. In doing so, we avoid having multiple points of administration when (not if) we need to adjust those common elements of the refresh process. The updated architecture uses parameters (or package configurations, if using package deployment mode in SSIS 2012 or any older version of SSIS) to pass in the name of the database environment to refresh.
As shown, we’ve moved those shared behaviors into a separate package (RefreshDB), the behavior of which is driven by the parameters passed into it. The duplicate code is gone. We now have just one SSIS package, instead of three, that needs to be altered when those common behaviors change. Further, we can individually test and debug the child package containing those common behaviors, without the additional environment-specific operations.
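Using the T-SQL pattern sketched earlier, each parent package would differ only in the environment name it passes to the shared child package. The parameter name here is hypothetical:

```sql
-- object_type 30 = package parameter on the child (RefreshDB) package
EXEC SSISDB.catalog.set_execution_parameter_value
      @execution_id
    , @object_type     = 30
    , @parameter_name  = N'EnvironmentName'
    , @parameter_value = N'Training';   -- or N'Test', N'Development'
```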
Note that we haven’t reduced the number of packages using this architecture. The goal isn’t fewer packages; we’re aiming for a modularized, easy-to-maintain design, which typically results in a larger number of packages that each perform just a few functions (and sometimes only one). In fact, in the parent-child architecture shown above, we could refine this pattern even further by breaking out the individual operations in the RefreshDB package into packages of their own, which would be practical in cases where those tasks might be executed apart from the others.
Exceptions to the rule
Are there cases in which parent-child structures do not add value? Certainly. A prime example is a small, simple package developed for a single execution, with no expectation that its logic will be reused. I call these throwaway packages. Because of their single-use nature, there is likely little value in going through the effort of building a parent-child SSIS architecture around their business logic.
Up Next
In my next post in this series, I’ll work through the mechanics of using a parent-child pattern for packages stored in package deployment mode (the only deployment mode available in SSIS 2005 and 2008).
1 Technically, there are lower-level items in the SSIS infrastructure that can be executed independently. For example, from the BIDS or SSDT design surface, one can manually execute a single task or container within a package. However, when deploying or scheduling the execution of some ETL behavior, the package is the lowest level of granularity that can be addressed.
Hi Tim,
I like the refresh environment example. Nice.
Question: When have you found that a parent-child design is best to use, and when is it maybe not an avenue to explore? You can gain a sort of abstraction in the method by which the package is called as well. For example, let’s say I have an SFTP package that may be called either before the main package in step 1 of a SQL Agent job (for import) or afterwards (for export). If I were doing an import and the files were already downloaded to the server, I would simply call step 2 of the job, and keep the separation without the parent-child relationship.
Thanks,
Lee
Lee, I’ve found that when creating logic that may be reused, it’s a no-brainer to wrap it in a child package that can be accessed by multiple parent packages. Same thing if there are multiple developers working on an initiative – farming out tasks into individual packages is much easier than everyone working on a single package.
If neither of these apply, it’s a judgement call. There’s some overhead (sometimes a little, sometimes a lot) in breaking work into multiple packages. If breaking those tasks into multiple packages can ease the pain of development, troubleshooting, or deployment, I usually break it up. As a rule, when in doubt I break up those tasks into multiple packages.
Hi Tim
I’ve read your first post with interest as I have implemented a framework using a Parent-Child pattern I found in a book. I’m keen to see your suggestions for a framework – the one I have works quite well with standardised error reporting and logging but I haven’t found an easy way to run packages in parallel, yet.
Regards,
Nick
Hi Nick,
I’ll be touching on frameworks later in this series. In the meantime, my friend Andy Leonard has written quite a bit about SSIS frameworks, including publishing a free downloadable SSIS framework. You can find his writing here: http://sqlblog.com/blogs/andy_leonard/archive/tags/SSIS+Frameworks/default.aspx
You’ve given us a very informative post about Parent-Child SSIS Architecture.
Thanks Douglas! I look forward to continuing this series.
Hi Tim,
Just wanted to share some thoughts about an interesting use case of SSIS Architecture using Parent-Child I designed a couple years ago for a client and still use to this day.
Applying the parent-child architecture to manage all of the “package” workflow logic is very useful, but sometimes it can get a bit messy. For instance, when we create a couple of new packages, we usually end up having to alter all the parent packages that execute them and deploy both the new and updated packages to the server. In bigger solutions and bigger teams this can get very error prone. The approach we took at the time was to separate all the packages that actually “do stuff” from those responsible for controlling their workflow logic.
We ended up developing a metadata-driven solution with a core package table and a single package responsible for controlling the entire package execution flow. The core package table is where we specify all package names, parents, server paths, and all the other detail associated with a single package, plus execution flags like “active” (which may be set to zero to prevent execution). This table only contains packages that actually extract, clean, and conform data, and it was designed to represent a parent-child relationship where packages may execute child packages. These “parent packages” exist only as entries in this table. To make this work, we developed a very simple package that reads all the information in this table and is responsible for executing all of the “active” packages it finds. Additionally, we ended up adding some checkpoints to control execution after failure, along with other similar features.
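A simplified sketch of the idea (table and column names changed for illustration):

```sql
-- Core package table: one row per executable package; "parent" packages
-- exist only as rows that group their children
CREATE TABLE dbo.CorePackage (
    PackageId      INT IDENTITY(1,1) PRIMARY KEY,
    PackageName    NVARCHAR(260) NOT NULL,
    FolderName     NVARCHAR(128) NOT NULL,
    ProjectName    NVARCHAR(128) NOT NULL,
    ParentId       INT NULL REFERENCES dbo.CorePackage (PackageId),
    ExecutionOrder INT NOT NULL DEFAULT 0,
    IsActive       BIT NOT NULL DEFAULT 1    -- set to 0 to skip execution
);

-- The controller package iterates over this result set and executes each
-- package via catalog.create_execution / catalog.start_execution
SELECT PackageId, PackageName, FolderName, ProjectName
FROM dbo.CorePackage
WHERE IsActive = 1
ORDER BY ExecutionOrder;
```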
Would love to hear some thoughts/comments about this type of approach and suggestions for other ways to implement this kind of logic.
Regards,
Tiago
I am quite confused now, facing a strange issue with a design decision. I have to develop some packages. I have a parent table and some related child tables, and I have to develop the packages in such a way that my parent table loads first and then my child tables.
I have gone through many suggestions from Microsoft tech forums and other sites which say to use a master package and call all child packages after the parent. So I created a parent package and child packages, and I am calling them from a master package. Inside every package there is one data flow task and some execute tasks. Every data flow task has some transformations as well. I went ahead with this parent-child architecture after reading many blogs from different sources.
Now there is a discussion about whether this approach is correct. I got one suggestion to make one separate package for the parent table and a separate package for all the child tables, but I am not fully convinced by that.
I am not sure which approach to use, since keeping everything in one package is quite a mess. With a different package for each table (the number of tables right now is 8, which may or may not increase), I was able to manage things properly. I have used this approach in many assignments and it was very handy and modular, but I don’t know if I should use it here.
Tim, any suggestion would be very helpful to me.
Scott, I’m a big fan of having one package per operation. Loading 8 tables? 8 packages. There are exceptions to this, of course, but that’s the general rule I go with. Hope this helps.
Thanks Tim.
Yes, I am loading 8 tables from source to destination tables. But having said that, how do I prove that we are on the right track? Hope you get what I mean 🙂
I missed one point: all the child packages will run in parallel.
The single-operation package pattern supports both serial and parallel operations.
Correct me if I am wrong, but is this approach (a master/wrapper package running all child packages in parallel) OK if the tables have less data? I thought this approach was best suited to tables with large data volumes.
Yes, it works fine in either scenario. Building smaller packages has little to do with data volume. It’s intended to make development, testing, and deployment easier.
Great article. I would add that another advantage of the parent-child approach is a reduced risk surface: if a solution needs changing or correcting, the relevant package(s) can be replaced without affecting the rest of the solution.
Absolutely – code reuse and modularity is one of the best features of this design pattern.
Tim, this is a great article. I use child packages as common components / utilities / services and call multiple instances of the same package, passing different parameters. It provides great reusability and modularity to the code. With the project deployment model, I now have to keep those utility packages locally within each project. If any of them needs a change, I will have to redeploy it to each of them. The only way they allow executing such a package is by using a script component. How are you solving this issue? Ideally I want to keep all packages acting as common components / services within one project, and the rest of the projects should be able to consume them.
Hi Shrikant, I wish it were easier for one SSIS package to execute another package already in the SSIS catalog, but for now, it takes a bit of manual setup.
The way I usually handle this is to create an Execute SQL Task with the T-SQL to execute the catalog-deployed package, and use parameters in the task that match up with parameters in the package to be executed. It takes a bit more work, but it can be done relatively easily.
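As a rough sketch, the Execute SQL Task’s statement might look like the following over an OLE DB connection, with the ? placeholders mapped to parent package variables on the task’s Parameter Mapping page. The folder, project, variable, and parameter names here are hypothetical:

```sql
DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
      @folder_name  = N'ETL'
    , @project_name = N'DatabaseRefresh'
    , @package_name = ?                  -- mapped to e.g. User::ChildPackageName
    , @execution_id = @execution_id OUTPUT;

-- object_type 30 = package parameter on the child package
EXEC SSISDB.catalog.set_execution_parameter_value
      @execution_id
    , @object_type     = 30
    , @parameter_name  = N'EnvironmentName'
    , @parameter_value = ?;              -- mapped to e.g. User::EnvironmentName

EXEC SSISDB.catalog.start_execution @execution_id;
```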
Have you had issues with a level 2 package calling a level 3 package? I am wondering why someone would call a stored proc in level 2 to execute the level 3 package using the catalog.start_execution procs.
Alex, I’ve used 3-level executions before without any trouble. When using catalog.start_execution instead of the execute package task, the child package shows up in the log as a standalone execution (as opposed to logging beneath the parent package execution).
Tim, nice set of articles, thanks for putting them together!
This might be a bit off topic, as your articles are focused on SSIS, but I am curious to know your thoughts on comparing the parent/child approach (which I agree with using in general) in SSIS versus a purely T-SQL based approach. What you have laid out here is a great description of why a parent/child approach works, but this type of approach could be accomplished purely with SSIS and SQL Agent, or purely with T-SQL and SQL Agent. I believe this mostly comes down to the tastes/experience of the developer creating the solution, but some things do come to mind when comparing the two methods:
“Code comparison” – within a source control system, comparing versions of SQL stored procedures/views versus comparing XML-based SSIS packages: the text of stored procedures and views is exactly what we see when we work with them, while a comparison of XML files shows us differences in XML, which can be harder to map back to what we see in the GUI where we actually develop the package (maybe there is a better way to check differences/changes in an SSIS package that I have not come across?).
“Access to live source code” – scripting out a stored procedure/view gives a live/direct view of production code, versus an SSIS package tucked away in the SSIS catalog. You can get to the XML of the package, but this again lacks the GUI interface we typically use when working with a package.
… and things of this nature. Have you built parent/child component systems purely with T-SQL and SQL Agent? Any thoughts on the pros/cons of each?
Jamie, thanks for the comment. While it would be possible to get some of the same behaviors when working with a pure T-SQL orchestration, the SSIS control flow offers a lot more options: parallel execution, complex conditional behaviors, and event handling beyond just errors.