Calendar
<<  September 2010  >>
SuMoTuWeThFrSa
2930311234
567891011
12131415161718
19202122232425
262728293012
3456789
Blogroll

    I’ve fielded a number of requests recently asking how to interrogate a file within SSIS and change the processing rules based on the metadata of said file.  A recent forum poster specifically asked about using the foreach loop to iterate through the files in a directory and, on a per-file basis, either process the file or skip the file if it was updated after a specific date.  I’ll use that most recent request to illustrate one method to solve this problem.

    Ingredients

    For this demonstration, our SSIS package will require the following:

    • A foreach loop to process each file in a given directory
    • A script task to interrogate each file and, based on the timestamp, mark it to be either processed or skipped.
    • Four SSIS variables:
      • @SourceDirectory (String) – stores the directory to loop through
      • @MinDateStamp (DateTime) – indicates the earliest date to process
      • @Filename (String) – stores the current filename for each cycle of the foreach loop
      • @ProcessFile (Boolean) – a flag to indicate whether the current file should be processed
    • A precedence constraint which will be configured to validate an expression and task outcome.
    • A data flow task to process the validated files.


    Set Up the Loop

    Nothing groundbreaking here: after adding the foreach loop to the control flow pane, set it to work as a Foreach File Enumerator, and use an expression to set the source directory to be derived from the value of the @SourceDirectory variable:

    img2

     

    Script Task

    Since there is no native SSIS task designed to interrogate file metadata, we’re going to need to use a script task to do this.  After dropping a script task from the toolbox into the foreach loop container, we’ll edit the script to create a FileInfo object as a logical hook to the file.  After confirming that the file exists, we’ll compare its LastWriteTime property to the earliest acceptable cut-off date (defined by the @MinDateStamp variable value) to determine if the timestamp meets the criteria for processing.  Based on the results of that comparison, we will set the @ProcessFile value to either True or False.  You can see the resulting code logic in the snippet below:

     

    public void Main()
            {
                // Create a logical file object 
                System.IO.FileInfo theFile = new System.IO.FileInfo(Dts.Variables["Filename"].Value.ToString());
     
                // If the update date on the file is greater than the date specified in the MinDateStamp
                //  variable, set the variable flag to process the file.    
                if (theFile.Exists 
                    && theFile.LastWriteTime > DateTime.Parse(Dts.Variables["MinDateStamp"].Value.ToString()))
                {
                    // MessageBox.Show("Processing file " + Dts.Variables["Filename"].Value.ToString());
                    Dts.Variables["ProcessFile"].Value = true;
                }
                else
                {
                    // MessageBox.Show("Skipping file " + Dts.Variables["Filename"].Value.ToString());
                    Dts.Variables["ProcessFile"].Value = false;
                }
     
                Dts.TaskResult = (int)ScriptResults.Success;
            }

     

    So with each iteration of the foreach loop, the @ProcessFile value will indicate whether the current file should be processed or skipped.  After adding a data flow task containing the necessary components to process the flat file, our next step would be to add a precedence constraint connecting the script task to the new data flow task.  This precedence constraint will be configured to use an expression and a constraint, and will confirm that the current file is to be processed by interrogating the value of the @ProcessFile variable.  If that value is true, then program flow continues to the data flow task; otherwise the loop starts again with the next file in turn.  The precedence constraint would be configured as such:

    img3

     

    After configuring all of the necessary tasks for this operation, the data flow pane should look similar to the following:

    img1

    Now, when the SSIS package is executed, the timestamp of each file in the specified directory will be checked, and only those that meet the date criteria will be processed in the data flow task.  Note that you could replace the timestamp in our example to some other file criteria; for example, you could check the file size, type, attributes, or other settings to determine if the file should be processed.

     

    Conclusion

    Although SSIS does not include a native component to conditionally process files, you can see from this example that a simple script can easily solve this ETL challenge.  You can download the sample package used in this example here.

    Posted by Tim Mitchell on 23 March 2010, 08:45

    I've been using the scripting tools in SSIS for some time, but I came across something today that I can't quite explain.  I normally don't posts unresolved problems on my blog, but I'm trying out a strategy suggested by my friend Lee Everest by sharing unfinished work in the hopes that my research and troubleshooting can help someone else.

    So here's the scenario: I'm building an ETL framework using SSIS, which is now three levels deep (envision grandparent, child, and grandchild packages), and I’ve found that I need to be able to pass values from ancestor to descendant packages and vice versa.  While the former is easily done using SSIS configurations, the latter requires a little scripting to accomplish.  Additionally, this scenario requires that the leaf-level packages must be able to be occasionally run atomically as well as within the ETL framework, It was during the testing of those piecemeal executions that I discovered the issue at hand.

    Sidebar: If you’re interested in a how-to on passing values from child packages back up to the parent, have a look at this blog post by Steve Fibich.

    The Code

    So here’s the code I’m using: 

    i1

    In a nutshell, this code will attempt to write to an SSIS variable named LeafLevelConfigSetting.  That variable does not exist in the current package, so the variable by that name must be inherited from an ancestor package.  What I intended to happen is that, if the package is executed by itself, the LeafLevelConfigSetting variable will not exist in the current scope so the LockOneForWrite() method will throw an exception that should be caught by the try/catch block, allowing the package execution to proceed without error.  In theory, because I’m trapping any exception thrown by the LockOneForWrite() method, my script task should succeed even when there is no LeafLevelConfigSetting variable.

    The Problem

    The package runs fine when executed as part of the larger ETL framework; the LeafLevelConfigSetting variable is supplied by one of the ancestor packages, and the script task succeeds in updating that value.  However, if I execute the package on its own, it fails every time:

    i2

    The Execution Log reads as follows:

    Error: Failed to lock variable "LeafLevelConfigSetting" for read/write access with error 0xC0010001 "The variable cannot be found. This occurs when an attempt is made to retrieve a variable from the Variables collection on a container during execution of the package, and the variable is not there. The variable name may have changed or the variable is not being created.".

    Huh?  That function call was within the try/catch block and should have been trapped within the script task without causing a failure of that task.  Now I’ve never claimed to be a hard core programmer, so this revelation led me to reexamine what I thought I knew about try/catch blocks within the SSIS script task. So I created a small script that I knew would fail: the following snippet causes a divide by zero error, which should be handled in the try/catch block:

    i3

    Unlike the previous script task, this one succeeded:

    i4

    As shown in this trivial example the try/catch block does behave as expected; exceptions within the try{} block are properly caught and do not fail the task.  So the unexpected behavior described here appears to limited to the LockOneForWrite() method (as well as its close relative LockOneForRead(), as I discovered during further testing).

     

    The Solution (sort of….)

    Now for my specific application, I was able to overcome this problem by accessing the Dts.VariableDispenser.Contains() to verify the existence of the variable I want to modify, as shown below:

    i5

     

    … but even though I can now move on from this problem, I’m still left with no explanation as to why I’m unable to catch this exception.  I’m curious if anyone else has encountered this problem, and if so, if there’s an explanation to why the exception thrown by this particular function cannot be caught by a try/catch block.