The FTP protocol is one of the oldest methods for sharing and moving files. Although frequently considered to be an “old-school” way to transfer data, FTP is still relied upon in most every data movement architecture. Sadly, the functionality around FTP is very limited in SQL Server Integration Services. A common project requirement is to retrieve a list of files from FTP using SSIS, but no such behavior exists in the native FTP task in Integration Services. However, there is a workaround!
In this post, I will demonstrate how to retrieve a list of files from FTP using SSIS scripting tools.
Retrieve a List of Files from FTP using SSIS
Extracting a list of files stored on an FTP server is a common need in ETL projects. A frequent example of this is needing to retrieve the most recent file (based on either the file name or the modified date) in an FTP folder. The FTP task built into SSIS is limited in functionality – in fact, it doesn’t even support SFTP or FTPS connections – and has no support for retrieving a list of files stored on an FTP server. However, as is often the case, using a bit of script in SSIS can get around this limitation.
Using WinSCP
I have previously written about using WinSCP in SSIS. It’s a very mature and well-documented tool with a liberal license agreement, making it an excellent choice for extending the FTP capabilities of Integration Services. The free download includes both the portable .exe file as well as the .dll file for accessing the secure FTP libraries directly. In my previous post on this topic, I demonstrated calling the executable using command-line arguments; for this example I will include the .dll assembly to work directly with the WinSCP object model.
For the SSIS scripting part, we could go in one of several different directions:
- Use a script task to write the list of files to an SSIS variable, and separately process the resulting list of files using another script, an SSIS foreach loop, etc.
- Include all of the logic in a single script component source, which writes the list of files on the FTP server to its output
- Build a solution based entirely in code in a script task
To keep this example simple, I will use the first approach. The recipe list for this solution will include the following:
- The WinSCP executable and .dll
- An SSIS script task
- An SSIS object variable to store the results
- A separate script to show the list of files retrieved from the FTP server (which could be extended to identify the newest file, the biggest file, or whatever the business logic dictates)
Parameterization and Test Environment
To keep things as modular and reusable as possible, I start by parameterizing the values that might change from one execution to the next. As shown below, those variable values – including the name of the server, login credentials, and the FTP folder – are externalized using SSIS package parameters.
For testing purposes, I set up a local SFTP server using CoreFTP mini FTP server, a free tool that simplifies testing SFTP processes. I also set up a local directory with a handful of files and folders in the root directory, which I will use as a test folder.
Building the Script
Next up: time to write code. In a new script task I’ll do four things:
- Create a reference to the WinSCP assembly
- Create a System.Data.DataTable object to capture the file name and modified date of each file in the specified directory
- Connect to the SFTP server, and write to the DataTable object above the name and modified date of each file
- Set the SSIS object variable to the value of the DataTable object
The first step in code is to create a simple DataTable object. Shown below, I configure this object with columns to store the file name and the modified date.
Once the DataTable object has been created, I connect to the FTP server using credentials specified in the SSIS parameters above, using the ListDirectory method to get a list of files.
Next, I use a loop to iterate over the list from above. For every file found, I’ll write a new row into the DataTable object.
Finally, I set the value of the SSIS object variable to the DataTable object containing the list of files.
Using This List
Although the above script is the most significant part of this solution, it isn’t all of it. The logic above writes the list of files into an SSIS object variable, but it is still upon the ETL developer to decide what to do with that. Among the options:
- Use a foreach loop to execute some ETL logic for each file in that list
- Using script, select the name of the desired file (biggest file, most recent, etc.) from the list of files and download that file from the FTP server
- Using a script component, parse out the file names and write them into a database for further processing or analysis
Once the list of files is stored in the SSIS object variable, you can do almost anything with that list. To simplify the demonstration, I’ll just use a simple script component to extract the file names from that SSIS object variable to demonstrate how easily this can be done.
As shown below, I set up a new SSIS script component as a source, configuring a single output. I set up two columns in this output, which match the names and data types of the DataTable used in the previous script.
The code sample below shows two operations: creating the data table object from the SSIS object variable, and looping through the rows in that DataTable to create a new row in the script component output for each row found in that list. This logic is all contained in the function CreateNewOutputRows(), which is automatically created when the SSIS script component source is used.
As a final test, I add a Union All transformation immediately after this script component, and use the data viewer on that connection to confirm that the file names and modification dates are being properly extracted.
As noted, all of the code logic could be done in a single script component (rather than breaking into two different scripts as I did for demonstration).
If you’d like to explore the code to retrieve a list of files from FTP using SSIS in more detail, you can download the entire package here. If you need to brush up on the use of the WinSCP utilities in SSIS, there is a good tutorial on that here.
Conclusion
Although there is no native functionality to retrieve a list of files from FTP using SSIS, a bit of C# script and the free WinSCP library make this a fairly easy task in Integration Services.
When using a thirdparty dll in SSIS, i think that you must register the dll into the server GAC.
Anyways nice post.