How to Read a File With Multiple Data Formats in SSIS
Some time back I wrote about how to use the script component to parse out ragged data files in SSIS. In this post, I'll continue the discussion to show how to handle mixed format data files: specifically, those with several different record types in each file.
In a perfect world, a flat file will contain a single record type. However, my experience in healthcare several years back, as well as subsequent projects as a consultant, has taught me that there are many antipatterns around mixing file formats in a single data file. Combining several different record types into one file saves a small bit of work by eliminating multiple data files, but it also adds complexity to the systems consuming the resulting hybrid files. In this post I'll show an example of this, and I'll demonstrate a design pattern to handle this type of data formatting.
Handling Mixed Format Data Files in SSIS
Consider the example of the following mixed format data file, which contains a combination of patient hospital visit records as well as procedures undertaken in each of those visits.
As shown, there are two different types of data represented here. The longer records, marked with a V in the first column, appear to hold hospital visit information. Subsequent records, indicated by a P, show medical procedures related to those visits. Since the shape and purpose of these types of data are so different, it's a near certainty that the contents of this file should be sent to two different outputs.
For older versions of SSIS (2005 and 2008), this would be problematic. The flat file source in those older versions was hard-wired to look for exactly the number of columns defined in the flat file connection manager, even if it meant reading data from the next line in the file to satisfy the number of expected columns. Fortunately, this behavior was fixed in SSIS 2012, in which any missing columns are simply filled in with NULLs. This change in behavior makes it possible to use native components (read: no scripting) to handle a file like the one above.
To process this file in SSIS, the flat file connection manager will be configured with enough column metadata to handle the file format with the greatest number of columns. In the example above, the V record (visit) has 11 columns, while the P record (procedure) has only five. Therefore, the flat file connection manager should be configured with 11 columns, with the data type of each column set to handle the largest or broadest data type in any row type for that column. I've configured a flat file connection manager with 11 generically named columns, which will be mapped to the appropriate output tables and columns further downstream.
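For readers who want to see the idea outside of the SSIS designer, here is a minimal Python sketch of what the 11-column configuration amounts to: every row is read into the widest layout, and any columns a shorter record does not supply come through as NULL (None here). This is only an illustration, not how SSIS is implemented. The comma delimiter, the function name read_padded_rows, and the sample rows are all assumptions made for the example; only the column counts (11 and 5) and the V and P markers come from the file described above.

```python
import csv
import io

WIDEST_RECORD_COLUMNS = 11  # the visit (V) record is the widest format


def read_padded_rows(text_stream, width=WIDEST_RECORD_COLUMNS, delimiter=","):
    """Yield each line as a dict keyed Column0..Column10.

    Rows with fewer fields than `width` are padded with None, mimicking
    the NULL fill described for the SSIS 2012 flat file source.
    The comma delimiter is an assumption for this sketch.
    """
    names = [f"Column{i}" for i in range(width)]
    for fields in csv.reader(text_stream, delimiter=delimiter):
        if not fields:
            continue  # ignore blank lines
        padded = fields[:width] + [None] * (width - len(fields))
        yield dict(zip(names, padded))


# Hypothetical sample data, invented for illustration only.
SAMPLE = (
    "V,1001,A,B,C,D,E,F,G,H,I\n"
    "P,1001,X,Y,Z\n"
)

rows = list(read_padded_rows(io.StringIO(SAMPLE)))
```

The visit row fills all 11 columns, while the procedure row fills Column0 through Column4 and leaves Column5 through Column10 as None.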
Using the configuration above, all 11 columns will be populated by the visit records; the procedure record type will load only the first five columns, leaving NULL values in the rest. When connecting to this flat file connection manager in the SSIS data flow using a flat file source, both record types will be loaded into the data pipeline. Once that is in place, separating the records is as easy as using a conditional split based on the first column (V for visit, P for procedure).
As shown, Visit records (those matching [Column0] == "V") are sent to the Visits output, while all others are sent to the default output named Procedures. Each output can then be mapped to its corresponding target table.
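The routing rule described above can be sketched in a few lines of Python. This is a rough analogue of the conditional split, not SSIS code: rows whose Column0 equals "V" go to a visits list, and everything else falls through to a default procedures list. The function name conditional_split and the sample rows are hypothetical.

```python
def conditional_split(rows):
    """Route rows by record type, in the spirit of [Column0] == "V".

    Rows that match go to the visits output; all other rows go to the
    default output, which here is the procedures list.
    """
    visits, procedures = [], []
    for row in rows:
        if row["Column0"] == "V":
            visits.append(row)
        else:
            procedures.append(row)
    return visits, procedures


# Hypothetical rows, trimmed to a few columns for illustration only.
sample_rows = [
    {"Column0": "V", "Column1": "1001", "Column2": "A"},
    {"Column0": "P", "Column1": "1001", "Column2": "X"},
    {"Column0": "P", "Column1": "1001", "Column2": "Y"},
]

visits, procedures = conditional_split(sample_rows)
```

As with a default output in the conditional split, any row that is not a visit lands in procedures, including rows with an unexpected marker. A stricter version could test for "P" explicitly and send anything else to an error list.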
Finally, for each target table, the generic column names must be manually mapped to the corresponding output columns in the target table.
It bears mentioning again that this mixed format data file processing pattern works only in SSIS 2012 and 2014. To accomplish this in older versions, the scripting method I described in my earlier post would be used.
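Conceptually, that manual mapping is just a lookup from generic names to target names, one lookup per output. The Python sketch below illustrates this under heavy assumptions: the target column names (visit_id, procedure_code, and so on) are invented for the example, since the actual table definitions are not given here, and the helper map_columns is hypothetical.

```python
# Hypothetical mappings from generic source columns to target columns.
# The real target column names are not known; these are placeholders.
VISIT_COLUMN_MAP = {
    "Column1": "visit_id",
    "Column2": "patient_name",
}
PROCEDURE_COLUMN_MAP = {
    "Column1": "visit_id",
    "Column2": "procedure_code",
}


def map_columns(row, column_map):
    """Return a new dict holding only the mapped columns, renamed."""
    return {target: row.get(source) for source, target in column_map.items()}


# Hypothetical procedure row, for illustration only.
procedure_row = {"Column0": "P", "Column1": "1001", "Column2": "X", "Column5": None}
mapped = map_columns(procedure_row, PROCEDURE_COLUMN_MAP)
```

The same generic column (Column2 here) can mean something different in each record type, which is why each output needs its own mapping.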
Source: https://www.timmitchell.net/post/2015/04/13/handling-mixed-format-data-files-in-ssis/