Handling bad rows (inconsistent column numbers) in SSIS flat file source - Sandeep Kanao
Problem Definition :
Load n-column CSV file into a database. Once in a while the file has a "less then n-columns" in a few rows.
We need to ignore/record these 'bad' rows and load data from 'good' rows.
Solution :
1. Add a Flat file connection manager, Select the text file, having only one Column
2. Change the datatype of the column to DT_WSTR and length = 4000
3. Add a DataFlow Task - Sandeep Kanao
4. Inside the Data Flow Task add a Flat File Source, Script Component and OLEDB Destination
5. In the Script Component Select Column0 as Input Column
6. Add other Output Columns (the optimal output columns)
7. Change the OutputBuffer SynchronousInput property to None
8. Open script and add the following script
string fName = @"C:\input.csv" // Full file path: it should reference via variable
string[] lines = System.IO.File.ReadAllLines(fName);
//add a counter
int ctr = 1;
foreach(string line in lines)
{
string[] cols = line.Split(',');
if(ctr!=1) //Assumes Header row. Remove if 1st row has data
{
if(cols.Length == 17)
{
//Write out to Output
Output0Buffer.AddRow();
Output0Buffer.Col1 = cols[0].ToString(); //You need to cast to data type
Output0Buffer.Col2 = int.Parse(cols[1]) // example to cast to int
... //rest of Columns
}
// else - handle skipped lines
//else
// write out line somewhere
}
ctr++; //increment counter
}
9. Map the Output Columns to the Destination Columns