Wednesday, 28 November 2018

Handling bad rows (inconsistent column numbers) in SSIS flat file source - Sandeep Kanao

Handling bad rows (inconsistent column numbers) in SSIS flat file source - Sandeep Kanao


Problem Definition :

Load n-column CSV file into a database. Once in a while the file has a "less then n-columns" in a few rows.
We need to ignore/record these 'bad' rows and load data from 'good' rows.

Solution :

1. Add a Flat file connection manager, Select the text file, having only one Column



2. Change the datatype of the column to DT_WSTR and length = 4000

3. Add a DataFlow Task - Sandeep Kanao





4. Inside the Data Flow Task add a Flat File Source, Script Component and OLEDB Destination

5. In the Script Component Select Column0 as Input Column


6. Add other Output Columns (the optimal output columns)

7. Change the OutputBuffer SynchronousInput property to None



8. Open script and add the following script

string fName = @"C:\input.csv" // Full file path: it should reference via variable

string[] lines = System.IO.File.ReadAllLines(fName);

//add a counter
int ctr = 1;

foreach(string line in lines)
{
    string[] cols = line.Split(',');

    if(ctr!=1) //Assumes Header row. Remove if 1st row has data
    {
    if(cols.Length == 17)
    {
          //Write out to Output
          Output0Buffer.AddRow();
          Output0Buffer.Col1 = cols[0].ToString(); //You need to cast to data type
          Output0Buffer.Col2 = int.Parse(cols[1]) // example to cast to int
         
          ... //rest of Columns
    }
    // else - handle skipped lines
    //else 
    // write out line somewhere
    }
    ctr++; //increment counter
}

9. Map the Output Columns to the Destination Columns

Loading CSV file in SQL Server using SSIS when data is not formatted - Sandeep Kanao

Loading CSV file in SQL Server using SSIS when data is not formatted  - Sandeep Kanao

Problem Definition

File Format is comma(,)delimiter and some data with double quotes.

Input File format:
ID,Name,Address,salary
id=1,name="hari",address="ABC",salary="5000"
id=2,name="mahesh",address="ABC,BCD",salary="5000"

Destination table :
ID Name      Address Salary
1 hari          ABC             5000
2 Mahesh      ABC,BCD 5000

Try to load data with text qualifier double quotes(") it will not be loaded correctly.

Suggested Solution :

In above example, structure/pattern of data are similar, so we could use Script Component as Source to extract data from flat file.

1. Create output columns and give proper data type and length. Personally, I would suggest setting up data type  of all columns to string.

2. Use following script in SSIS

 public override void CreateOutputRows()
    {
       
       
        string[] lines = File.ReadAllLines(@"input.txt");

        for (int i = 1; i < lines.Length; i++)
        {
            int pFrom1 = lines[i].IndexOf("id=") + "id=".Length;
            int pTo1 = lines[i].LastIndexOf(",name");

            int pFrom2 = lines[i].IndexOf("name=\"") + "name=\"".Length;
            int pTo2 = lines[i].LastIndexOf("\",address");

            int pFrom3 = lines[i].IndexOf("address=\"") + "address=\"".Length;
            int pTo3 = lines[i].LastIndexOf("\",salary");

            int pFrom4 = lines[i].IndexOf("salary=\"") + "salary=\"".Length;

            Output0Buffer.AddRow();

            Output0Buffer.ID= lines[i].Substring(pFrom1, pTo1 - pFrom1);

            Output0Buffer.Name= lines[i].Substring(pFrom2, pTo2 - pFrom2);

            Output0Buffer.Address = lines[i].Substring(pFrom3, pTo3 - pFrom3);

            Output0Buffer.Salary = lines[i].Substring(pFrom4, lines.Length+1);

        }
    }

3. Run SSIS package, check the results - Sandeep Kanao



4. Convert the data type in destination table

Sunday, 11 November 2018

Deep-Learning-Tensorflow-Urban-Sound-Classification---Sandeep-Kanao


Deep-Learning-Tensorflow-Urban-Sound-Classification---Sandeep-Kanao


Deep-Learning Tensorflow Urban Sound Classification - Sandeep Kanao
Problem Statement When you start your machine learning journey, you go with simple machine learning problems like titanic survival prediction or digit recognition. But you still don't have enough practice when it comes to real life problems. To give you a taste of one such problem, we present you "Urban Sound Classification". This practice problem is meant to introduce you to audio processing in the usual classification scenario.
Data This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music.