[How To] Remove Duplicate Rows in Informatica

This tutorial explains how to remove duplicate rows from source data and load the unique rows into a flat file.

We will use a Sorter, an Expression transformation and a Filter to do this. First, we sort the data so that duplicate rows appear next to each other. In the Expression transformation, we compare the current row with the previous row, and each time we find a duplicate row we set Filter_Flag to 'Y'. The Filter then drops the rows where Filter_Flag='Y' and passes only the unique rows to the target file.
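The sort, flag and filter steps can be sketched outside Informatica as well. The Python below is only an illustration of the logic, not the mapping itself: Student_Name and Filter_Flag come from this tutorial, while the function name, the Marks column and the sample rows are made up for the example.

```python
from operator import itemgetter


def remove_duplicates(rows, key="Student_Name"):
    """Mimic Sorter -> Expression -> Filter on a list of dict rows."""
    # Sorter: order the rows by the key so duplicates sit next to each other.
    sorted_rows = sorted(rows, key=itemgetter(key))

    unique_rows = []
    prev_value = None   # stands in for a port that remembers the previous row
    is_first_row = True

    for row in sorted_rows:
        # Expression: flag the row when its key equals the previous row's key.
        is_duplicate = (not is_first_row) and row[key] == prev_value
        filter_flag = "Y" if is_duplicate else "N"
        prev_value = row[key]
        is_first_row = False

        # Filter: pass only the rows that were not flagged as duplicates.
        if filter_flag != "Y":
            unique_rows.append(row)

    return unique_rows


# Hypothetical sample data, for illustration only.
students = [
    {"Student_Name": "Ravi", "Marks": 80},
    {"Student_Name": "Anita", "Marks": 75},
    {"Student_Name": "Ravi", "Marks": 80},
]

result = remove_duplicates(students)
print(result)
```

Because the comparison only looks at the immediately preceding row, the approach depends on the sort step; without it, duplicates that are not adjacent would slip through.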

::Mapping:: 

::Sorter::

::Expression to flag duplicates::

::Filter to remove duplicates::

Note: In our example, we have determined unique rows based on the Student_Name column. Alternatively, we can treat a row as a duplicate only when all of its columns match the previous row. To do this, set all the columns as Sorter keys, then concatenate the columns of the current row and compare the result with the concatenated columns of the previous row.
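The all-columns variant can be sketched the same way. Again this is a rough Python illustration under assumptions: the function name, the "|" separator, the Marks column and the sample rows are hypothetical. The separator is there because plain concatenation can make different rows look equal (for example "ab" + "c" and "a" + "bc"); the sketch also sorts on the concatenated value rather than on each column separately, which still brings identical rows together.

```python
def remove_full_duplicates(rows, columns, sep="|"):
    """Treat a row as a duplicate only when every listed column matches."""

    def concat_key(row):
        # Concatenate all column values, separated so that values cannot
        # run into each other (assumes sep does not occur in the data).
        return sep.join(str(row[col]) for col in columns)

    # Sort on the combined key so fully identical rows become adjacent.
    sorted_rows = sorted(rows, key=concat_key)

    unique_rows = []
    prev_key = None  # concatenated value of the previous row
    for row in sorted_rows:
        current_key = concat_key(row)
        # Keep the row only if it differs from the previous row.
        if current_key != prev_key:
            unique_rows.append(row)
        prev_key = current_key

    return unique_rows


# Hypothetical sample data, for illustration only.
marks = [
    {"Student_Name": "Ravi", "Marks": 80},
    {"Student_Name": "Ravi", "Marks": 65},
    {"Student_Name": "Ravi", "Marks": 80},
]

full_result = remove_full_duplicates(marks, ["Student_Name", "Marks"])
print(full_result)
```

Here the two rows for the same student with different marks are both kept, whereas comparing on Student_Name alone would keep only one of them.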