SCD Type 2, step by step:
Type 2 (historical attribute): when we need to maintain the history of
records, whenever some particular column value changes.
By considering Type2, I assume that
the employee gets promotion, Job Title changes. In such case we need to
maintain the history of the employee, that with which designation or job Title
he joined, and when his designation or Job Title changed.
For making such scenario, I have
created two tables; one is 'tmpPerson', which is created as follows by using
Adventure Works database’s tables:
---Start-TmpPerson table ---
CREATE TABLE tmpPerson (EmpId int, Title varchar(10) NULL,FirstName varchar(20),MiddleName varchar(20),LastName varchar(20),JobTitle varchar(100),BirthDate Date,MaritalStatus char(1),Genderchar(1),NationalIDNumber varchar(20),CurrentFlag int , StartDate date,EndDate date DEFAULT ('2050-12-31'))
INSERT INTO tmpPerson (EmpId, Title,FirstName, MiddleName,LastName, JobTitle,BirthDate,MaritalStatus,Gender,NationalIDNumber,CurrentFlag, StartDate)
SELECT e.BusinessEntityID, per.Title,
per.FirstName,
per.MiddleName,
per.LastName,
e.JobTitle,
e.BirthDate,
e.MaritalStatus,
e.Gender,
e.NationalIDNumber,
e.CurrentFlag,
e.ModifiedDate
FROM HumanResources.Employee e
INNER JOIN Person.Person per one.BusinessEntityID=per.BusinessEntityID
----End –TmpPerson table----
I have also created one table
tmpPersonStage similar to tmpperson, which stores the records which are changed
by changing any of the column value. For Type 2, I have made some changes in
Job Title, as below:
---Start--tmpPersonStage table----
CREATE TABLE tmpPersonstage (EmpId int, Title varchar(10) NULL,FirstNamevarchar(20), MiddleName varchar(20),LastName varchar(20),JobTitle varchar(100),BirthDate Date,MaritalStatus char(1),Genderchar(1),NationalIDNumber varchar(20),CurrentFlag int , StartDate date,EndDate date DEFAULT ('2050-12-31'))
INSERT INTO tmpPersonstage (EmpId, Title,FirstName, MiddleName,LastName, JobTitle,BirthDate,MaritalStatus,Gender,NationalIDNumber,CurrentFlag, StartDate,EndDate)
SELECT EmpId,
Title,
FirstName,
MiddleName,
LastName,
JobTitle,
BirthDate,
MaritalStatus,
Gender,
NationalIDNumber,
CurrentFlag,
StartDate,
EndDate
FROM dbo.tmpperson
WHERE empid IN (3,
6,
8,
58,
49) ----changes for type 2
UPDATE tmpPersonstage
SET JobTitle ='Design Engineer' WHERE EmpId=3
UPDATE tmpPersonstage
SET JobTitle ='Production Technician - WC10' WHERE EmpId=6
UPDATE tmpPersonstage
SET JobTitle ='Production Technician - WC50' WHERE EmpId=8
UPDATE tmpPersonstage
SET JobTitle ='Research and Development Engineer' whereEmpId=49
UPDATE tmpPersonstage
SET JobTitle ='Engineering Manager',
LastName='Keill' whereEmpId=58 ---End---tmpPersonstage table -------
In the above script, I have updated
the JobTitle of some employees, for which initially the job titles in tmpPerson
table were as below
After preparing the initial steps for
the scenario, let’s move to SSIS package to apply SCD transformation. The below
steps are the implementation of SCD Type 2
Step 1:
Drag and drop one Data Flow
Task in the Control flow tabs.
And name this DFT as ‘scd type 2’
Step2:
Go to Data Flow tab, Drag and drop
OleDB Source and Slowly Changing Dimension transformation from Data flow
transformations.
Step 3:
Configure OleDB source, here the source will be the table which has
changed or new records, and for which history will be maintained, or inserted
in the main table. In our scenario the source table is ‘tmpPersonStage’, which
keeps some updated and new records for updated records history will be
maintained and new records will be inserted into the main table ‘tmpPerson’
Step 4:
Configure the SCD transformation; double click on SCD transformation, one
wizard will open as below:
Click on next and create either new
connection or already created connection to AdventureWorks2008, and while
mapping the input columns to dimension column, at least one column need to
be mentioned as ‘Business Key’.
Business key is the column on the basis of which the updation or
insertion into ‘tmpPerson’ will be made. Here we have mentioned EMPID as
BusinessKey, so if EmpId of input table ‘tmpPersonStage’ is found
in ‘tmpPerson’table then the respective row in ‘tmpPerson’
will be updated with the changes from ‘tmpPersonStage’ input
table
Click on next. In the below window, change type need to be mentioned on
certain column. Change Type can be ‘Fixed attribute’, ‘Changing attribute’ and
‘Historical Attribute’
As in this scenario is based
on Historical attribute we will select some columns for which history will be
maintained if found modified and new record will be inserted.
Go to Next and select "Use single column to show current expire records" option
As in ‘tmpPerson’, we have both the
option available one through ‘CurrentFlag’ column and another through
‘Startdate’ and ‘EndDate’.
We can opt for first option and use
column ‘CurrentFlag’ as the indicator of the current record. By this the
updated record will be inserted with ‘CurrentFlag’ value as ‘1’ and the records
which became old for the respective ‘EmpID’ the ‘CurrentFlag’ will be set to
‘0’.
And if we opt for second option then
we need to update the old record’s EndDate with ‘GetDate ()’ and need to insert
the updated record with ‘StartDate’ as ‘GetDate ()’. For new records, we need
to use StartDate as ‘GetDate ()’ .
Through this post, I am
selecting first option, and I will be using ‘CurrentFlag’ Column to show the
current record.
Go to next and don't select
‘Enable inferred member support’, as we don’t require it for this scenario.
Click next, which displays New
Records, other Outputs(Historical Attribute Output), which means there will be
two arrows from SCD, one will take the new records, another will take the
records for which history needs to be maintained.
Click next and Finish the wizard. As
we finish the wizard we see some transformation like ‘OleDb command’ and
‘Insert Destination’ are automatically created.
OLEDB command will update the
‘CurrentFlag’ column of old records and will set it to ‘0’ for the ‘EmpID’ for
which records are updated. These all will be union all with all new records
along with the new updated records having ‘CurrentFlag’ 1 and will be inserted
into the main table ‘tmpPerson’
Step 5:
Step 5.a:
Double click on Derived Column, and
we see that one column with ‘Replace Currentflag’ has been derived from current
flag and the value is set to ‘0’.
Step 5.b:
Double click on Oledb command, and it
will be configured automatically as below. The connection will be automatically
configured to table ‘tmpPerson’ of Adventure works 2008, in which the records
will be updated.
Also when clicked on Component
Properties tab then the update command is also configure, which we can see on
string value editor.
The update command updates Current
Flag to 0 on the basis of EmpId , and Current Flag(value 1)
In column mapping tab the parameter
are mapped with the columns mentioned in query. These Parameters contain the
Replace Current Flag column from derived transformation and EmpID from SCD
Step 5.c:
When clicked on next Derived Column
transformation, the same Current Flag column is derived column of Replace
Current flag, and set it to 1, for new and updated records.
Step 5.d:
Insert Destination will also be
configured automatically, to insert the new records coming from SCD to
tmpPerson table.
Step 7:
Execute the DFT
Execute the DFT
After execution, we can see that, we
updated 5 records in the tmppersonstage table, which are inserted into
tmpperson table, along with the old records current flag updated as 0. As there
was no new records, so no row is inserted into tmpperson.
I hope my effort of making you aware on 'SCD Type 2, step by step' helped you in your journey of SSIS. If you feel any queries, please post them as comments below. Also, your feedback will be well appreciated as it helps us to improve better and better every time.
No comments:
Post a Comment