data:image/s3,"s3://crabby-images/95b64/95b64b692d0747c8c6ce4c781b73bddbbe2d798e" alt=""
slowly changing dimensions (SCD) is an amazing data warehousing concept. Ralph KimballThe concept of SCD deals with moving a particular set of data from one state to another.Please try to imagine human resources (human resources) with employee table. As the following image shows, Stephen Jean is sales manager His team has 10 sales representatives:
data:image/s3,"s3://crabby-images/38642/386421e522a52efa39bde5cc6a65c6b9031fb8d9" alt="Power BI SCD Stephen Jiang is a sales manager on a team of 10 sales reps."
today, Stephen Jean was promoted to Vice President of Sales Therefore, the size of his team is Ten To 17Stephen is the same person, but with changed roles as shown in the following figure.
data:image/s3,"s3://crabby-images/770b7/770b72d6c851fdf3b81844924eb6b127e7618416" alt="Power BI's SCD, Stephen's team after being promoted to VP of Sales"
Another example is when a customer’s address changes in a sales system. Again, same customer but different address. From a data warehousing perspective, there are different options for processing data depending on business requirements, leading to different types of his SDC. It’s important to note that the data is changed in the transactional source system (in this example, the HR system or the sales system). Move and transform data from transactional systems. ETL (eextract, T.runform, and L.This is where the concept of SCD begins. SCD is about how changes in the source system reflect data in the data warehouse. This terminology is slowly changing as this kind of change in the source system does not occur very often. Many SCD types have been developed over the years, which is beyond the scope of this article, but for reference, the first three of his types are covered below.
SCD Type Zero (SCD 0)
This type of SCD ignores all changes in dimensions. Therefore, if a person’s address changes in the source system (HR system in this example), the landing dimension in the data warehouse will not change. changes in the data source are ignored. SCD 0 is also known as SCD 0. Fixed dimension.
SCD Type 1 (SCD 1)
The SCD 1 type overwrites old data with new data. A good example of the SCD 1 type is when a business does not need the customer’s old address, only the customer’s current address.
SCD Type 2 (SCD 2)
This type of SCD keeps a history of data changes in the data warehouse when the business needs to keep old and current data. In the SCD 2 scenario, we need to maintain historical data, so we insert a new row of data into the data warehouse every time the transactional system changes. Transaction system changes are either:
- Insert: when a new row is inserted into the table
- Update: when an existing row of data is updated with new data
- Delete: when a row of data is deleted from a table
Let’s continue with the previous HR system example. employee table.inserting new rows of data into employee A dimension in the data warehouse for every change in the source system causes data duplication in the system. employee A dimension of the data warehouse.so you can’t use employee key Use the column as the primary key of the dimension. Therefore, we need to introduce a new set of columns to guarantee uniqueness for every row of data, like this:
- New key column to guarantee row uniqueness employee size. This new key column is a simple index representing each row of data stored in the data warehouse dimension.The new key is called surrogate keyA surrogate key ensures that each row in a dimension is unique, but must maintain the primary key in the source system.The definition now calls the source system’s primary key business key Also alternate key In the world of data warehousing.
- a start date When End date The columns represent the time period during which the data rows are in their current state.
- Another column shows the status of each row of data.
SCD 2 is the most common type of SCD.After creating the required columns
Let’s look at the scenario again Stephen Jean promoted from sales manager To Vice President of Sales. The following screenshot shows employee Dimensions of the data warehouse before Stephen’s promotion:
data:image/s3,"s3://crabby-images/ca852/ca852e1d7128cc4acd59677d68e701139d42cd23" alt="SCD in Power BI, employee data before Steven was promoted"
of employee key column is surrogate key dimensional Employee Business Key The column is the business key (customer primary key in the source system).of start date column shows the date Stephen Jean started working as North America Sales Manager, End date column is left blank (null) and situation column show At the present timeNow let’s look at the data after Stephen’s promotion. This is shown in the screenshot below.
data:image/s3,"s3://crabby-images/1657a/1657a68128dce4271550dfb140312ff06b73da22" alt="SCD in Power BI, employee data after Stephen gets promoted"
As the image above shows, Stephen Jean started his new role as Vice President of Sales upon 2012/10/13 finished his work as North America Sales Manager upon December 10, 2012Therefore, data is transformed as it moves from the source system to the data warehouse. As you can see, handling SCD is one of the most important tasks. ETL process.
Let’s take a look at what SCD 2 means when it comes to data modeling in Power BI. First question is: Can I implement SCD 2 directly in Power BI Desktop without a data warehouse? To answer this question, you should remember to always prepare your data before loading it into your model. On the other hand, when building data models in Power BI, you create a semantic layer. In previous posts, we discussed various components of a BI solution. ETL and the semantic layerBut I repeat here. Power BI solutions use Power Query to handle the ETL process, and the data model is the semantic layer. The semantic layer is by definition a view of the source data (usually a data warehouse) optimized for reporting and analysis purposes. The semantic layer does not replace the data warehouse, nor is it another version of the data warehouse. The answer is that SCD 2 functionality cannot be implemented purely in Power BI. You should either have a data warehouse to hold historical data, or your transactional system should have mechanisms to support maintaining historical data, such as temporal mechanisms. A temporary mechanism is a feature provided by some relational database management systems, such as SQL Server, that provides information about the data held in a table at any time, rather than keeping only the current data.Click here for details temporal table In SQL Server, check this out.
When you load data into a Data Model in Power BI Desktop, dimension tables contain all current and historical data. Therefore, care should be taken when handling SCDs. For example, the screenshot below shows the employee’s reseller sales.
data:image/s3,"s3://crabby-images/061db/061dbfcfe022d81c785ad499c387ae01745d036e" alt="SCD in Power BI, SCD in Power BI, reseller sales of employees without considering SCD"
At first glance, the numbers look correct. Well they might be right. they may be wrong. It depends on what the business expects from the report.look Image 4, It shows the change in Stephen.Steven said that he North America Sales Manager (employee key 272). However, after his promotion (employee key 277), he no longer sells. He didn’t take SCD into account when he created the previous table. That is, it considers Stephen’s sales value (employee key 272).But is this what the business needs? Does the business expect to see sales for every employee without considering employee status? let’s add situation column to table.
data:image/s3,"s3://crabby-images/0b8d7/0b8d7ff4f56701ab6362cfc513d86a19e8ff62fe" alt="SCD in Power BI, employee reseller sales and their status without considering SCD"
employee status is At the present time? In that case, we need to factor the SCD into the equation and exclude Stephen’s sales value. You may need to add more depending on your business requirements. situation Use the column as a filter in your visualization, but otherwise you may need to add it and change the measure. start date, End dateWhen situation A column for filtering the results. The following screenshot shows the result of extracting Stephen’s sales using visual filters.
data:image/s3,"s3://crabby-images/adc6a/adc6a995cc525d2fed62f34d51e5f37c27982787" alt="SCD in Power BI, SCD in Power BI, reseller sales for employees considering SCD"
Dealing with SCDs is not always so straightforward. Sometimes you need to make changes to your data model.
So does all of the above mean that you can’t implement any type of SCD in Power BI? The answer, as always, is “it depends”. In some scenarios you can implement a solution like this: SCD1 I will discuss this in another blog post. But we are out of luck, SCD2 A purely Power BI feature.
Have you ever used SCD with Power BI? I would like to know about the challenges you faced. Share your thoughts in the comments section below.