In this series of blog posts, we will be examining some of the key features of Power BI Desktop, Power Query and the Power BI Service needed to pass exam DA-100: Analyzing Data with Microsoft Power BI.
In this Power BI training post, we will examine DA-100 exam – Manage Datasets: Identify Downstream Dataset Dependencies.
This topic refers to two related features which are found in the Power BI service: lineage view and impact analysis.
In the Power BI service, when you are looking at the contents of any workspace, the normal view, the flat view if you like, is List View. Here, you see the name of each item, and you have a series of columns, which give you further information about each item.
However, what this view does not give us is any idea of the relationship between the various items, the various artefacts within the workspace. Within the workspace, we have datasets; we build reports from those datasets; and then, typically, we create dashboards, which also have dependencies on those reports.
In order to keep track of these dependencies, and to get a good idea of the impact that any changes you might make, will have on other related items, this is where switching to lineage view becomes useful. To do that, we just choose lineage from the view drop-down menu.
In order to work in lineage view, you need to have write access to the workspace in question, and this means that of the four roles available for access, you need to have Admin, Member, or Contributor access.
Here is the same workspace as seen by someone who only has Viewer access to the workspace, and you will notice that there is no possibility of switching to lineage view.
This is because the Viewer role only gives you access to content, it does not give you access to the underlying datasets. So, lineage view would not really make any sense in that context.
In lineage view, content is shown in related clusters.
In this first cluster, we can see that we have a dataset, which is made up of three components, a CSV file, tables or views brought in from SQL Server, and then some data brought in from a web source.
We can also see that this dataset has been shared. So, it is possible for users to build reports on this dataset and upload them into other workspaces. Then we see the report that is been built from that dataset, and then finally, a series of dashboards, which all have dependencies on that sales report.
In the bottom right of each of the items in the cluster, we have a button marked show lineage. When we click on this, we are given a very clear overview of the dependencies on the selected item.
In this example, we can see that selected data source is used in the dataset, it is used in the report, and it is used in the various dashboards as well.
The above cluster is entirely located within the current workspace, the UK Sales workspace. If we move down to the next cluster, we have an example of a dataset, which is coming from another workspace, and this is referred to as an external dataset, and you will see it has its own icon when we hover over the icon, that tooltip tells us that this is an external dataset, and you will notice that we have a link, which will enable us to navigate to that location.
From that external dataset, we have created four reports, and then we have created one dashboard from each of those reports.
Next, let us click on the Data Central link to navigate to the workspace that contains this dataset.
We remain in lineage view, and we are now taken across to the Data Central workspace, where our sales data dataset is located.
We can see each of the data sources on which this dataset is built. We have a couple of Excel files, a SQL Server database, and a web source.
However, that the diagram does not show us the reports that may have been built from this dataset, and this is where the Impact Analysis screen comes in.
If we click on the button which says, "Show impact across workspaces", the Impact Analysis pop-up appears, and this gives us a very good overview of how the dataset is being used within the Power BI tenant.
We have one section for each of the workspaces from which this dataset has been referenced, and at the moment, there is only one other workspace, which is the UK Sales workspace, where we started.
When we expand UK Sales, we get a list of all the reports and all the dashboards that have been built from this dataset.
When we were up and running, we will also be given some useful stats as to who is viewing the content, and how many views are coming from each of those viewers.
We also have a useful button here for sending out notifications to the contacts associated with this workspace. For example, if we were about to make impactive or potentially impacted changes to the dataset, we might want to notify people who have built reports from that dataset.
To determine who gets notified, we click on the ellipsis menu next to the name of the workspace (In this case "Data Central") and go into workspace settings.
In the advanced section, we can specify the contact list. By default, the list includes anyone who has admin access to this workspace. However, alternatively, we can choose specific users and groups and then reference either individual users or security groups.
These are the people that will be notified when we click on notify contacts.
So, that is an overview of the two areas relevant to this topic; just be aware of how to use lineage view to track dependencies on the objects within a workspace, and also to use the impact analysis feature to get an idea of how changes that you may make to your datasets could have an impact on downstream artefacts.