Creating efficient Power BI Data Visualizations using Snowflake

Venu Chakrapani Nair
8 min read · Feb 24, 2021


In this blog post, I will cover the optimal process and some best practices for connecting Power BI to Snowflake and extracting data for visualizations. In subsequent posts I will dig deeper into some approaches to further optimize how Power BI works with Snowflake through data modeling techniques in both Power BI and Snowflake.

Snowflake

Snowflake is a cloud-based data platform that runs on a multi-cloud model. The platform works on a pay-as-you-use model: the customer is charged only when the storage or compute associated with the data warehouse is actually used. Snowflake can be configured to grow or shrink based on user or system workloads. Several other features come with the Snowflake Data Platform, including Data Sharing, Zero-Copy Cloning, Snowpipe (for data streaming) and support for semi-structured data, to name a few. For more details on the full capabilities of Snowflake, please browse to this link.

Power BI

Power BI, from Microsoft, is one of the leading data visualization platforms in the market. It offers an easy-to-use free desktop modeling tool called Power BI Desktop, as well as a cloud service where a developed dashboard or visualization can be published and shared across the organization. There are various pricing options for the Power BI service. For more details on the Power BI product, please browse to this link.

High Level Architecture

Here’s a rather common high level architecture pattern for building a Power BI based data visualization solution on top of a Snowflake Data Warehouse.

Connecting to Snowflake from Power BI

1. Launch Power BI Desktop

2. Use “Get Data” and select the Snowflake Connector. Make sure you use the native Power BI Snowflake connector and not the ODBC connector.

3. Enter the details of the connection

The mandatory fields here are the server and the virtual warehouse names. Ideally you would have a dedicated Snowflake warehouse for Power BI and other analytical and data visualization workloads.

While setting up these initial details you can pick “Import” or “DirectQuery”. More details on these options are covered in a later section. It is safer to start with “DirectQuery”: once a table is set to “Import”, you cannot change it back to “DirectQuery”.

4. Enter username and password: You can pick either a Snowflake (database) account or a Microsoft account. To use the Microsoft account, single sign-on (SSO) must be set up against the company’s Active Directory (AD). Enter all the details requested.

Enter your credentials; you will only be asked for them once. Power BI stores these credentials securely, so future connections to Snowflake will not prompt you again.

There are two options for logging in. The first is a Snowflake-provided login (database login). For this you will need to enter the login name and password as created in Snowflake; you can get this information from your Snowflake account administrator.

If you are configured for single sign-on (SSO), use the Microsoft account option. This authenticates you automatically with your Microsoft 365 account via Active Directory. Separate configuration and setup is needed in Snowflake to enable this. It is generally considered best practice, as authentication and access are centrally controlled and managed rather than siloed inside the Snowflake database.
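For script-based access, the same details the Power BI connection dialog collects roughly map onto the parameters of Snowflake’s Python connector (snowflake-connector-python). A minimal sketch of the two login options, with hypothetical account, user and warehouse names:

```python
# Sketch: assemble the keyword arguments you would pass to
# snowflake.connector.connect(). All names below are placeholders.

def build_connection_params(account, user, warehouse, use_sso, password=None):
    """Return connection parameters for a database login or an SSO login."""
    params = {
        "account": account,      # e.g. "xy12345" (hypothetical)
        "user": user,
        "warehouse": warehouse,  # the dedicated BI warehouse
    }
    if use_sso:
        # SSO: the connector opens a browser window and authenticates
        # against the identity provider (e.g. Azure AD).
        params["authenticator"] = "externalbrowser"
    else:
        # Snowflake (database) login: name and password created in Snowflake.
        params["password"] = password
    return params

db_login = build_connection_params("xy12345", "bi_user", "BI_WH",
                                   use_sso=False, password="***")
sso_login = build_connection_params("xy12345", "jane.doe@corp.com", "BI_WH",
                                    use_sso=True)
```

The key difference mirrors the Power BI dialog: a database login carries a password, while an SSO login carries the `externalbrowser` authenticator instead.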

Once connected you will see the list of databases and tables available in that Snowflake connection.

5. Select the database, schema and the tables required for the project

Extract and Transform the Data

6. Select “Load” or “Transform data”

In a classic ELT scenario, this is where you can apply data transformations or add derived/computed columns before using the data to build out visualizations.

7. Data is ready for building Power BI Visualizations

The selected tables are pulled into the Power BI model.

You can do additional transformation or modeling here, in the true spirit of ELT (Extract, Load & Transform).

Visualize the Snowflake Data

8. Now you can create a visualization

Publish and Share the Data Viz via the Power BI Service

9. Publish the data visualization to your Power BI Service (for this you will need a Power BI Pro license)

10. Once published, log in to the Power BI service and update the credentials on the dataset. The report will not work until this is done, particularly if you are using a database login rather than Active Directory-based SSO.

Update the credentials:

Once the credentials are updated in Power BI service, the visualization will be available to view in the Cloud Power BI Service.

Import Mode, DirectQuery mode and Composite Models

It is important to consider how the Power BI service stores the data from Snowflake. There are two key choices for this:

1. Import Mode: Effectively copies all the data from the data source into the Power BI service

Pros: Here the reports are fast, and all the rich Power BI functionality is available to use on this data.

Cons: There are limits on the amount of data you can store in your Power BI workspace; with a Pro license this is 10 GB. Also, the data is static unless refreshed from the data source (Snowflake).
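The workspace limit above is simple arithmetic worth sketching, if only as a sanity check before committing tables to Import mode (the 10 GB figure is the Pro-license cap mentioned above; exact limits depend on license and may change):

```python
# Minimal sketch of the Import-mode storage constraint: will the
# imported datasets fit under the workspace cap?

PRO_WORKSPACE_LIMIT_GB = 10  # per the Pro-license limit discussed above

def fits_in_workspace(dataset_sizes_gb, limit_gb=PRO_WORKSPACE_LIMIT_GB):
    """True if the combined imported datasets fit under the workspace cap."""
    return sum(dataset_sizes_gb) <= limit_gb

print(fits_in_workspace([2.5, 3.0, 1.5]))  # 7.0 GB total -> fits
print(fits_in_workspace([6.0, 5.5]))       # 11.5 GB total -> does not fit
```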

For example, in this case all the tables are in Import mode.

Once the storage mode for a table is set to Import, it cannot be changed.

2. DirectQuery Mode: In this mode, the data stays in Snowflake. Power BI queries Snowflake every time the user interacts with the report.

Pros: The data is always live, with no need to refresh. Since no actual data is stored in the Power BI service, the workspace storage limits are not a constraint on what you can publish.

Cons: Visuals can be slower, since every interaction sends live queries to Snowflake and waits for the results. Certain DAX functions are also not available in this mode.

In this mode, you can see the exact query Power BI constructs and issues to Snowflake by using the Snowflake query history. You can optimize and tweak this query to improve performance. Where necessary, performance on large result sets can be improved by adjusting the warehouse size (XS, S, M, L, XL, 2XL, 3XL, 4XL).
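When deciding whether to size up a warehouse for DirectQuery workloads, it helps to remember that Snowflake’s standard credit consumption doubles with each warehouse size. A small sketch of that trade-off (the credits-per-hour figures are Snowflake’s standard rates; the cost model is deliberately simplified):

```python
# Standard Snowflake credit consumption per hour by warehouse size;
# each step up doubles the credits (and, roughly, the compute).
CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8,
    "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128,
}

def credits_for_query(size, runtime_seconds):
    """Rough credit estimate for a query of the given runtime.
    Simplified: ignores the 60-second minimum and auto-suspend behavior."""
    return CREDITS_PER_HOUR[size] * runtime_seconds / 3600

# Doubling the warehouse size doubles credits/hour, so sizing up only
# pays off if it roughly halves the query runtime.
print(credits_for_query("L", 120))  # 2 minutes on a Large
print(credits_for_query("XL", 60))  # 1 minute on an X-Large: same credits
```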

3. Composite Model: Some tables in Import mode and some in DirectQuery mode. Best practice for this mode is:

Dimension tables: Import mode, especially for dimensions that are refreshed less frequently and whose data is of a manageable size that fits within the Power BI service storage constraints

Fact tables: DirectQuery mode

Pros: Tables in Import mode will give you good performance. Where there is heavy lifting involved, as with fact tables that can range from millions to billions of records, DirectQuery lets you effectively leverage Snowflake’s multi-cluster virtual warehouse architecture.

Cons: Use this option with discretion, as the model can get complex and harder to maintain with multiple storage modes.

Best Practices for using Power BI with Snowflake

  • Make sure the data visualizations are use-case driven; this includes the timeliness of the data, the number of visualizations and other components.
  • Data model design: How you design your data model has a major impact on query performance in Power BI, so data modeling hygiene really matters. Ideally, use a star schema design for your data in Power BI, with facts and dimensions. It also helps to keep a more relational, normalized, Data Vault-style ODS model in Snowflake to get the best performance in Power BI.
  • Use Import mode for dimension tables and DirectQuery mode for fact tables.
  • Use Composite mode sparingly.
  • Use aggregations in Power BI for pre-aggregated data to get better query performance
  • Keep the dashboard or visualization simple by limiting the number of data points, visuals and queries per page. This helps query performance, since fewer queries run when the visualization refreshes, and the reports will be more readable and faster.
  • Use query reduction settings to limit the number of queries generated. This is especially helpful with slicers, where you want filters applied only when the “Apply” button is clicked.
  • To make modeling easier and queries faster, use the “Assume referential integrity” property on relationships. By default, a relationship in Power BI generates a left outer join; with “Assume referential integrity” it generates an inner join instead, which can be faster. This property is available only in DirectQuery mode.
  • Use bidirectional filters on relationships with discretion. More bidirectional filters mean more SQL queries are generated; they also increase the complexity of the model and can increase compute costs in Snowflake.
  • For data sets that are refreshed on a frequent basis, e.g., hourly or daily, make use of the Import storage mode in Power BI, as this will be much faster for the end user.
  • Try to pull in the smallest set of data that is possible into Power BI to ensure better performance
  • On large data sets in Snowflake, design for micro-partition pruning; creating additional copies of large tables clustered on alternate sets of keys, driven by end users’ ad hoc query needs, can also result in better performance.
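The aggregation advice above can be made concrete by pre-aggregating a large fact table in Snowflake, so Power BI’s aggregations feature can answer high-level queries from a small summary table and fall back to the DirectQuery fact table for detail. A hedged sketch that composes such a statement; all table and column names are illustrative only:

```python
# Compose a CREATE TABLE ... AS SELECT that pre-aggregates a fact table
# in Snowflake. Names are hypothetical placeholders, not a real schema.

def build_agg_sql(fact_table, group_cols, measures):
    """measures: mapping of output column name -> source column to SUM()."""
    select_measures = ",\n    ".join(
        f"SUM({src}) AS {out}" for out, src in measures.items())
    cols = ", ".join(group_cols)
    return (
        f"CREATE OR REPLACE TABLE {fact_table}_daily_agg AS\n"
        f"SELECT\n    {cols},\n    {select_measures}\n"
        f"FROM {fact_table}\n"
        f"GROUP BY {cols}"
    )

sql = build_agg_sql(
    "sales_fact",                       # hypothetical large fact table
    ["order_date", "product_key"],      # grain of the summary table
    {"total_quantity": "quantity", "total_sales": "sales_amount"},
)
print(sql)
```

In Power BI, the resulting summary table could then be brought in via Import mode and mapped over the DirectQuery fact table using the aggregations feature.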

Hope this was helpful. Please let me know if you have any feedback.
