Snowflake Data Warehouse in SAP Ecosystem
Part-1: SAP Data Services
Due to the increasing popularity of the cloud-based Snowflake Elastic Data Warehouse, many IT organizations are migrating their on-premises data warehouses to Snowflake.
What is Snowflake?
Snowflake is a cloud-based data platform that runs on AWS, Azure, or GCP. As a cloud data warehouse, Snowflake offers several benefits, including:
1. Cost savings through pay-per-use cloud compute and storage
2. True elasticity: compute can be resized on demand and storage is effectively unlimited
3. A high level of security in the cloud and adherence to industry security standards (HIPAA, SOC 1/SOC 2, PCI, FedRAMP, etc.)
4. Separation of compute and storage, unlike most traditional MPP competitors
5. Cloud provider agnostic: runs on all three major cloud providers (AWS, Azure, and GCP)
6. Built-in external data sharing and data exchange capabilities
More details on the platform and benefits can be found at: www.snowflake.com
For organizations that have adopted Snowflake and have a significant SAP presence across their operational and analytic landscapes, including S/4HANA (ERP), SAP BODS, and SAP Analytics Cloud, it is important to be able to leverage existing investments in these technologies while working with the new Snowflake cloud data platform. Given the complexity of SAP's data warehousing options such as BW and the cost of the HANA appliance, Snowflake offers a compelling alternative for data lakes and data warehouses.
Snowflake can play a key role here in democratizing the rich data created and managed in SAP and other data sources while preserving existing investments in SAP data tools.
There are a few different design patterns available for organizations:
1. Perform real-time replication of SAP base tables using a partner tool (e.g., HVR, Attunity)
2. Leverage SAP logic in the transaction system by connecting ETL tools at the application layer (classic ETL tools such as SAP Data Services, Informatica, Talend, etc.)
3. Leverage investments (time, money, business logic) made in SAP BW by continuing the data flow from BW into Snowflake
In this article I will look at some approaches to batch-load on-premises data (from any SAP or non-SAP data source) into Snowflake using SAP Data Services (BODS). Other approaches to loading data from SAP data sources will be covered in subsequent articles.
End-to-end Batch ETL into Snowflake using SAP BODS
Data Services/BODS can be used to orchestrate the end-to-end process of extracting, transforming, and loading the data into Snowflake.
Snowflake has both a JDBC and an ODBC driver that can be installed on the BODS application server to facilitate the connection.
This approach requires the latest ODBC or JDBC driver to be installed on the SAP Data Services server and configured appropriately (typically as a database datastore). Once configured, the database, tables, and other objects can be pulled into a BODS Data Flow as source or target objects. SAP BODS can then be used for all the traditional data extraction, transformation, and load processes. Organizations that have been using SAP BODS for ETL and have a significant knowledge base in this tool can use it to the fullest extent to get the data into the ideal format for loading into the Snowflake data warehouse.
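As an illustration, the sketch below is a quick sanity check of the Snowflake ODBC DSN from the Data Services host before the datastore is created in the Designer. The DSN name (snowflake_dsn), credentials, and session checks are placeholders, and the snippet assumes the Snowflake ODBC driver has already been installed and registered on the server.

    # Hypothetical sanity check of a Snowflake ODBC DSN from the BODS host.
    # Assumes the Snowflake ODBC driver is installed and a DSN named
    # "snowflake_dsn" has been defined (e.g. in odbc.ini on Linux).
    import pyodbc

    conn = pyodbc.connect("DSN=snowflake_dsn;UID=bods_svc;PWD=********", autocommit=True)
    cur = conn.cursor()

    # Confirm the session context that the BODS datastore will inherit.
    cur.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA()")
    print(cur.fetchone())
    conn.close()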
More details can be found here:
Recommended Approach: ODBC Connectivity to Snowflake: https://help.sap.com/viewer/af6d8e979d0f40c49175007e486257f0/4.2.13/en-US/bdfc176728dc466bbaf5012a9e3793bc.html?q=snowflake
Also Available: JDBC Connectivity to Snowflake: https://blogs.sap.com/2015/03/04/sap-data-services-and-jdbc/
Using BODS for ET (Extract/Transform) and Snowflake (SnowSQL, Snowpipe) for Loading
Here, BODS can be leveraged for all the ET processing and as the overall orchestrator of the data movement process. This approach leverages the strengths of both technology platforms (SAP and Snowflake) in the ETL process. It is ideal for organizations that prefer to bypass the ODBC/JDBC drivers and rely on Snowflake's native loading utilities.
BODS can be used for:
· Data Ingestion from SAP (and non-SAP) Data Sources
· Data Transformation using its extensive library of transforms
The transformed data can then be written to a file share (an on-premises share or cloud object storage such as Amazon S3, Azure Blob Storage, or Google Cloud Storage).
Snowflake data ingestion/load scripts can be triggered from within BODS using a combination of the “script” (push) component and SnowSQL, or can be initiated (pull) from within Snowflake on a trigger, such as the arrival of a file.
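As an illustration of the push path, the sketch below shows the kind of PUT/COPY INTO statements such a load script would issue. In practice these would typically sit in a .sql file invoked by snowsql from a BODS script component; they are shown here through the Snowflake Python connector, and the file path, stage, and table names (e.g. SALES_ORDERS) are assumptions.

    # Minimal sketch of the load step, assuming BODS has already written
    # delimited files to /data/bods_out. Table and stage names are hypothetical.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345", user="bods_svc", password="********",
        warehouse="LOAD_WH", database="EDW", schema="STAGING",
    )
    cur = conn.cursor()

    # Upload the BODS output files to the target table's internal stage...
    cur.execute("PUT file:///data/bods_out/sales_orders_*.csv @%SALES_ORDERS AUTO_COMPRESS=TRUE")

    # ...and load them into the target table.
    cur.execute("""
        COPY INTO SALES_ORDERS
        FROM @%SALES_ORDERS
        FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1)
        PURGE = TRUE
    """)
    conn.close()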
Trigger-based file pulls can be enabled in Snowflake using a combination of tools and services, including the serverless Snowflake service called Snowpipe and a cloud-native event notification service such as Amazon SNS (on AWS).
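For example, the sketch below (run through the Snowflake Python connector for consistency with the other examples) creates an external stage over a landing bucket and an auto-ingest pipe on AWS. The bucket, stage, pipe, table, and storage integration names are assumptions, and the event notification wiring on the AWS side is not shown.

    # Hypothetical auto-ingest pipe over an S3 landing bucket (names are placeholders).
    # Assumes a storage integration / credentials for the bucket already exist.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345", user="loader_svc", password="********",
        warehouse="LOAD_WH", database="EDW", schema="STAGING",
    )
    cur = conn.cursor()

    cur.execute("""
        CREATE STAGE IF NOT EXISTS BODS_S3_STAGE
          URL = 's3://my-bods-landing-bucket/orders/'
          STORAGE_INTEGRATION = S3_INT
          FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)

    cur.execute("""
        CREATE PIPE IF NOT EXISTS SALES_ORDERS_PIPE AUTO_INGEST = TRUE AS
          COPY INTO SALES_ORDERS FROM @BODS_S3_STAGE
    """)

    # The pipe's notification channel (shown by SHOW PIPES) is where the
    # S3 event notification, directly or via SNS, must be pointed.
    cur.execute("SHOW PIPES LIKE 'SALES_ORDERS_PIPE'")
    print(cur.fetchall())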
The Snowflake data load/copy trigger on file arrival can be set up using either the Snowpipe service or a Python script that polls the file share directories for new files.
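A bare-bones version of such a polling script is sketched below; the directory, stage, and table names are placeholders, and a production version would add error handling and durable tracking of processed files (or simply use Snowpipe with auto-ingest instead).

    # Hypothetical polling loader: watch a file share and trigger a Snowflake load
    # whenever new files arrive. Paths, stage and table names are placeholders.
    import glob
    import time
    import snowflake.connector

    LANDING_DIR = "/mnt/fileshare/bods_out"

    conn = snowflake.connector.connect(
        account="xy12345", user="loader_svc", password="********",
        warehouse="LOAD_WH", database="EDW", schema="STAGING",
    )
    cur = conn.cursor()
    seen = set()

    while True:
        for path in glob.glob(f"{LANDING_DIR}/*.csv"):
            if path in seen:
                continue
            # Stage the new file and load it; COPY INTO skips files it has
            # already loaded, which gives some protection against re-runs.
            cur.execute(f"PUT file://{path} @BODS_STAGE AUTO_COMPRESS=TRUE")
            cur.execute(
                "COPY INTO SALES_ORDERS FROM @BODS_STAGE "
                "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
            )
            seen.add(path)
        time.sleep(60)  # poll every minute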