Snowflake Data Management Overview

Image by Gerd Altmann from Pixabay

Gartner’s 2019 Data Management Solutions for Analytics Magic Quadrant (MQ) report ranked one cloud data warehouse among the top vendors: Snowflake. It sits in the Leaders quadrant alongside huge names like Microsoft, Google, Amazon, and Oracle! For anyone in data science, especially those who use SQL to retrieve data, Snowflake is an exciting tool worth getting to know well.

What is Snowflake?

Snowflake is a data warehouse built entirely for the cloud, with an architecture that sets it apart from the rest. It combines the elasticity of the cloud with flexible data handling and the power of data warehousing, delivering a level of flexibility and efficiency that many other warehouses struggle to match, at a fraction of the cost of traditional approaches.

The Architecture

Snowflake's architecture has three layers, each essential to how it works. Here is what each layer does and why you, as a data scientist, should be excited about it!

  1. The Storage Layer: Unsurprisingly, this is where the data is stored. Snowflake keeps it all in the cloud, and it can be accessed only through SQL queries run by Snowflake itself, which helps keep information secure and away from those who aren’t supposed to see it! Even better, because storage is cloud-based, it can expand or contract independently of the compute resources at your disposal, keeping all your processes running smoothly.
  2. The Compute Layer: This layer processes very large volumes of data quickly. Virtual warehouses (compute engines) do all of the heavy lifting in Snowflake; each one is a cluster of computing resources with plenty of horsepower. When a query arrives, the warehouse pulls only the data it needs from the storage layer, and it caches that data locally so that later queries touching the same data run faster. Multiple warehouses can run at the same time against the same data without competing for resources, and transactions remain fully ACID-compliant.
  3. The Services Layer: This is essentially the brain of the operation. It handles user authentication, security functions, transaction coordination, metadata management, and query compilation and optimization. Its services are stateless and distributed across multiple sites, with global state kept in a highly available metadata store, so the layer stays resilient and routine coordination work does not tie up compute resources for every query.
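To make the division of labor between the three layers concrete, here is a toy model in Python. It is a conceptual sketch, not Snowflake's implementation: every class and method name is hypothetical, and the assumption that the services layer prunes reads using per-partition min/max key metadata is a simplified stand-in for how a warehouse "pulls the minimum amount of data" from storage. Two warehouses share one storage layer, but each keeps its own cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    """Hypothetical immutable chunk of (key, value) rows in the storage layer."""
    rows: tuple


class StorageLayer:
    """Shared storage that every warehouse reads from; counts remote fetches."""

    def __init__(self, partitions):
        self._partitions = list(partitions)
        self.fetches = 0

    def fetch(self, pid):
        self.fetches += 1
        return self._partitions[pid]


class ServicesLayer:
    """Holds per-partition min/max key metadata and uses it to prune reads."""

    def __init__(self, partitions):
        self.metadata = [
            (min(k for k, _ in p.rows), max(k for k, _ in p.rows))
            for p in partitions
        ]

    def candidate_partitions(self, lo, hi):
        # Keep only partitions whose [min, max] key range overlaps [lo, hi].
        return [pid for pid, (mn, mx) in enumerate(self.metadata)
                if mx >= lo and mn <= hi]


class VirtualWarehouse:
    """Hypothetical compute cluster with its own local partition cache."""

    def __init__(self, storage, services):
        self.storage = storage
        self.services = services
        self.cache = {}

    def select_range(self, lo, hi):
        """Return values whose key lies in [lo, hi], reading as little as possible."""
        out = []
        for pid in self.services.candidate_partitions(lo, hi):
            if pid not in self.cache:  # cache miss: fetch from shared storage
                self.cache[pid] = self.storage.fetch(pid)
            out.extend(v for k, v in self.cache[pid].rows if lo <= k <= hi)
        return out


# Ten partitions of 100 keys each: keys 0-99, 100-199, ..., 900-999.
parts = [MicroPartition(tuple((k, k * 10) for k in range(s, s + 100)))
         for s in range(0, 1000, 100)]
storage = StorageLayer(parts)
services = ServicesLayer(parts)
etl_wh = VirtualWarehouse(storage, services)
bi_wh = VirtualWarehouse(storage, services)

rows = etl_wh.select_range(150, 160)  # pruning leaves only partition 1
print(len(rows), storage.fetches)     # 11 1
etl_wh.select_range(150, 160)         # served from etl_wh's cache
print(storage.fetches)                # 1
bi_wh.select_range(150, 160)          # separate warehouse, separate cache
print(storage.fetches)                # 2
```

Because both warehouses read the same shared storage, adding a second one for another team adds compute without copying any data; the trade-off in this model is that each warehouse warms its own cache independently.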

Conclusion

Snowflake is an exciting way forward for data scientists. It offers extra features such as Time Travel (recovering and analyzing historical data within a set retention period), fast zero-copy cloning of data, and automatic query optimization, among many others. It is an innovative alternative to more traditional or well-known data warehouse software, and a step into the future for data storage, querying, and data science in general.
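Time Travel and fast cloning both follow naturally from storing table data as immutable chunks and recording each change as a new version. The Python sketch below models that idea under clearly stated assumptions: the class, its methods, and the abstract integer time units are all hypothetical, and Snowflake itself exposes these features through SQL (for example the `AT`/`BEFORE` clause and `CREATE TABLE ... CLONE`), not through anything like this API. The `retention` value stands in for Snowflake's configurable retention period.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy table: each version is a tuple of references to immutable partitions."""
    retention: int  # hypothetical retention window, in abstract time units
    versions: list = field(default_factory=list)  # (commit_time, partitions)

    def commit(self, time, partitions):
        # A write never edits old partitions; it records a new version.
        self.versions.append((time, tuple(partitions)))

    def current(self):
        return self.versions[-1][1]

    def at(self, time, now):
        """Time Travel read: the latest version committed at or before `time`."""
        if now - time > self.retention:
            raise ValueError("requested time is outside the retention window")
        earlier = [parts for t, parts in self.versions if t <= time]
        if not earlier:
            raise ValueError("table had no data at that time")
        return earlier[-1]

    def clone(self, now):
        """Zero-copy clone: the new table shares the current partition tuple."""
        return VersionedTable(self.retention, [(now, self.current())])


p1 = ("row a", "row b")
p2 = ("row c",)
orders = VersionedTable(retention=24)
orders.commit(0, [p1])
orders.commit(5, [p1, p2])  # p1 is reused, not rewritten

print(orders.at(3, now=10))  # (('row a', 'row b'),)
dev = orders.clone(now=10)
print(dev.current() is orders.current())  # True: no rows were copied
dev.commit(11, [p2])
print(len(orders.current()))  # 2: the original table is unaffected
```

In this model a clone costs only one new version entry pointing at existing partitions, which is why it is fast regardless of table size; later writes to either table add new versions without touching the other.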