Open source developments in the mid-to-late 2000s, such as Hadoop and Hive, made it a lot easier and cheaper to access and analyse data in its raw form. The theory was that you could now just throw your raw data files into a big, cheap storage platform and you had yourself a Data Lake that your analysts and data scientists could be let loose on with a plethora of new query tools. Many people predicted the death of the data warehouse as a result, but that didn't happen.

In practice, data lakes brought problems of their own. The "store anything" approach meant data was often badly curated and suffered from poor quality (hence the growth of the term "Data Swamp"). Data sets weren't really modelled (by design), which made them more difficult to understand and join up. BI tool support was often limited, so querying the data was more difficult. Data lakes didn't properly support transactions and incremental data updates. Data governance features typically weren't prioritised and data was often poorly catalogued. If, for example, you didn't know where all your PII data was stored, you risked non-compliance with data protection laws such as GDPR.

The Data Lakehouse concept seeks to address this. The idea is to create a single data platform that combines the easy, structured querying capability of data warehouses with the flexibility, openness and cost-effectiveness of data lakes.

The Databricks Lakehouse combines the ACID transactions and data governance of enterprise data warehouses with the flexibility and cost-efficiency of data lakes to enable business intelligence (BI) and machine learning (ML) on all data. The Databricks Lakehouse Platform is a unified analytics platform, offered as a single group of tools that can build, share, deploy and maintain very large data solutions.