DataLake on AWS
Data Lake is a term from the Big Data field. A Data Lake is simply a place to store raw (unprocessed) data until it is processed and analyzed to produce insights.
A Data Lake has the following properties:
- Collect everything – contains all data, raw or processed, over a long period of time.
- Multi-user – allows multiple users to refine, explore, and enrich data.
- Flexible access – supports multiple access patterns on shared infrastructure: batch, interactive, online, search, in-memory, and other processing engines.
We will use AWS Glue to build a data catalog, Amazon Athena to query the data in the data lake, and Amazon QuickSight to visualize the data.
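To give a feel for how Athena queries a table registered in the Glue data catalog, here is a minimal Python sketch. It assumes an Athena client created with boto3 (`boto3.client("athena")`) and uses the client's `start_query_execution`, `get_query_execution`, and `get_query_results` calls. The helper name `run_athena_query` and the database, table, and bucket names in the usage note are hypothetical placeholders, not values from this workshop.

```python
import time

# Athena reports one of these states once a query has finished.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` against `database` and return the first page of result rows.

    `athena` is expected to be a boto3 Athena client (assumption).
    `output_location` is an S3 URI where Athena writes the result files.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page is fetched here; for SELECT queries the first
    # row normally holds the column names.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT COUNT(*) FROM songs", "music_db", "s3://my-athena-results/")`, where `songs`, `music_db`, and the bucket are made-up names. Passing the client in as a parameter keeps the helper easy to exercise with a stub, without AWS credentials.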
You are a member of the Data Analytics team at a music start-up. You will explore the data, analyze it, and compute statistics from it through the following steps:
- Preparation steps
- Data collection and storage
- Create Data Catalog
- Transform data
- Data Analysis Using Athena
- Create models and tables with QuickSight
- Resource Cleanup