
DataLake on AWS

Overview

A data lake is a concept from the field of big data. Put simply, a data lake is a central place that stores raw (unprocessed) data until it is processed and analyzed to produce insights.

A data lake has the following properties:

  • Collect everything – stores all data, both raw and processed, over a long period of time.
  • Multi-user – allows multiple users to refine, explore, and enrich the data.
  • Flexible access – supports multiple access patterns on shared infrastructure: batch, interactive, online, search, in-memory, and other processing engines.
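
A common way to keep raw and processed data side by side in one S3 bucket is to give each a separate prefix and to lay files out in Hive-style `key=value` partition folders, which AWS Glue crawlers and Amazon Athena recognize as partition columns. The following is a minimal sketch under those assumptions; the `raw`/`processed` prefixes and the `listening_events` dataset name are hypothetical, not part of this workshop's setup.

```python
from datetime import date

# Hypothetical zone prefixes: raw data is kept untouched, processed copies
# are written under a separate prefix in the same bucket.
RAW_PREFIX = "raw"
PROCESSED_PREFIX = "processed"


def object_key(zone: str, dataset: str, day: date, filename: str) -> str:
    """Build an S3 object key using Hive-style partitions (key=value).

    Glue crawlers and Athena can expose year/month/day from this layout
    as partition columns of the table.
    """
    if zone not in (RAW_PREFIX, PROCESSED_PREFIX):
        raise ValueError(f"unknown zone: {zone}")
    return (
        f"{zone}/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/{filename}"
    )


if __name__ == "__main__":
    d = date(2024, 3, 7)
    # Same event data, stored once as raw JSON and once as processed Parquet.
    print(object_key(RAW_PREFIX, "listening_events", d, "events.json"))
    print(object_key(PROCESSED_PREFIX, "listening_events", d, "events.parquet"))
```

Keeping both zones under the same partition scheme means a query engine can prune by date regardless of which copy it reads.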

Data Lake

In this workshop, we will use AWS Glue to build a data catalog, Amazon Athena to query the data in the data lake, and Amazon QuickSight to visualize the data.
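
As a rough illustration of how Glue and Athena fit together, here is a sketch that uses boto3 (the AWS SDK for Python) to run an Athena query against a database registered in the AWS Glue Data Catalog. The database name `music_datalake` and the results location `s3://example-athena-results/` are hypothetical placeholders, not names defined by this workshop, and QuickSight is not covered. boto3 is imported inside `run_query` so the request-building helper can be used without the SDK installed.

```python
import time

# Hypothetical names: substitute the Glue database and S3 result location
# you actually create.
DATABASE = "music_datalake"
OUTPUT_LOCATION = "s3://example-athena-results/"


def build_query_request(sql: str, database: str, output_location: str) -> dict:
    """Keyword arguments for Athena's start_query_execution call.

    The database refers to a database in the Glue Data Catalog, which is
    where Athena looks up table definitions for files stored in S3.
    """
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, poll_seconds: float = 1.0) -> list:
    """Run a query with boto3 and return the rows as lists of strings."""
    import boto3  # imported here so build_query_request works without boto3

    athena = boto3.client("athena")
    request = build_query_request(sql, DATABASE, OUTPUT_LOCATION)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")

    # First page of results only; for a SELECT, the first row holds the
    # column names.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

Polling is the simplest approach for a one-off script; larger result sets would need the `NextToken` pagination that `get_query_results` supports.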

You are a member of the Data Analytics team at a music start-up. Your task is to explore the data, analyze it, and compute statistics from it.

Contents

  1. Introduction
  2. Preparation steps
  3. Data collection and storage
  4. Create Data Catalog
  5. Transform data
  6. Data Analysis Using Athena
  7. Create models and tables with QuickSight
  8. Resource Cleanup