Creating Glue Crawler

Create Glue Crawler

  1. Open the AWS Management Console
  • Search for AWS Glue
  • Select AWS Glue

  1. In the AWS Glue interface
  • Select Crawlers

  1. Select Create crawler

  1. In the Add crawler interface
  • For Crawler name, enter summitcrawler
  • Select Next

  1. For Add data source
  • Select S3

  1. For S3 path, select Browse and choose the path to your data.
  • Select Crawl new sub-folders only
  • Select Add an S3 data source

  1. After adding data source, select Next.

  1. For IAM role
  • You can create a new role yourself by selecting Create new IAM role
  • Or choose the prepared role.
  • Then select Next
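
If you prefer to create the role from code instead of the console, the following is a minimal sketch using boto3. It assumes boto3 is installed and credentials are configured. The function name `create_glue_role` and the example role name are hypothetical, not part of this workshop.

```python
import json

# Trust policy that lets the AWS Glue service assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policy with the baseline Glue permissions.
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def create_glue_role(iam, role_name):
    """Create a role Glue can assume and attach the managed Glue policy.

    `iam` is expected to be a boto3 IAM client.
    """
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_SERVICE_POLICY_ARN)
    return response["Role"]["Arn"]


# Example usage (hypothetical role name; needs boto3 and AWS credentials):
#   import boto3
#   create_glue_role(boto3.client("iam"), "AWSGlueServiceRole-summit")
```

The role will most likely also need read access to the S3 path being crawled, which this sketch does not add.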

  1. For Target database, select Add database

  1. Create a database by:
  • Enter database name: summitdb
  • Select Create database
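
The same database can also be created programmatically. This is a small sketch, assuming a boto3 Glue client is passed in. The helper name `ensure_database` is hypothetical.

```python
def ensure_database(glue, name):
    """Create the Data Catalog database unless it already exists.

    Returns True if the database was created, False if it was already there.
    `glue` is expected to be a boto3 Glue client.
    """
    try:
        glue.create_database(DatabaseInput={"Name": name})
        return True
    except glue.exceptions.AlreadyExistsException:
        return False


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   ensure_database(boto3.client("glue"), "summitdb")
```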

  1. After creating the database, select the database and select Next

  1. Review the configuration, then select Create crawler
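
For reference, the console choices made so far map roughly onto a single `create_crawler` call in boto3. The sketch below is an approximation of that mapping, not the exact request the console sends. The helper names, the role ARN, and the S3 path in the usage comment are placeholders. Setting the schema change policy to `LOG` is an assumption based on what incremental crawls (new sub-folders only) are understood to require.

```python
def build_crawler_config(name, role, database, s3_path):
    """Assemble keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Corresponds to the "Crawl new sub-folders only" option.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        # Assumption: incremental crawls expect schema changes to be logged only.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
    }


def create_crawler(glue, **config):
    """Create the crawler using a boto3 Glue client."""
    glue.create_crawler(**config)


# Example usage (placeholder role and path; needs boto3 and AWS credentials):
#   import boto3
#   config = build_crawler_config(
#       "summitcrawler", "<your-role-arn>", "summitdb", "s3://<your-bucket>/<your-prefix>/"
#   )
#   create_crawler(boto3.client("glue"), **config)
```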

  1. After the crawler is created successfully, select Run crawler

  1. It takes about 1 minute to initialize the Crawler run.

  1. The crawler run is initialized successfully.

  1. After running for some time, the crawler moves to the Stopping state.

  1. Wait until the crawler status returns to Ready, which indicates the run has finished.
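
The run-and-wait sequence above (running, then Stopping, then Ready) can be sketched as a simple polling loop. This assumes a boto3 Glue client and that the crawler reports a non-READY state immediately after `start_crawler` returns. The function name, poll interval, and timeout are illustrative choices.

```python
import time


def run_crawler_and_wait(glue, name, poll_seconds=15, timeout_seconds=900):
    """Start the crawler and poll its state until it is READY again.

    Returns the distinct states observed, in order.
    `glue` is expected to be a boto3 Glue client.
    """
    glue.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    seen = []
    while time.monotonic() < deadline:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if not seen or seen[-1] != state:
            seen.append(state)  # record each state transition once
        if state == "READY":
            return seen
        time.sleep(poll_seconds)
    raise TimeoutError(f"Crawler {name!r} did not return to READY in time")


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   run_crawler_and_wait(boto3.client("glue"), "summitcrawler")
```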

  1. Select Tables in the AWS Glue interface. There are 2 data tables.

  1. Select the data table raw

  1. Explore data table details.
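
To inspect the crawled tables outside the console, one option is to list them through the Data Catalog API. The sketch below assumes a boto3 Glue client. The helper name `list_tables` is hypothetical, and the tables and columns returned depend on your own data.

```python
def list_tables(glue, database):
    """Return a mapping of table name to a list of (column name, type) pairs.

    `glue` is expected to be a boto3 Glue client.
    """
    tables = {}
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            tables[table["Name"]] = [(c["Name"], c.get("Type")) for c in columns]
    return tables


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   for name, columns in list_tables(boto3.client("glue"), "summitdb").items():
#       print(name, columns)
```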
