AWS Lake Formation Workshop

Know-How Guide and Hands-On Guide for AWS

Introduction

Lake Formation uses the following services:

Lake Formation calls AWS API operations to perform the following tasks:

Step 1 Glue Basics

  1. Glue Data Catalog
  2. Glue ETL

Step 2 Lake Formation Basics

  1. Set the Lake Formation data lake administrator
  2. Change Default Catalog Settings: enable fine-grained access control with Lake Formation permissions
  3. Databases
  4. Register an Amazon S3 bucket as your data lake storage
  5. Blueprints: Database blueprints and Log file blueprints
  6. Granting Permissions for different personas
  7. Different personas querying the Data Lake with Athena to verify the Permissions
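
The Lake Formation basics above can also be scripted. The following is a minimal sketch using boto3, not the workshop's own procedure (the workshop uses the console). The function names, the account ID, the bucket name, and the database and table names are hypothetical. The payload builders are plain functions, so they can be inspected without AWS access; only `apply` calls AWS.

```python
"""Sketch: Step 2 (admin, catalog settings, S3 registration, grants) as boto3 payloads."""
import copy


def secured_settings(current: dict, admin_arn: str) -> dict:
    """Return a copy of the data lake settings with admin_arn added as an
    administrator and the IAM-only defaults for new resources cleared."""
    settings = copy.deepcopy(current)
    admins = settings.setdefault("DataLakeAdmins", [])
    entry = {"DataLakePrincipalIdentifier": admin_arn}
    if entry not in admins:
        admins.append(entry)
    # Empty default permissions mean new databases and tables are governed
    # by Lake Formation permissions instead of IAM access control only.
    settings["CreateDatabaseDefaultPermissions"] = []
    settings["CreateTableDefaultPermissions"] = []
    return settings


def register_request(bucket: str) -> dict:
    """Arguments for registering an S3 bucket as data lake storage."""
    return {"ResourceArn": f"arn:aws:s3:::{bucket}", "UseServiceLinkedRole": True}


def select_grant(principal_arn: str, database: str, table: str, columns=None) -> dict:
    """Arguments for a SELECT grant, optionally limited to specific columns."""
    if columns:
        resource = {"TableWithColumns": {"DatabaseName": database,
                                         "Name": table,
                                         "ColumnNames": list(columns)}}
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {"Principal": {"DataLakePrincipalIdentifier": principal_arn},
            "Resource": resource,
            "Permissions": ["SELECT"]}


def apply(admin_arn: str, bucket: str, grants: list) -> None:
    """Send the requests to AWS. Requires boto3 and suitable credentials."""
    import boto3  # imported lazily so the builders above work without it

    lf = boto3.client("lakeformation")
    # put_data_lake_settings replaces the whole settings object, so start
    # from the current settings rather than from an empty dict.
    current = lf.get_data_lake_settings()["DataLakeSettings"]
    lf.put_data_lake_settings(DataLakeSettings=secured_settings(current, admin_arn))
    lf.register_resource(**register_request(bucket))
    for grant in grants:
        lf.grant_permissions(**grant)
```

Granting a persona access to only some columns is then a matter of passing `columns=[...]` to `select_grant`. Querying the table from Athena as that persona should return only those columns.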

Step 3 Integration with Amazon EMR

Beginning with Amazon EMR 5.31.0, you can launch a cluster that integrates with AWS Lake Formation. Integrating Amazon EMR with AWS Lake Formation provides the following key benefits:

The integration between Amazon EMR and AWS Lake Formation supports the following applications:

Prerequisites and steps for integrating Amazon EMR with Lake Formation:

  1. Configure Trust Relationship between your organization’s Identity Provider (IdP) and AWS
    • Auth0
    • Okta
    • ADFS
  2. Create Amazon EMR Cluster integrating with AWS Lake Formation
  3. Grant data access and update the SAML Identity Provider Application Callback URL with EMR cluster Master Node DNS
    • Allow Amazon EMR clusters to filter data managed by Lake Formation
    • Grant Permissions
  4. Verify access in Apache Spark via an Apache Zeppelin notebook or Amazon EMR Notebooks. Before you open the notebook URL, follow the guide Set Up an SSH Tunnel to the Master Node Using Dynamic Port Forwarding.
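
The SSH tunnel mentioned in the last step can be sketched as follows. This is an illustration only: the key path and master node DNS name are placeholders, the local port 8157 is just a commonly used unused port, and `hadoop` is assumed to be the login user on the master node. The function builds the command without running it.

```python
"""Sketch: build the ssh command for dynamic port forwarding to the EMR master node."""
import shlex


def tunnel_command(master_dns: str, key_path: str, local_port: int = 8157) -> list:
    """Return the ssh argument list for a SOCKS tunnel to the master node.

    -N  do not run a remote command, only forward ports
    -D  open a local SOCKS proxy (dynamic port forwarding) on local_port
    """
    return ["ssh", "-i", key_path, "-N", "-D", str(local_port),
            f"hadoop@{master_dns}"]


def as_shell(command: list) -> str:
    """Render the argument list as a shell-safe string for copy and paste."""
    return " ".join(shlex.quote(part) for part in command)


if __name__ == "__main__":
    # Placeholder values; substitute your own key file and master node DNS.
    cmd = tunnel_command("ec2-master.example.com", "/home/me/mykeypair.pem")
    print(as_shell(cmd))
    # To open the tunnel, pass the list to subprocess.run(cmd, check=True).
```

Once the tunnel is open, the browser needs a SOCKS proxy pointing at `localhost` and the chosen port before the notebook URL will load.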

Step 4 Handling Real-Time Data

  1. Create a Data Stream
  2. Sample Stream creation by using Kinesis Data Generator
  3. Create Stream Table to show real-time data that comes from the Firehose delivery stream
  4. Querying Real Time Data under permission control of Lake Formation
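
The workshop generates sample traffic with the Kinesis Data Generator. As a rough scripted alternative, the sketch below builds `PutRecords` requests for a Kinesis data stream with boto3. The stream name and the event fields (`event_id`, `customer_id`, `amount`) are hypothetical and are not the workshop's schema. Each record ends with a newline so that the objects delivered to S3 remain line-delimited JSON.

```python
"""Sketch: generate sample events and batch them into Kinesis PutRecords requests."""
import json
import random

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def make_event(rng: random.Random, sequence: int) -> dict:
    """Create one synthetic event (hypothetical fields)."""
    return {"event_id": sequence,
            "customer_id": rng.randint(1, 1000),
            "amount": round(rng.uniform(1, 500), 2)}


def put_records_requests(stream_name: str, events: list) -> list:
    """Split events into batches and build one PutRecords request per batch."""
    requests = []
    for start in range(0, len(events), MAX_RECORDS_PER_CALL):
        batch = events[start:start + MAX_RECORDS_PER_CALL]
        requests.append({
            "StreamName": stream_name,
            "Records": [{"Data": (json.dumps(event) + "\n").encode("utf-8"),
                         "PartitionKey": str(event["customer_id"])}
                        for event in batch],
        })
    return requests


def send(stream_name: str, events: list) -> None:
    """Send the batches to Kinesis. Requires boto3 and suitable credentials."""
    import boto3  # imported lazily so the builders above work without it

    kinesis = boto3.client("kinesis")
    for request in put_records_requests(stream_name, events):
        response = kinesis.put_records(**request)
        if response["FailedRecordCount"]:
            print(f"{response['FailedRecordCount']} records failed; retry them")
```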

Step 5 Incremental Blueprints

  1. Create Incremental Blueprints
  2. Insert new data in MySQL
    • Connect to the EC2-DB-Loader EC2 instance, open a MySQL session, and insert two new rows
        mysql -h tpc-database-dns -u tpcadmin -p tpc
        INSERT INTO tpc.customer (c_salutation,c_customer_sk,c_first_name,c_last_name) VALUES('Dr.',29999935,'Jill','Thomas');
        INSERT INTO tpc.customer (c_salutation,c_customer_sk,c_first_name,c_last_name) VALUES('Dr.',29999936,'Jill','Thomas');
      
  3. Query the Incremental data from Athena
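
To confirm that the two inserted rows arrived after the incremental blueprint run, a query along the following lines can be issued against Athena. This is a sketch under assumptions: the table name `tpc_customer`, the database name, and the S3 output location are placeholders, since the actual names depend on how the blueprint was configured.

```python
"""Sketch: look up the newly inserted customers in Athena after an incremental load."""
import time


def incremental_query(table: str, customer_keys: list) -> str:
    """Build a query that selects only the given c_customer_sk values."""
    key_list = ", ".join(str(int(key)) for key in customer_keys)
    return ("SELECT c_customer_sk, c_salutation, c_first_name, c_last_name "
            f"FROM {table} WHERE c_customer_sk IN ({key_list}) "
            "ORDER BY c_customer_sk")


def athena_request(query: str, database: str, output_location: str) -> dict:
    """Arguments for athena.start_query_execution."""
    return {"QueryString": query,
            "QueryExecutionContext": {"Database": database},
            "ResultConfiguration": {"OutputLocation": output_location}}


def run(request: dict, poll_seconds: float = 1.0) -> list:
    """Run the query and return the result rows. Requires boto3 and credentials."""
    import boto3  # imported lazily so the builders above work without it

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

With the keys from the INSERT statements above, the call would be `incremental_query("tpc_customer", [29999935, 29999936])`. The rows appear only if the querying persona has Lake Formation SELECT permission on the table.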

Step 6 Glue to Lake Formation Migration

How to migrate AWS Glue permissions to Lake Formation permissions

  1. Lab Preparation
  2. Using Glue Permissions to control the data access
    • Glue uses IAM policies and S3 bucket policies to provide table-level access control
  3. Migrate Permissions to Lake Formation
    • Step 1: List Users’ and Roles’ Existing Permissions
      aws iam list-policies-granting-service-access --arn arn:aws:iam::[AccountID]:user/glue-admin --service-namespaces glue
      
    • Step 2: Set Up Equivalent Lake Formation Permissions

      Grant AWS Lake Formation permissions to match the AWS Glue permissions in the policies GlueProdPolicy and GlueTestPolicy
    • Step 3: Give Users IAM Permissions to Use Lake Formation
    • Step 4: Switch Your Data Stores to the Lake Formation Permissions Model
    • Step 5: Secure New Data Catalog Resources: clear the check boxes "Use only IAM access control for new databases" and "Use only IAM access control for new tables in new databases"
    • Step 6: Give Users a New IAM Policy for Future Data Lake Access by adding the policy GlueFullReadAccess
    • Step 7: Clean Up Existing IAM Policies
      - Remove GlueProdPolicy from glue-admin and GlueTestPolicy from glue-dev-user
      - Remove the bucket policy permissions for glue-admin and glue-dev-user
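
Two of the migration steps lend themselves to scripting. The sketch below is a boto3 counterpart to the `aws iam list-policies-granting-service-access` command in Step 1, plus a request builder for revoking the `IAM_ALLOWED_PRINCIPALS` grant on a table, which is one part of switching to the Lake Formation permissions model in Step 4. The function names and the database and table names are hypothetical. Treat this as an outline rather than the workshop's exact procedure.

```python
"""Sketch: list Glue-related IAM policies and build a revoke request for migration."""


def policies_by_service(response: dict) -> dict:
    """Map each service namespace to the policy names that grant access to it."""
    result = {}
    for entry in response.get("PoliciesGrantingServiceAccess", []):
        names = [policy["PolicyName"] for policy in entry.get("Policies", [])]
        result.setdefault(entry["ServiceNamespace"], []).extend(names)
    return result


def revoke_iam_allowed_principals(database: str, table: str) -> dict:
    """Arguments for lakeformation.revoke_permissions that remove the
    IAM-only ("Super") grant from a table."""
    return {"Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
            "Resource": {"Table": {"DatabaseName": database, "Name": table}},
            "Permissions": ["ALL"]}


def fetch_glue_policies(principal_arn: str) -> dict:
    """Call IAM and follow pagination. Requires boto3 and suitable credentials."""
    import boto3  # imported lazily so the helpers above work without it

    iam = boto3.client("iam")
    kwargs = {"Arn": principal_arn, "ServiceNamespaces": ["glue"]}
    merged = {}
    while True:
        response = iam.list_policies_granting_service_access(**kwargs)
        for namespace, names in policies_by_service(response).items():
            merged.setdefault(namespace, []).extend(names)
        if not response.get("IsTruncated"):
            return merged
        kwargs["Marker"] = response["Marker"]
```

Running `fetch_glue_policies` for glue-admin and glue-dev-user gives the list of policies whose Glue permissions need an equivalent Lake Formation grant before the IAM policies are removed in Step 7.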
