Integration with Amazon EMR
Beginning with Amazon EMR 5.31.0, you can launch a cluster that integrates with AWS Lake Formation. Integrating
Amazon EMR with AWS Lake Formation provides the following key benefits:
- Fine-grained, column-level access control for databases and tables in the AWS Glue Data Catalog.
- Federated single sign-on to EMR Notebooks or Apache Zeppelin from your enterprise identity system, provided that it supports Security Assertion Markup Language (SAML) 2.0.
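In Lake Formation, column-level access is expressed as a grant on a table together with an explicit list of columns. The following minimal sketch assembles a request in the shape accepted by the boto3 `lakeformation` client's `grant_permissions` call. The helper `build_column_grant` is hypothetical (it is not part of any AWS SDK), and the role ARN, database, table, and column names are placeholders. Because it only builds the request, it runs without AWS credentials.

```python
from typing import Any, Dict, List, Optional


def build_column_grant(
    principal_arn: str,
    database: str,
    table: str,
    columns: List[str],
    catalog_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a column-level SELECT grant in Lake Formation.

    Hypothetical helper: the returned dict mirrors the parameters of
    boto3's lakeformation ``grant_permissions`` call.
    """
    if not columns:
        raise ValueError("a column-level grant needs at least one column")

    table_with_columns: Dict[str, Any] = {
        "DatabaseName": database,
        "Name": table,
        "ColumnNames": list(columns),  # only these columns become readable
    }
    request: Dict[str, Any] = {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"TableWithColumns": table_with_columns},
        "Permissions": ["SELECT"],
    }
    if catalog_id is not None:
        # When omitted, Lake Formation uses the caller's own account catalog.
        request["CatalogId"] = catalog_id
        table_with_columns["CatalogId"] = catalog_id
    return request


if __name__ == "__main__":
    # Placeholder principal and table names, for illustration only.
    req = build_column_grant(
        "arn:aws:iam::111122223333:role/analyst",
        "sales_db",
        "orders",
        ["order_id", "order_date"],
    )
    print(req)
    # With credentials configured, the grant could then be applied with:
    #   import boto3
    #   boto3.client("lakeformation").grant_permissions(**req)
```

Keeping request construction separate from the API call makes the grant easy to review before anything is changed in the account.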
To integrate Amazon EMR and Lake Formation, your organization must meet the following requirements:
- Manage your corporate identities using an existing SAML-based Identity Provider, such as Active Directory Federation Services (AD FS). For more information, see Supported Third-Party Providers for SAML.
- Use the AWS Glue Data Catalog as a metadata store.
- Use EMR Notebooks or Apache Zeppelin to access data managed by AWS Glue and Lake Formation.
- Define and manage permissions in Lake Formation to access databases, tables, and columns in the AWS Glue Data Catalog. For more information, see AWS Lake Formation.
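To use the AWS Glue Data Catalog as the metadata store, the Amazon EMR documentation sets the `hive.metastore.client.factory.class` property in the `hive-site` and `spark-hive-site` configuration classifications. The sketch below generates those configuration objects as JSON, assuming you launch the cluster with a configuration file. The helper name is hypothetical, and whether you need to set these properties yourself on a Lake Formation-enabled cluster may depend on how the cluster is launched.

```python
import json
from typing import Dict, List, Sequence

# Client factory class documented by Amazon EMR for using the
# AWS Glue Data Catalog as a Hive-compatible metastore.
GLUE_FACTORY = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"


def glue_catalog_configurations(
    classifications: Sequence[str] = ("hive-site", "spark-hive-site"),
) -> List[Dict[str, object]]:
    """Return EMR configuration objects that point each classification at Glue."""
    return [
        {
            "Classification": name,
            "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
        }
        for name in classifications
    ]


if __name__ == "__main__":
    # Save the output as, for example, glue.json and pass it with
    # `aws emr create-cluster --configurations file://glue.json ...`.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```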
For more information, see Integrating Amazon EMR with AWS Lake Formation.
The integration between Amazon EMR and AWS Lake Formation supports the following applications:
- EMR Notebooks
- Apache Zeppelin
- Apache Spark through EMR Notebooks
Before You Begin
To launch an Amazon EMR cluster with AWS Lake Formation, you need to complete the following prerequisite:
Configure a trust relationship between your third-party SAML 2.0 identity provider (IdP) and AWS.
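On the AWS side, a SAML trust relationship is typically an IAM role whose trust policy names the SAML provider as a federated principal and allows `sts:AssumeRoleWithSAML`. The sketch below builds such a policy document. The helper name, account ID, provider name, and `SAML:aud` audience are assumptions to replace with the values from your own IdP setup; the AWS console sign-in URL used in the example is the standard audience for console federation and may differ for the EMR integration.

```python
import json
from typing import Any, Dict


def saml_trust_policy(account_id: str, provider_name: str, audience: str) -> Dict[str, Any]:
    """Build an IAM role trust policy that lets a SAML 2.0 IdP federate into AWS.

    Hypothetical helper for illustration; all arguments are placeholders.
    """
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError("account_id must be a 12-digit AWS account ID")

    provider_arn = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithSAML",
                # Only accept SAML assertions addressed to the expected audience.
                "Condition": {"StringEquals": {"SAML:aud": audience}},
            }
        ],
    }


if __name__ == "__main__":
    # Example values only; substitute your account, provider, and audience.
    policy = saml_trust_policy(
        "111122223333", "MyADFS", "https://signin.aws.amazon.com/saml"
    )
    print(json.dumps(policy, indent=2))
```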
Proceed to the next exercise to configure the trust relationship.