ZEPL/zepl-documentation

Update EMR support document section


This section needs to be updated:

https://docs.zepl.com/guide/emr_integration/#launch-the-amazon-emr-cluster

with the following:

Integration with Amazon EMR

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.

  • Important: This article applies to the ZEPL Enterprise Plan only. Please contact us at sales@zepl.com for more information.

There are two options for connecting ZEPL to AWS EMR clusters:
Note: In both cases, the EMR cluster must reside in the same VPC as ZEPL.

1. Existing AWS EMR clusters

ZEPL can connect to existing EMR clusters that your team has created through the AWS console. There are two requirements:
a. The EMR cluster and the ZEPL deployment must be in the same VPC.
b. The EMR cluster must have a publicly resolvable domain name.
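Requirement (b) can be sanity-checked before you fill in the ZEPL form. The following is a minimal sketch, not part of ZEPL: the helper name `resolves` is hypothetical, and it only tests whether a hostname resolves from the machine where the script runs, which is a rough proxy for "publicly resolvable".

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if `hostname` resolves to at least one address.

    Resolution is attempted from the machine running this script, so run it
    from a host outside the VPC to approximate a public lookup.
    """
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # The resolver could not find an address for this name.
        return False
```

For example, calling `resolves(...)` with the Master public DNS value copied from the EMR console should return `True` if the name can be looked up.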
To connect a ZEPL notebook to an existing EMR cluster:
i. Go to the "Resources" page in ZEPL and click the "Clusters" menu.
ii. On the "Clusters" page, click "Create new Cluster".
iii. Select "Connect to an externally managed EMR cluster" and click "Next".
iv. Enter a name for the cluster and the Master public DNS of the EMR cluster in the respective fields.
Note: Currently, ZEPL supports only EMR release 5.14.0 (more releases will be added in the future).
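Before connecting, it can help to confirm that the cluster runs the supported release and exposes a master DNS name. The sketch below is an illustration only. `check_cluster` and `describe` are hypothetical helper names; `describe` assumes the `boto3` package and AWS credentials are available; and `check_cluster` assumes the release appears in the `ReleaseLabel` field as `emr-5.14.0`. It does not verify the VPC requirement.

```python
SUPPORTED_RELEASE = "emr-5.14.0"  # assumed label for EMR release 5.14.0


def check_cluster(cluster: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed.

    `cluster` is the "Cluster" dict from an EMR describe_cluster response.
    """
    problems = []
    release = cluster.get("ReleaseLabel")
    if release != SUPPORTED_RELEASE:
        problems.append(f"release is {release!r}, expected {SUPPORTED_RELEASE!r}")
    if not cluster.get("MasterPublicDnsName"):
        problems.append("cluster has no master public DNS name")
    return problems


def describe(cluster_id: str, region: str) -> dict:
    """Fetch the cluster description from AWS (requires boto3 and credentials)."""
    import boto3  # imported lazily so check_cluster works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
```

A typical use would be `check_cluster(describe(cluster_id, region))`, where the cluster ID and region are the values shown for your cluster in the AWS console.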

That's it. Once the cluster is connected, open any notebook and, under "Notebook Settings", select the cluster you just created.

2. Create a new EMR cluster

ZEPL also enables you to create a new EMR cluster through the ZEPL interface.
Note: This assumes that, as part of the ZEPL deployment, the ZEPL user IAM role was given permission to create EMR clusters.
i. As above, go to the "Resources" page in ZEPL and click the "Clusters" menu.
ii. On the "Clusters" page, click "Create new Cluster".
iii. Select "Launch new ZEPL managed EMR cluster" and click "Next".
iv. Enter a name for the cluster, set an idle timeout (the idle period after which the cluster is shut down), add any additional configurations, select the hardware configuration, and click "Create".

That's it. Again, once the cluster is created, go to any notebook and select the cluster you just created.
Note: How quickly the new EMR cluster is created depends on AWS. It often takes about 5 minutes.
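If you also have AWS API access to the account, you can poll the cluster state while it starts. This is a sketch under assumptions, not something ZEPL provides: `wait_until_ready` is a hypothetical helper, and `get_state` is any callable you supply that returns the current EMR cluster state string, for instance one that reads `["Cluster"]["Status"]["State"]` from a boto3 `describe_cluster` response.

```python
import time
from typing import Callable

READY_STATES = {"RUNNING", "WAITING"}
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_until_ready(
    get_state: Callable[[], str],
    timeout_s: float = 900.0,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the cluster is ready, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster entered state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still in state {state} after {timeout_s}s")
        sleep(poll_s)  # wait before asking again
```

The default timeout of 15 minutes is an arbitrary choice that leaves headroom over the roughly 5 minutes mentioned above.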

You can manage all your clusters from the "Clusters" console.

From here you can disconnect, shut down, and clone clusters, and control which members of your organization can access them.