Jupyter notebooks on EMR

Exploratory data analysis requires interactive code execution. With Spark on EMR, it is very convenient to run the code from Jupyter notebooks on a remote cluster. EMR allows installing Jupyter on the Spark master node. To do that, configure the "Applications" field of the EMR cluster so that it also contains JupyterHub. For example: "Applications": [ { "Name": "Ganglia", "Version": "3.7.2" }, { "Name": "Spark", "Version": "2.4.0" }, { "Name": "Zeppelin", "Version": "0....
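As a rough illustration of the configuration described above, here is a minimal Python sketch that builds an "Applications" list and makes sure it contains a JupyterHub entry. The helper name `with_jupyterhub` and the variable names are hypothetical and exist only for this example. The Ganglia and Spark entries are copied from the text; the Zeppelin entry is left out because its version is not shown in full. Leaving out "Version" for JupyterHub is an assumption that EMR then uses the version bundled with the chosen release.

```python
import json

# Entries copied from the example in the text. Zeppelin is omitted here
# because its version string is truncated in the source.
applications = [
    {"Name": "Ganglia", "Version": "3.7.2"},
    {"Name": "Spark", "Version": "2.4.0"},
]


def with_jupyterhub(apps):
    """Return a copy of `apps` that also contains a JupyterHub entry.

    Hypothetical helper for illustration only; it does not talk to AWS.
    """
    names = {app["Name"] for app in apps}
    if "JupyterHub" in names:
        return list(apps)
    # Assumption: no "Version" key, so EMR picks the release default.
    return list(apps) + [{"Name": "JupyterHub"}]


# The resulting fragment can be merged into the cluster definition.
cluster_config = {"Applications": with_jupyterhub(applications)}
print(json.dumps(cluster_config, indent=2))
```

The helper leaves the input list unchanged and does nothing if JupyterHub is already listed, so it can be applied to an existing cluster definition without creating duplicate entries.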

February 4, 2019 · SergeM