Jupyter notebooks on EMR

Exploratory data analysis requires interactive code execution. With Spark on EMR, it is very convenient to run the code from Jupyter notebooks on a remote cluster. EMR allows installing Jupyter on the Spark master. To do that, configure the "Applications" field of the EMR cluster so that it also includes JupyterHub. For example: "Applications": [ { "Name": "Ganglia", "Version": "3.7.2" }, { "Name": "Spark", "Version": "2.4.0" }, { "Name": "Zeppelin", "Version": "0....
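As a rough illustration of the configuration step, here is a minimal Python sketch that assembles an "Applications" list which also contains JupyterHub. The helper name `build_applications` is hypothetical, and the "Version" fields are left out on purpose: the excerpt is cut off before the full list, and the sketch assumes that whatever consumes this fragment accepts entries with a name only.

```python
import json


def build_applications(names):
    """Return an EMR-style "Applications" list (hypothetical helper, names only)."""
    return [{"Name": name} for name in names]


# Application names taken from the excerpt, plus JupyterHub.
# Versions are intentionally omitted (assumption, see the text above).
applications = build_applications(["Ganglia", "Spark", "Zeppelin", "JupyterHub"])

# Fragment that could be merged into a larger cluster definition.
cluster_config_fragment = {"Applications": applications}

print(json.dumps(cluster_config_fragment, indent=2))
```

The printed JSON is only the "Applications" fragment; the rest of the cluster definition (instances, roles, release) is outside the scope of this sketch.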

February 4, 2019 · SergeM

Bokeh in Jupyter notebooks for interactive plots

Bokeh is a library for interactive visualization that can be used in Jupyter notebooks. Here is an example. Let's say we have a pandas DataFrame with timestamps and some values:

    import pandas as pd
    from io import StringIO
    df = pd.read_csv(StringIO("""timestamp,value
    2018-01-01T10:00:00,20
    2018-01-01T12:00:00,10
    2018-01-01T14:00:00,30
    2018-01-02T10:30:00,40
    2018-01-02T13:00:00,50
    2018-01-02T18:00:40,10
    """), parse_dates=["timestamp"])

You can turn it into a nice graph with zoom, selection, and mouse-over tooltips using Bokeh:...
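The excerpt stops before the plotting code, so the following is not the original post's code. It is only a minimal sketch of how such a plot could be built, assuming pandas and Bokeh 2.0 or newer are installed (the `"@timestamp"` key in `formatters` relies on that version). The function name `make_plot`, the title, and the chosen tool list are illustrative assumptions.

```python
from io import StringIO

import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Same sample data as in the excerpt.
df = pd.read_csv(StringIO("""timestamp,value
2018-01-01T10:00:00,20
2018-01-01T12:00:00,10
2018-01-01T14:00:00,30
2018-01-02T10:30:00,40
2018-01-02T13:00:00,50
2018-01-02T18:00:40,10
"""), parse_dates=["timestamp"])


def make_plot(frame):
    """Build a time-series figure with zoom, selection and hover tooltips."""
    source = ColumnDataSource(frame)

    # Zoom and selection come from the toolbar tools listed here.
    p = figure(
        x_axis_type="datetime",
        title="value over time",
        tools="pan,wheel_zoom,box_zoom,box_select,reset",
    )
    p.line("timestamp", "value", source=source)
    p.scatter("timestamp", "value", source=source, size=8)

    # Mouse-over tooltips; the formatter tells Bokeh to render the
    # timestamp column as a date rather than as a raw number.
    hover = HoverTool(
        tooltips=[("timestamp", "@timestamp{%F %T}"), ("value", "@value")],
        formatters={"@timestamp": "datetime"},
    )
    p.add_tools(hover)
    return p


plot = make_plot(df)
```

In a notebook, calling `output_notebook()` from `bokeh.io` once and then `show(plot)` from `bokeh.plotting` should render the figure inline. Returning the figure from a function rather than showing it immediately keeps the sketch usable outside a notebook as well.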

June 20, 2018 · SergeM