Run Apache-Spark with IPython Notebook

Background

I'm interested in big data, real-time analysis, data mining, machine learning, and so on, because everyone blogs and talks about them as if they were fun, and they sound interesting. This is purely personal interest and self-study. I'm far from an expert, so I'm just enjoying it at my own level.

Things to do

Just set up Apache Spark so that it can be used from IPython Notebook. Googling turns up various guides, but I want my own notes close at hand, so I'm writing this down. I just learned that Spark 1.2.0 has been released, so these notes (written for 1.1.1) are already slightly out of date, but I think the steps are the same anyway.

Premise

Environment

Procedure

  1. Download Spark and copy or install it somewhere

When installed with Homebrew, it is placed in /usr/local/Cellar/apache-spark/1.1.1.

  2. Set the SPARK_HOME environment variable

    export SPARK_HOME="Folder where spark was unzipped"
    
  3. Create an IPython profile

    $ ipython profile create pyspark
    
  4. Edit startup/00-pyspark-setup.py in the IPython profile directory

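    #coding:utf-8
    # Python 2 startup script: execfile() below does not exist in Python 3.
    import os
    import sys
    
    # Homebrew install location; change this to match your own install.
    os.environ['SPARK_HOME'] = '/usr/local/Cellar/apache-spark/1.1.1'
    spark_home = os.environ.get('SPARK_HOME', None)
    if not spark_home:
        raise ValueError('SPARK_HOME environment variable is not set')
    # Make PySpark and its bundled py4j importable.
    sys.path.insert(0, os.path.join(spark_home, 'libexec/python'))
    sys.path.insert(0, os.path.join(spark_home, 'libexec/python/lib/py4j-0.8.2.1-src.zip'))
    # Run PySpark's shell bootstrap, which creates the SparkContext `sc`.
    execfile(os.path.join(spark_home, 'libexec/python/pyspark/shell.py'))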
    

In my environment the profile directory is ~/.ipython/profile_pyspark. The py4j-0.8.2.1-src.zip file name differs depending on the Spark version, so rewrite it to match your install. On Windows, I think the profile directory is somewhere under the user folder.
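
Since the py4j file name changes between Spark releases, one way to avoid hard-coding it is to search for it. The following is only a sketch: `pyspark_paths` is a hypothetical helper name, and it assumes the zip lives in `lib/` under either `libexec/python` (the Homebrew layout used above) or `python` (assumed here for a plain unpacked download).

```python
import glob
import os


def pyspark_paths(spark_home):
    """Return (python_dir, py4j_zip) for a Spark install, or raise IOError.

    Hypothetical helper: looks in libexec/python (Homebrew layout) and
    then python (assumed layout of an unpacked download).
    """
    for sub in ('libexec/python', 'python'):
        python_dir = os.path.join(spark_home, sub)
        # Match whichever py4j version this Spark release bundles.
        pattern = os.path.join(python_dir, 'lib', 'py4j-*-src.zip')
        zips = sorted(glob.glob(pattern))
        if zips:
            return python_dir, zips[-1]
    raise IOError('no py4j-*-src.zip found under %s' % spark_home)
```

In the startup script, the two returned paths could be passed to `sys.path.insert(0, ...)` in place of the hard-coded strings.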

  5. Start the notebook

    $ ipython notebook --profile=pyspark
    
  6. It seems to be working!
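
To check that it really works, a small job can be run from a notebook cell. This is a minimal sketch, assuming the startup script ran and defined the SparkContext as `sc`; `check_spark_context` is a hypothetical helper name, not part of Spark.

```python
def check_spark_context(sc):
    """Sum 1..100 through Spark and compare with the known answer 5050."""
    total = sc.parallelize(range(1, 101)).sum()
    return total == 5050


# `sc` only exists when the notebook was started with the pyspark profile.
if 'sc' in globals():
    print(check_spark_context(sc))
```

If the cell reports a false result or raises an error, the first things to recheck are SPARK_HOME and the py4j path in the startup script.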

Reference

http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/
