Building a data analysis environment with Python (IPython notebook + Pandas)

Introduction

In this post, we build a Python data analysis environment inside a virtual machine, using the following tools.

Name                    Description
VirtualBox              Virtual machine execution environment
Vagrant                 Tool for managing virtual machines from the console
IPython (+ notebook)    Python development and execution environment
Pandas                  Data analysis library

What is VirtualBox

VirtualBox is software that virtualizes x86 hardware (the ordinary PCs and servers you find everywhere). Its official name is Oracle VM VirtualBox, and it is currently developed by Oracle.

It is very useful for experimenting freely without affecting your existing environment.

What is Vagrant

Vagrant is a tool that makes virtual environments easier to manage from the console. You can also build a test environment quickly by using Boxes (prebuilt base images) published by volunteers.

Adopting it often saves time and effort when building all kinds of environments.

What is IPython

IPython greatly extends the standard Python interactive interpreter. It adds features such as completion while typing, parallel processing in cluster environments, command-line shell functions, and integration with GUI toolkits.

It is very useful as an interactive interpreter for ad hoc analysis.

What is IPython notebook

IPython notebook makes IPython available from a web browser. It is especially convenient for graphical output such as plots.

You can run everything on a single machine. If you install it on a well-equipped server, however, you can run analyses from low-powered clients and easily share the results with others.

What is Pandas

Pandas is a Python data analysis library. It provides data structures that make numbers and two-dimensional tables easy to manipulate, together with a set of operations on them.

Under the hood, it builds on Python numerical libraries such as NumPy and SciPy, so numerical computation is fast.
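The following minimal sketch makes the idea of "data structures plus operations" concrete with a `DataFrame`. The store names and sales figures are made up for illustration. The calls are basic ones that we believe older pandas releases, such as the one Debian 7 packages, also support.

```python
import pandas as pd

# Hypothetical sample data: sales records for two stores (made up for illustration)
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sales": [100, 150, 80, 120, 100],
})

# Whole-column aggregation runs in vectorized NumPy code, not a Python loop
total = df["sales"].sum()

# Group rows by store and average the sales within each group
mean_by_store = df.groupby("store")["sales"].mean()

print(total)  # 550
print(mean_by_store)
```

The table and the per-group operation are expressed without explicit loops. This is where the speed from NumPy comes in.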

Installing the environment and tools

Work environment

The work was done in the following environment.

Debian 7.6.0 (64bit) was selected as the OS for the virtual environment.

Installing VirtualBox

Download the installer for your environment from the VirtualBox download page and install it. Builds are available for all major operating systems, including Windows, Mac, and Linux. Following the installer's instructions should be enough.

Installing Vagrant

Download the package for your environment from the Vagrant download page and install it. Packages are available for Windows, Mac, and Linux (Red Hat and Debian families).

Building a virtual environment

Choose a Box file from this page. Here I chose Debian 7.6.0 (64bit):

https://github.com/jose-lpa/packer-debian_7.6.0/releases/download/1.0/packer_virtualbox-iso_virtualbox.box

Execute the following command.

$ vagrant box add debian-7.6 https://github.com/jose-lpa/packer-debian_7.6.0/releases/download/1.0/packer_virtualbox-iso_virtualbox.box
$ vagrant box list
...
debian-7.6       (virtualbox, 0)
...
$ mkdir -p ~/vagrant/debian7.6 # Create a directory to hold the virtual environment
$ cd ~/vagrant/debian7.6
$ vagrant init debian-7.6
$ ls
Vagrantfile

Edit the created Vagrantfile as follows.

Vagrantfile


# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "debian-7.6"
  config.vm.network "private_network", ip: "192.168.20.10"
  config.vm.provider "virtualbox" do |vb|
    vb.customize ["modifyvm", :id, "--memory", "2048"]
  end
end

This sets the virtual machine's private IP address to 192.168.20.10 and allocates 2 GB of memory to it.

Start the virtual machine and connect to it over SSH with the following commands.

$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'debian-7.6'...
...

$ vagrant ssh
Linux packer-virtualbox-iso-1411922062 3.2.0-4-amd64 #1 SMP Debian 3.2.57-3 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Sep 28 16:43:22 2014 from 10.0.2.2
vagrant@packer-virtualbox-iso-1411922062:~$

You are now logged in to the virtual environment. To return to your local machine, run $ logout or press Ctrl + D. To shut down the virtual machine, run the following on the host:

$ vagrant halt

Installing Pandas

Here we use the system's Python 2.7.

Since this is a virtual machine, we do not isolate packages with virtualenv or similar tools. Instead, packages are installed with apt-get and pip directly into the system Python.

Run the following commands to install all the modules required for analysis.

$ sudo apt-get update
$ sudo apt-get upgrade
...
Do you want to continue [Y/n]? Y
...

$ sudo apt-get install -y gcc g++ libpyside-dev python2.7-dev libevent-dev python-all-dev build-essential python-numpy python-scipy python-matplotlib libatlas-dev libatlas3gf-base python-pandas emacs
$ pip install --user --install-option="--prefix=" -U scikit-learn
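To confirm that the installation worked, you can try importing each module and reading its version. The following is a minimal sketch that uses only the standard library. `module_versions` is a hypothetical helper name, and the module list simply mirrors the packages installed above.

```python
import importlib


def module_versions(names):
    """Map each module name to its __version__, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed
            continue
        # A few modules define no __version__; report "unknown" for those
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # scikit-learn is imported under the name "sklearn"
    found = module_versions(["numpy", "scipy", "matplotlib", "pandas", "sklearn"])
    for name in sorted(found):
        print("%-10s %s" % (name, found[name] or "NOT INSTALLED"))
```

Only syntax that Python 2.7 also accepts is used, so the check can be run with the VM's system Python.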

Installing IPython + IPython notebook

Install IPython with the following command.

$ sudo pip install "ipython[all]"

Create a profile, then add the following lines to the beginning of its configuration file.

$ ipython profile create nbserver
$ emacs /home/vagrant/.ipython/profile_nbserver/ipython_notebook_config.py

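ipython_notebook_config.py


# Configuration file for ipython-notebook.

c = get_config()

# Start kernels with pylab in inline mode (plots rendered in the page)
c.IPKernelApp.pylab = 'inline'
# Listen on all interfaces so the host can reach the VM at 192.168.20.10
c.NotebookApp.ip = '*'
# Do not try to open a browser inside the VM
c.NotebookApp.open_browser = False
# Serve the notebook on port 9999
c.NotebookApp.port = 9999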

...

Start the notebook server in the background with the following command.

$ ipython notebook --profile=nbserver &
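Because the server runs in the background, it can help to confirm that it is actually listening before you open a browser. Below is a minimal standard-library sketch to run on the host machine. `is_listening` is a hypothetical helper and not part of IPython. The host and port are the values configured above.

```python
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:  # refused, unreachable, or timed out (socket.timeout is a subclass)
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Private IP from the Vagrantfile and port from c.NotebookApp.port
    print(is_listening("192.168.20.10", 9999))
```

A successful TCP connection only shows that something is listening on that port. The browser check below is still the real test.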

When you access http://192.168.20.10:9999/, you will see the following screen.

[Screenshot: IPython notebook dashboard]

Select New -> Python 2 in the upper right to open a new notebook with an interactive interpreter.

[Screenshot: a new Python 2 notebook]

This is not a problem in a local virtual environment. When you use the notebook in a real environment, however, you should set a password; see the following page.

Start IPython notebook server - Set password to restrict access
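On the VM itself, IPython 2.x/3.x provides a helper for this: `from IPython.lib import passwd; passwd()` prompts for a password and returns a hash string. You paste that string into the configuration file as `c.NotebookApp.password`. For illustration only, the sketch below reproduces what we understand to be the same salted `sha1:salt:digest` layout using just `hashlib`. That format match is an assumption, so prefer the real function on the VM. The function names are hypothetical, and SHA-1 reflects that older scheme rather than a recommendation.

```python
import hashlib
import random


def make_password_hash(passphrase, algorithm="sha1", salt_len=12):
    """Return 'algorithm:salt:hexdigest' (assumed to match IPython's passwd() layout)."""
    # Random hex salt, zero-padded to salt_len characters
    salt = "%0*x" % (salt_len, random.SystemRandom().getrandbits(4 * salt_len))
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_password(hashed, passphrase):
    """Recompute the digest with the stored algorithm and salt, then compare."""
    algorithm, salt, digest = hashed.split(":", 2)
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return h.hexdigest() == digest
```

The salt means that the same password produces a different string each time, while `check_password` still verifies it.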

Sample run

sample.py


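# Render figures inline in the notebook (IPython magic, not plain Python)
%matplotlib inline
import matplotlib.pyplot as plt

# Plot y = x for x = 0..99, which draws a straight line
plt.plot(range(100))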

Enter the code above in a cell and run it (click the Run button or press Shift + Enter).

[Screenshot: the resulting line plot]
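Pandas objects plot directly through matplotlib, so you can chart a DataFrame the same way. Below is a minimal sketch with made-up random-walk data. The column names are hypothetical, and the seed is only for reproducibility. The `matplotlib.use("Agg")` line lets the code also run as a plain script. In the notebook, drop those first two lines and rely on `%matplotlib inline` as in the sample above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for running outside the notebook
import numpy as np
import pandas as pd

# Hypothetical data: a 100-step random walk, seeded so reruns give the same curve
np.random.seed(0)
steps = pd.Series(np.random.randn(100))
df = pd.DataFrame({"step": steps, "position": steps.cumsum()})

# DataFrame.plot draws one line per column on a shared matplotlib Axes
ax = df.plot()
ax.set_xlabel("t")
```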

In conclusion

You now have a Python data analysis environment.
