[PYTHON] Prepare a programming language environment for data analysis

I previously wrote about preparing a computer environment for data analysis, but a computer alone is not enough: you also need a programming language environment.

Today I'll show you how to install Python and its related libraries.

Building Python

The latest version, 3.4.1, was released on May 18.

Reasons to build your own

Common ways to set up Python include language-level virtual environments such as virtualenv and package management systems such as apt and brew (http://brew.sh/), and many people use them.

For the programming languages you use often, I recommend building them yourself from source. The main reasons are as follows.

  1. You can build from the latest source code, even before it has been released.
  2. If the language has a bug, or behaves in a way you want to change, you can patch it and rebuild it yourself.
  3. It makes it easier to participate in the development of the language.
  4. You can follow the same steps on many platforms without being locked into a specific package management system or virtual environment tool.

Installation destination

Standardizing the installation destination of your builds also makes it easier to keep track of what is installed when you uninstall or switch versions.

For most products, I use the following directory layout.

/opt/[Product name]/[version]

I also create a symbolic link named current under /opt/[Product name]/ that points to the version I want to use.

For example:

$ ls -la /opt/python/
drwxr-xr-x 3.3
drwxr-xr-x 3.4
drwxr-xr-x trunk
lrwxrwxrwx current -> 3.4

$ ls -la /opt/ruby/
drwxr-xr-x 1.9.3
drwxr-xr-x 2.0
drwxr-xr-x 2.1
lrwxrwxrwx current -> 2.1

This lets several versions coexist while making it easy to switch between them, and uninstalling a version is just a matter of deleting its directory with rm -rf.
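The switching and uninstalling described above can be sketched in Python. This is a minimal sketch, assuming a POSIX system and the /opt/[Product name]/[version] layout; the function names switch_current and uninstall are hypothetical and not part of the author's scripts, and the code is written for Python 3.8 or later. The new link is created under a temporary name and renamed over the old one, so current never disappears midway.

```python
import os
import shutil
from pathlib import Path


def switch_current(product_dir: Path, version: str) -> None:
    """Point <product_dir>/current at the installed <version> directory."""
    target = product_dir / version
    if not target.is_dir():
        raise FileNotFoundError(f"{target} is not an installed version")
    tmp_link = product_dir / "current.tmp"
    # Clean up a leftover temporary link from an interrupted run.
    if tmp_link.is_symlink():
        tmp_link.unlink()
    # A relative target such as "3.4" keeps the link valid if the tree moves.
    os.symlink(version, tmp_link)
    # rename(2) replaces the old "current" link in one step (it does not
    # follow symlinks), so there is never a moment when "current" is missing.
    os.replace(tmp_link, product_dir / "current")


def uninstall(product_dir: Path, version: str) -> None:
    """Delete one version directory (like rm -rf), unless it is still current."""
    current = product_dir / "current"
    target = product_dir / version
    if current.is_symlink() and current.resolve() == target.resolve():
        raise RuntimeError(f"{version} is still linked as current")
    shutil.rmtree(target)
```

For example, switch_current(Path("/opt/python"), "3.4") reproduces the current -> 3.4 link from the listing above; writing under /opt usually requires root privileges.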

Installation script

I also run builds from shell scripts rather than doing them by hand.

For example, I install Python with this script: https://github.com/ynakayama/tagokura-python/blob/master/installer/install_python.sh

As described in the comments in the script, pass the version as the first argument and the installation destination as the second argument.

~/install_python.sh 3.4.1 /opt/python/3.4
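The contents of install_python.sh are not reproduced here. As a rough sketch of what such a build involves, assuming the standard CPython source build (download the release tarball from python.org, then ./configure --prefix, make, make install), the steps can be generated and printed as a dry run. The helpers install_prefix, build_steps and run_steps are hypothetical; install_prefix also assumes the major.minor directory naming seen in the command above (3.4.1 goes to /opt/python/3.4). Written for Python 3.8 or later.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

# (argv, working directory); None means the directory the script runs in
Step = Tuple[List[str], Optional[str]]


def install_prefix(product: str, version: str, root: str = "/opt") -> str:
    """Map e.g. ("python", "3.4.1") to /opt/python/3.4 (major.minor only)."""
    major_minor = ".".join(version.split(".")[:2])
    return f"{root}/{product}/{major_minor}"


def build_steps(version: str, prefix: str) -> List[Step]:
    """Standard CPython release build: download, extract, configure, make."""
    tarball = f"Python-{version}.tgz"
    src_dir = f"Python-{version}"
    url = f"https://www.python.org/ftp/python/{version}/{tarball}"
    return [
        (["curl", "-fLO", url], None),
        (["tar", "xzf", tarball], None),
        (["./configure", f"--prefix={prefix}"], src_dir),
        (["make"], src_dir),
        (["make", "install"], src_dir),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for argv, cwd in steps:
        print(f"[{cwd or '.'}] {shlex.join(argv)}")
        if not dry_run:
            # check=True stops at the first failing step, like `set -e`
            subprocess.run(argv, cwd=cwd, check=True)
```

For example, run_steps(build_steps("3.4.1", install_prefix("python", "3.4.1"))) prints the five commands without running them. Keeping the dry run as the default makes it easy to review what will happen on a new host before building.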

If a build is just a matter of running a shell script, you don't have to repeat the manual work when installing on another host or building a newer version.

Automation frameworks such as Chef have become popular recently, but shell scripts have a long tradition, so you don't have to worry about them becoming obsolete anytime soon. They run even in minimal environments and on systems that have been around for a long time, and when something goes wrong it is easy to check the behavior by reading the script.

Installing pip packages

I recommend installing pip packages with a shell script as well, just like the interpreter itself.

Using this script as a reference, it is a good idea to collect all the packages you need in one place: https://github.com/ynakayama/tagokura-python/blob/master/installer/install_pip.sh

If you use AWS, you should also include the AWS Command Line Interface.
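A minimal sketch of the same idea in Python, assuming pip is available for the target interpreter. The package list below is a hypothetical example, not the contents of install_pip.sh, and pip_install_command and install_packages are illustrative names. The AWS Command Line Interface is included under its PyPI name, awscli. Written for Python 3.8 or later.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical example list; not the contents of install_pip.sh.
PACKAGES = [
    "numpy",
    "scipy",
    "pandas",
    "matplotlib",
    "awscli",  # AWS Command Line Interface, for AWS users
]


def pip_install_command(packages: Sequence[str], python: str = sys.executable) -> List[str]:
    """Build `<python> -m pip install --upgrade <packages...>`."""
    # Running pip through a specific interpreter ensures the packages land
    # in that build (e.g. /opt/python/3.4) rather than another Python on PATH.
    return [python, "-m", "pip", "install", "--upgrade", *packages]


def install_packages(packages: Sequence[str], python: str = sys.executable,
                     dry_run: bool = True) -> None:
    """Print the pip command; run it only when dry_run is False."""
    cmd = pip_install_command(packages, python)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

For example, install_packages(PACKAGES, python="/opt/python/current/bin/python3", dry_run=False) would install the list into whichever version current points to.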

Summary

Standardizing and automating the installation of the programming languages in your computing environment is convenient. In a distributed environment, pin the version and install with the same procedure on every host so that versions do not drift between machines.
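To check that hosts really match, each one can report its versions and compare them against the pinned ones. This is a hypothetical sketch, not part of the author's scripts: local_versions and version_drift are illustrative names, and it uses importlib.metadata, so it needs Python 3.8 or later on the host running the check (it would have to be adapted to run under a 3.4 interpreter itself).

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, List


def local_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Collect the interpreter version and installed package versions on this host."""
    found = {"python": platform.python_version()}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # left out here, reported as "missing" by version_drift
    return found


def version_drift(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Describe every pinned name whose installed version differs or is missing."""
    problems = []
    for name, want in sorted(expected.items()):
        got = actual.get(name)
        if got is None:
            problems.append(f"{name}: missing (expected {want})")
        elif got != want:
            problems.append(f"{name}: {got} installed, expected {want}")
    return problems


if __name__ == "__main__":
    # Pin the interpreter version from this article; add packages as needed.
    pinned = {"python": "3.4.1"}
    for line in version_drift(pinned, local_versions(pinned.keys() - {"python"})):
        print(line)
```

An empty result means the host matches the pins; running the same check on every machine makes drift visible before it causes different results.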
