[PYTHON] Libraries for the "I want to do that" moments of data science on Jupyter Notebook

About this article

Python has a wide variety of libraries for data science.

- I want to plot!
- I want to run a statistical test!
- I want to process a data frame!

This article introduces which libraries are available for basic tasks like these.

Request: Please add more items via an edit request, or let me know your recommendations.

Data processing

pandas

pandas holds data in a "data frame", which resembles a table in the relational model (well known from SQL), and provides functions such as filtering, mapping, and grouping on it. It also has a rich set of interfaces for reading and writing data.

The following sample reads a CSV file and keeps only the rows whose 'Earnings' column is greater than 1000.

import pandas as pd
data = pd.read_csv("data.csv")
over_1000 = data[data['Earnings'] > 1000]
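The filtering, mapping, and grouping mentioned above can be sketched on a small in-memory data frame. This is only a minimal illustration: the 'Shop' column, the 'EarningsInThousands' column, and all values are hypothetical stand-ins for whatever data.csv actually contains.

```python
import pandas as pd

# Hypothetical in-memory data standing in for data.csv
data = pd.DataFrame({
    "Shop": ["A", "A", "B", "B", "C"],
    "Earnings": [500, 1500, 2500, 800, 1200],
})

# Filtering: keep rows whose 'Earnings' value is greater than 1000
over_1000 = data[data["Earnings"] > 1000]

# Mapping: derive a new column by applying a function to each value
data["EarningsInThousands"] = data["Earnings"].map(lambda v: v / 1000)

# Grouping: total earnings per shop
total_by_shop = data.groupby("Shop")["Earnings"].sum()
```

In a notebook, evaluating over_1000 or total_by_shop in a cell displays the result as a table.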

Linear algebraic processing

numpy

import numpy as np
# Create a matrix (2-D array) from a nested list
# (np.matrix is no longer recommended, so a plain array is used)
mat = np.array([[1, 2], [3, 4]])
# Create a vector from a list
vec = np.array([5, 6])
# Take the matrix-vector product
mat @ vec
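Beyond the matrix product, a few other everyday linear algebra operations might look like the following. This is a rough sketch that reuses the same 2x2 matrix and vector; the variable names are chosen here for illustration.

```python
import numpy as np

mat = np.array([[1, 2], [3, 4]])
vec = np.array([5, 6])

# Matrix-vector product: [1*5 + 2*6, 3*5 + 4*6] = [17, 39]
product = mat @ vec

# Transpose and determinant (1*4 - 2*3 = -2)
transposed = mat.T
det = np.linalg.det(mat)

# Solve mat @ x = vec for x without forming the inverse explicitly
x = np.linalg.solve(mat, vec)
```

Using np.linalg.solve is usually preferable to multiplying by np.linalg.inv(mat), since it avoids computing the inverse.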

Random number generation

numpy

numpy provides a wide range of basic numerical functionality beyond linear algebra, including random number generation from a specified distribution.

For example, a sequence of random numbers that follows a normal distribution can be generated as follows:

import numpy as np

mu, sigma = 2, 0.5
v = np.random.normal(mu,sigma,10000)
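To check that the generated values really follow the requested distribution, one option is to compare the sample mean and standard deviation with mu and sigma. The sketch below assumes the newer Generator interface (np.random.default_rng) and an arbitrary seed of 0 so that the run is reproducible; neither is part of the original sample.

```python
import numpy as np

# Seeded generator so the result is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)

mu, sigma = 2, 0.5
v = rng.normal(mu, sigma, 10000)

# The sample statistics should be close to the requested parameters
sample_mean = v.mean()
sample_std = v.std()
```

With 10000 samples, sample_mean and sample_std should land close to 2 and 0.5 respectively.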

Plotting

Libraries that can be used to draw graphs.

matplotlib

matplotlib provides the ability to draw various kinds of graphs. Because it is a relatively low-level library, it is often used together with seaborn and similar libraries.

import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-3, 3, 0.1)
y = np.sin(x)
plt.plot(x, y)
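Because matplotlib is low level, details such as axis labels, a title, and a legend are set by hand. The following is a minimal sketch using the explicit figure/axes interface; the label and title strings are placeholders chosen for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-3, 3, 0.1)

# Create a figure and axes explicitly instead of relying on plt.plot
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")

# Decorate the graph by hand
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Sine and cosine")
ax.legend()
```

In Jupyter Notebook the figure is normally rendered below the cell once the cell finishes running.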

seaborn

seaborn is a library that wraps matplotlib and provides an easier way to draw clean-looking graphs. For example, it can draw heatmaps.

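import numpy as np
import seaborn as sns

# Since seaborn 0.8, importing seaborn no longer restyles matplotlib automatically;
# call set_theme() to apply the seaborn style
sns.set_theme()

x = np.random.normal(size=100)
# distplot is deprecated; histplot with kde=True draws a comparable histogram with a density curve
sns.histplot(x, kde=True)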
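Since heatmaps are mentioned above, here is a rough sketch of drawing one. The 5x8 matrix of random values is hypothetical data made up for the example, and the "viridis" colormap is just one possible choice.

```python
import numpy as np
import seaborn as sns

rng = np.random.default_rng(seed=0)

# Hypothetical 5x8 matrix of values to visualise
values = rng.normal(size=(5, 8))

# heatmap draws each cell as a coloured square and returns the matplotlib Axes
ax = sns.heatmap(values, cmap="viridis")
```

A pandas data frame can be passed in place of the array, in which case its index and column names are used as tick labels.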

Statistical test

scipy

scipy is a library that provides the functionality needed for scientific and technical computing. It covers a fairly wide range of features, so you may find most of what you want to do here.

The t-test can be performed as follows.

import numpy as np
from scipy import stats

a = np.random.normal(0, 1, size=100)
b = np.random.normal(1, 1, size=10)
stats.ttest_ind(a, b)
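The value returned by ttest_ind carries the test statistic and the p-value, which can be read as follows. This is a sketch under a few assumptions not in the original sample: a seeded generator, equal sample sizes, Welch's variant (equal_var=False), and a significance level of 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
a = rng.normal(0, 1, size=100)
b = rng.normal(1, 1, size=100)

# Welch's t-test: does not assume that the two groups have equal variances
result = stats.ttest_ind(a, b, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# Reject the null hypothesis of equal means when p is below the chosen level
alpha = 0.05
reject_null = p_value < alpha
```

Here the two samples are drawn with means 0 and 1, so the test is expected to reject the null hypothesis.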

Symbolic differentiation

sympy

A library that performs algebraic calculations automatically. In other words, it is a library that can carry out all kinds of expression transformations. (By the way, if anyone knows: is this a term rewriting system?)

Here, symbolic differentiation is shown as one application.

import sympy as sym

# Prepare the variable
x = sym.symbols("x")
# Build a polynomial
f = x**3 + 2*x**2 - x + 5
# Differentiate with respect to x
df_dx = sym.diff(f, x)

Creating a statistical model

statsmodels

A convenient library for creating statistical models.

The following is an example of fitting a linear model (ordinary least squares, via smf.ols) and viewing its summary statistics (AIC and others are shown).

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")

formula = 'Sales ~ AccessCount + MailSendedCount'
mod = smf.ols(formula=formula, data=df)
res = mod.fit()
res.summary()
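Without a data.csv at hand, the same formula can be tried on synthetic data. In the sketch below the column values and the "true" coefficients (10, 0.5, and 3.0) are invented for illustration, so the fitted parameters can be compared against known values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=0)
n = 200

# Hypothetical explanatory variables
df = pd.DataFrame({
    "AccessCount": rng.integers(0, 1000, size=n),
    "MailSendedCount": rng.integers(0, 50, size=n),
})

# Hypothetical relationship: Sales = 10 + 0.5*AccessCount + 3*MailSendedCount + noise
noise = rng.normal(0, 5, size=n)
df["Sales"] = 10 + 0.5 * df["AccessCount"] + 3.0 * df["MailSendedCount"] + noise

res = smf.ols("Sales ~ AccessCount + MailSendedCount", data=df).fit()

# Individual results can be read from the fitted object
coefficients = res.params
aic = res.aic
r_squared = res.rsquared
```

The entries of coefficients should come out close to the values used to generate the data, and res.summary() shows the same numbers in table form.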

scikit-learn

Note

(More content will be added over time.)
