[PYTHON] Convenient analysis with Pandas + Jupyter notebook

Environment

Python 3.5.1 :: Anaconda 2.5.0

Are you using pandas.DataFrame?

Isn't it tedious to read CSV files in Python, parse JSON, run SQL queries against a database, and so on? Databases are especially troublesome, because the error handling has to decide when to roll back and when to commit.

pandas can take care of most of this for you.

Loading CSV files (a frequent task)

before


import csv

with open("data.csv", "r", newline="") as f:
    data = csv.reader(f)
    # The reader is lazy, so iterate while the file is still open.
    for row in data:
        print(row)

after


import pandas as pd

data = pd.read_csv("data.csv")
print(data)
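
To try the `read_csv` call above without a file on disk, here is a minimal sketch that reads from an in-memory string instead. The sample data and its column names (`name`, `age`, `city`) are made up for illustration; they are not from the original `data.csv`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv.
csv_text = "name,age,city\nalice,30,Tokyo\nbob,25,Osaka\n"

# read_csv accepts a file-like object as well as a path.
data = pd.read_csv(io.StringIO(csv_text))

print(data.columns.tolist())  # column names taken from the header row
print(data.shape)             # (rows, columns)
print(data["age"].mean())     # numeric columns can be aggregated directly
```

The header row becomes the column names, so each column can be referred to by name afterwards.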

Load the result of a SQL SELECT into Python (PostgreSQL)

before


import psycopg2

conn = psycopg2.connect("dbname=test host=localhost user=postgres")
cur = conn.cursor()
cur.execute("SELECT * FROM test_table LIMIT 100;")
data = cur.fetchall()

for row in data:
	print(row)

after


import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=test host=localhost user=postgres")
data = pd.read_sql("SELECT * FROM test_table LIMIT 100;",conn)
print(data)
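
The example above needs a running PostgreSQL server. As a rough sketch that runs anywhere, the same `pd.read_sql` call can be tried against an in-memory SQLite database from the standard library. SQLite is swapped in here only for convenience, and the table contents are invented for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the PostgreSQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.commit()

# Same call shape as the psycopg2 example: a query string plus a connection.
data = pd.read_sql("SELECT * FROM test_table LIMIT 100;", conn)
conn.close()

print(data.columns.tolist())  # column names come from the table definition
print(len(data))              # number of rows returned by the query
```

The query result arrives as a DataFrame whose columns match the table's columns, with no manual cursor handling.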

The nice thing about pandas is that __the tabular data structure is preserved as is__. In other words, the DB table structure or the CSV columns come through unchanged.

Example (Sample csv file)

__Working with the data in Jupyter notebook makes it easier to read and more convenient__

(Screenshot 2016-07-19 13.18.35.png)

__Easily check the type of each column__

(Screenshot 2016-07-19 13.22.31.png)

(A column whose type is shown as object holds strings, because the column contains values of more than one type. For example, to convert a column that mixes numbers and stray characters into a numeric type, call df["column_name"].convert_objects(convert_numeric=True); anything that cannot be converted is stored as NaN. Note that convert_objects was later deprecated and removed in pandas 1.0; in current versions the equivalent is pd.to_numeric(df["column_name"], errors="coerce").)
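
The type check and numeric conversion described above can be sketched as follows. This sketch uses `pd.to_numeric(..., errors="coerce")` rather than `convert_objects`, which no longer exists in pandas 1.0 and later. The sample data and the column name `price` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data: the price column mixes numbers with a stray string.
csv_text = "id,price\n1,100\n2,abc\n3,250\n"
df = pd.read_csv(io.StringIO(csv_text))

# Because of the stray "abc", price is read as text rather than as a number.
print(df.dtypes)

# Coerce to numeric; values that cannot be parsed become NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

print(df.dtypes)  # price is now a float column
print(df)
```

The column becomes float rather than integer because NaN is a floating-point value.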

There are many articles on how to use pandas, and Jupyter notebook is a very easy tool to pick up. Combining the two lets you analyze data quickly and easily, so please give it a try.

Postscript: I am collecting useful pandas methods for data aggregation and analysis as a memorandum (updated from time to time): Minimum methods to remember when aggregating data with Pandas
