[PYTHON] Data handling 2 Analysis of various data formats

Aidemy 2020/10/11


Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I enrolled in the AI-specialized school "Aidemy" to study. I am summarizing what I learned there on Qiita to share it with you. I am very happy that so many people read the previous summary article. Thank you! This is the second post on data handling. I hope you find it useful.

What to learn this time
-Introduction to the formats that pandas can convert
-Data format conversion using pandas
-Graphing a CSV file

Data format analysis

File input / output using pandas

-HTML, JSON, CSV, and Excel each have different uses, such as web pages, web APIs, and data organization. You can convert between these data formats using __pandas__.

HTML scraping with pandas

-Basically, ordinary HTML tag elements are scraped with BeautifulSoup, but __table elements (\<table>)__ can be scraped with pandas.
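
As a minimal sketch of table scraping with pandas, the snippet below parses a small made-up HTML fragment with pd.read_html(). The HTML content and the variable names are assumptions for illustration only. pd.read_html() relies on an optional HTML parser (lxml, or bs4 with html5lib), so the sketch prints a hint instead of failing when none is installed.

```python
import io

import pandas as pd

# Made-up HTML fragment containing a single <table> element (illustrative only).
HTML_TEXT = """
<table>
  <thead>
    <tr><th>name</th><th>price</th></tr>
  </thead>
  <tbody>
    <tr><td>apple</td><td>120</td></tr>
    <tr><td>banana</td><td>80</td></tr>
  </tbody>
</table>
"""

try:
    # read_html returns a list with one DataFrame per <table> it finds.
    tables = pd.read_html(io.StringIO(HTML_TEXT))
except ImportError as exc:
    # Raised when no HTML parser (lxml, or bs4 + html5lib) is installed.
    print(f"HTML parser missing: {exc}")
    tables = []

for table in tables:
    print(table)
```

Because the result is a list, a page with several tables yields several DataFrames, and you pick the one you need by index.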

About JSON

-JSON stands for "JavaScript Object Notation" and is used to exchange data between different programming languages.
-The structure of a JSON file is basically the same as that of a Python dictionary, and is expressed in the form {"key": value, ...}.
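
To illustrate how closely JSON maps to a Python dictionary, here is a small sketch using the standard json module and pd.read_json(). The sample records and variable names are made up for this example.

```python
import io
import json

import pandas as pd

# A JSON object looks almost exactly like a Python dict literal.
JSON_TEXT = '{"name": "apple", "price": 120, "tags": ["fruit", "red"]}'

record = json.loads(JSON_TEXT)   # JSON text -> Python dict
round_trip = json.dumps(record)  # Python dict -> JSON text
print(type(record).__name__, record["price"])

# A JSON array of objects maps naturally onto the rows of a DataFrame.
RECORDS_TEXT = '[{"name": "apple", "price": 120}, {"name": "banana", "price": 80}]'
df = pd.read_json(io.StringIO(RECORDS_TEXT))
print(df)
```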

About CSV files

-CSV stands for "Comma Separated Values". Because its data structure is lightweight and simple, it has long been used for data exchange.
-A CSV file consists only of values separated by commas, such as "a,b,c".
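
The sketch below shows what "only values separated by commas" means in practice. The sample text is made up. The standard csv module returns every value as a plain string, while pd.read_csv() additionally treats the first line as column names and infers numeric types.

```python
import csv
import io

import pandas as pd

# Made-up CSV text: a header line followed by two data lines.
CSV_TEXT = "a,b,c\n1,2,3\n4,5,6\n"

# The csv module just splits each line on commas; every value is a string.
rows = list(csv.reader(io.StringIO(CSV_TEXT)))
print(rows)

# pandas uses the first line as the header and infers numeric columns.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
```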

About Excel

-Needless to say, Excel is spreadsheet software. Because it is so widely used, being able to read data from Excel files broadens the range of data you can analyze.
-In Excel terminology, the file is called a __"book"__, a table in the file is a __"sheet"__, the vertical direction of a sheet is a __"column"__, the horizontal direction is a __"row"__, and each individual item is a __"cell"__.

Data format conversion

Read the file with DataFrame

-Now let's actually convert between the data formats described above. To read a file, use __pd.read_\<format>("file name")__. For example, HTML is read with "pd.read_html()" and Excel with "pd.read_excel()".

-To write a file, use __df.to_\<format>("file name")__. Note that reading is a function of "pd", but writing is a method of the DataFrame object itself: to write the DataFrame object "df" to an HTML file, use "df.to_html()".
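
The read and write patterns above can be sketched as a small round trip. The DataFrame contents and the file names (fruits.csv and so on) are made up for illustration, and a temporary directory is used so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample data.
df = pd.DataFrame({"name": ["apple", "banana"], "price": [120, 80]})

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    csv_path = tmp_dir / "fruits.csv"
    json_path = tmp_dir / "fruits.json"
    html_path = tmp_dir / "fruits.html"

    # Writing: methods of the DataFrame, named to_<format>.
    df.to_csv(csv_path, index=False)
    df.to_json(json_path, orient="records")
    df.to_html(html_path, index=False)

    # Reading: functions of the pandas module, named read_<format>.
    from_csv = pd.read_csv(csv_path)
    from_json = pd.read_json(json_path)
    html_text = html_path.read_text(encoding="utf-8")

print(from_csv)
print(from_json)
print(html_text[:60])
```

Passing index=False keeps the row index out of the output files, so reading the CSV back gives the same columns that were written.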

Graph the data in the CSV file

Graphing procedure

-The procedure is: (1) read the CSV file with read_csv, (2) create the graph with pandas, (3) draw the graph with matplotlib (plt.show()).
-Of these, "create the graph with pandas" is the new step. It is done with __"df.plot()"__.
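
The three steps above can be sketched as follows. The CSV content, column names, and title are made up, and the non-interactive "Agg" backend is an assumption chosen so the sketch also runs on a machine without a display; remove that line to have plt.show() open a window.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; remove to see a window

import matplotlib.pyplot as plt
import pandas as pd

# Made-up CSV data standing in for a real file.
CSV_TEXT = """month,sales,costs
1,120,80
2,135,90
3,150,95
4,160,110
"""

# Step 1: read the CSV, using the month column as the x axis (index).
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col="month")

# Step 2: create the graph with pandas; each numeric column becomes a line.
ax = df.plot(title="Monthly sales and costs")
ax.set_ylabel("amount")

# Step 3: draw the graph with matplotlib.
plt.show()
```

df.plot() returns a matplotlib Axes object, so labels and titles can be adjusted with the usual matplotlib methods before calling plt.show().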




Summary

-pandas lets you exchange data between various data formats.
-To read or write these formats in Python, use functions and methods such as __"pd.read_csv()"__ and __"df.to_html()"__.
-A CSV file that has been read can be graphed with __df.plot()__.

That's all for this time. Thank you for reading to the end.
