[PYTHON] Summary of processes often performed in Pandas 1 (CSV, Excel file related operations)

I will soon be using Pandas at work, so I would like to summarize the operations I use most often, even though this topic has been covered many times before.

When reading from a CSV file

import pandas as pd

#File name: sample.csv
#The contents of the file are as follows.
# 1,Sample Taro,30
# 2,Sample Hanako,25
# 3,Sample Jiro,28
df = pd.read_csv('sample.csv',
                 #None means the file has no header row; an integer means that row is used as the header.
                 header=None,
                 #Column to use as the index. Here, the first column (1, 2, 3)
                 index_col=0,
                 #Column names for the remaining columns
                 names=('name', 'age'),
                 #Data type of each column
                 dtype={'name': str, 'age': int}
                )
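
As a quick way to try these options without a file on disk, the sketch below reads the same three rows from an in-memory string. The `io.StringIO` buffer, the `load_sample` helper, and the `id` column name are assumptions made for illustration; here all three columns are named explicitly, which is an alternative to passing only two names.

```python
import io

import pandas as pd

# The same three rows as sample.csv, held in memory for illustration.
SAMPLE_CSV = "1,Sample Taro,30\n2,Sample Hanako,25\n3,Sample Jiro,28\n"


def load_sample(source):
    """Read headerless id,name,age rows (hypothetical helper)."""
    return pd.read_csv(
        source,
        header=None,
        names=["id", "name", "age"],  # name every column explicitly
        index_col="id",               # use the id column as the index
        dtype={"name": str, "age": int},
    )


df = load_sample(io.StringIO(SAMPLE_CSV))
print(df)
```

The resulting frame has `name` and `age` as columns and the values of the first column as its index.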

When reading from an Excel file

import pandas as pd

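#File name: sample.xlsx
#The contents of the file are the same as the CSV file above (no header row).
df = pd.read_excel('sample.xlsx',
                   #The sheet has no header row, so name the columns explicitly
                   header=None,
                   names=['id', 'name', 'age'],
                   #Use the first column (id) as the index
                   index_col=0,
                   dtype={'name': str, 'age': int}
                  )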

When reading multiple sheets from an Excel file at the same time

import pandas as pd

excel_book = pd.ExcelFile('sample.xlsx')

#sheet_names returns the names of all sheets in the workbook as a list.
sheet_name_list = excel_book.sheet_names

#parse() returns the contents of the specified sheet as a data frame.
df = excel_book.parse('sheet1')
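
To actually load every sheet in one pass, one option is to loop over `sheet_names` and call `parse()` for each. The following is a minimal sketch; the `read_all_sheets` helper is a hypothetical name, and the usage lines assume that `sample.xlsx` exists and that an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def read_all_sheets(excel_book):
    """Return {sheet name: data frame} for every sheet in an opened workbook.

    excel_book is expected to be a pd.ExcelFile (hypothetical helper).
    """
    return {name: excel_book.parse(name) for name in excel_book.sheet_names}


# Usage, assuming sample.xlsx exists:
# with pd.ExcelFile('sample.xlsx') as excel_book:
#     frames = read_all_sheets(excel_book)
#     for name, frame in frames.items():
#         print(name, frame.shape)
```

Passing `sheet_name=None` to `pd.read_excel` also returns a dictionary of data frames keyed by sheet name, which may be simpler when the sheet list itself is not needed.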

When outputting the contents of the data frame to a CSV file

import pandas as pd

#The first argument is the output file name
df.to_csv('sample.csv', 
          #Set False if header is not needed
          header=False, 
          #Set False if no index is needed
          index=False, 
          #Set when limiting the output columns
          columns=['name']
         )
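
To see what these options produce without touching the file system, the sketch below writes to an in-memory buffer instead. The sample frame and the `io.StringIO` buffer are assumptions for illustration only.

```python
import io

import pandas as pd

# A small frame matching the sample data (illustrative).
df = pd.DataFrame(
    {"name": ["Sample Taro", "Sample Hanako", "Sample Jiro"], "age": [30, 25, 28]},
    index=[1, 2, 3],
)

buffer = io.StringIO()
# Same options as above: no header row, no index, only the name column.
df.to_csv(buffer, header=False, index=False, columns=["name"])
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

Each output line contains only a name, because the header, the index, and the `age` column are all excluded.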

When outputting the contents of the data frame to an Excel file

import pandas as pd

#The first argument is the output file name
df.to_excel('sample.xlsx',
            #Set False if header is not needed
            header=False,
            #Set False if no index is needed
            index=False,
            #Set when limiting the output columns
            columns=['name']
           )
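
One thing to keep in mind is that a file written with `header=False` has to be read back with `header=None`, otherwise the first data row is taken as the header. The round trip can be sketched as follows; the `excel_round_trip` helper is a hypothetical name, and the code assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd


def excel_round_trip(df, directory):
    """Write df without header or index, then read it back (hypothetical helper).

    Assumes an Excel engine such as openpyxl is installed.
    """
    path = os.path.join(directory, "sample.xlsx")
    df.to_excel(path, header=False, index=False)
    # header=None because the file was written without a header row.
    return pd.read_excel(path, header=None)


# Usage:
# df = pd.DataFrame({"name": ["Sample Taro", "Sample Hanako"], "age": [30, 25]})
# with tempfile.TemporaryDirectory() as tmp:
#     print(excel_round_trip(df, tmp))
```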
