When I tried to load a file I had received in CSV format with pandas and process it, the header and the data were misaligned. It took me longer than expected to find the answer, so I wrote it up as an article. [Jump to solution](#solution)
I ran it in the following environment.
| module | version |
|---|---|
| python | 3.8.3 |
| pandas | 1.0.5 |
Import the following CSV file as a DataFrame.
example.csv
Time x y z
0 1 2 10
1 2 2 10
2 3 2 10
..
Import it using read_csv().
read_csv.py
import pandas as pd
path = 'csv file path'
df = pd.read_csv(path)
print(df)
The output result in the terminal is as follows.
# Time\tx\ty\tz
# 0 1\t1\t2\t10
# 1 2\t2\t2\t10
# 2 3\t3\t2\t10
..
The output contains stray \t sequences. It seems the file was tab-separated (TSV format) rather than comma-separated.
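A quick way to confirm which delimiters a file really uses is to print a few raw lines with repr(), which makes tabs and stray spaces visible. The sketch below works on a small in-memory sample; the `sample` string and the helper name `show_raw_lines` are hypothetical and not taken from the original file.

```python
import io

# Hypothetical sample standing in for the real file: tabs mixed with spaces.
sample = "Time\tx \ty\tz\n0\t1 \t2\t10\n1\t2 \t2\t10\n"


def show_raw_lines(handle, limit=3):
    """Return repr() of the first `limit` lines so whitespace is visible."""
    lines = []
    for i, line in enumerate(handle):
        if i >= limit:
            break
        lines.append(repr(line))
    return lines


raw = show_raw_lines(io.StringIO(sample))
for text in raw:
    print(text)
```

With a real file, the same helper could be given the object returned by open(path) instead of io.StringIO(sample).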
Try reading it with read_table().
read_tsv.py
import pandas as pd
path = 'csv file path'
df = pd.read_table(path)
print(df)
The output is as follows.
The \t is gone, but now the header and the data are misaligned, and every value in the z column is NaN.
# Time x y z
# 0 1 2 10 NaN
# 1 2 2 10 NaN
# 2 3 2 10 NaN
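One way to reproduce this kind of misalignment is sketched below. It assumes that each data row ends with an extra tab, which the article does not confirm for the original file. When the data rows have one more field than the header, pandas uses the first column as the index, so the remaining values shift one column to the left and the last column is filled with NaN. The `sample` string is hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample: the header has 4 fields, but each data row ends with
# an extra tab, which gives it a 5th (empty) field.
sample = "Time\tx\ty\tz\n0\t1\t2\t10\t\n1\t2\t2\t10\t\n2\t3\t2\t10\t\n"

df_shifted = pd.read_table(io.StringIO(sample))

# The first field of each row became the index, so the remaining values
# moved one column to the left and z has nothing left to hold.
print(df_shifted)
print(df_shifted["z"].isna().all())
```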
Pass an argument to read_csv() as follows.
read_csv_2.py
import pandas as pd
path = 'csv file path'
df = pd.read_csv(path, sep=r'\s+')  # raw-string regex: one or more whitespace characters
print(df)
According to the pandas documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), this seems to be the value to pass as sep for files whose fields are separated by one or more whitespace characters. It seems the cause was that the original data was separated by a mix of tabs and spaces ... Please forgive me ... lol.
I was able to convert the header and data into a DataFrame correctly by passing the argument sep='\s+' to read_csv() for the data separated by tabs and spaces.
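As a self-contained check of this fix, the sketch below reads a small in-memory sample whose fields are separated by a mix of tabs and spaces. The `sample` string is an assumption made for illustration, not the original data. A raw string (r"\s+") is used so that Python does not try to interpret \s as a string escape.

```python
import io

import pandas as pd

# Hypothetical sample: fields separated by a mix of tabs and spaces.
sample = "Time\tx  y \tz\n0\t1  2 \t10\n1\t2  2 \t10\n2\t3  2 \t10\n"

# sep=r"\s+" treats any run of whitespace as a single delimiter.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

print(df)
print(df.dtypes)
```

On a real file, the path would be passed in place of io.StringIO(sample).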