Introduction

When I loaded a file I had received in CSV format with pandas, the header and the data came out misaligned. The fix took me longer to find than I expected, so I wrote it up as an article. [Jump to solution](#solution)

Operating environment

I ran it in the following environment.

module	version
python	3.8.3
pandas	1.0.5

Problem

Import the following CSV-format file as a DataFrame.

`example.csv`


Time  x   y   z
   0  1   2  10
   1  2   2  10
   2  3   2  10
..

Import it using read_csv().

`read_csv.py`


import pandas as pd
path = 'csv file path'
df = pd.read_csv(path)
print(df)

The output result in the terminal is as follows.

#    Time\tx\ty\tz
#  0    1\t1\t2\t10
#  1    2\t2\t2\t10
#  2    3\t3\t2\t10
..

There is an extra \t in the output. It seems the file was tab-separated (TSV format) rather than comma-separated.
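One way to check which separators a file really uses is to print the first few lines with repr(), so that tabs and spaces become visible. The following is only a minimal sketch: the helper name show_raw_lines and the in-memory sample text are assumptions for illustration, not the actual contents of example.csv.

```python
import io


def show_raw_lines(stream, limit=3):
    """Return the first `limit` lines as repr() strings so tabs become visible."""
    return [repr(line) for _, line in zip(range(limit), stream)]


# Hypothetical sample text standing in for the real file.
sample = io.StringIO("Time\tx\ty\tz\n   0 1\t2\t10\n")
for text in show_raw_lines(sample):
    print(text)
```

With a real file, the same helper could be called on an open file object, for example inside `with open(path) as f:`.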

Try reading it with read_table().

`read_tsv.py`


import pandas as pd
path = 'csv file path'
df = pd.read_table(path)
print(df)

The output is as follows. The \t is gone, but now the header and data are misaligned and every value in the z column is NaN.

#    Time  x   y    z
#  0    1  2  10  NaN
#  1    2  2  10  NaN
#  2    3  2  10  NaN
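The misalignment can be reproduced with in-memory data. This sketch assumes a layout the article does not confirm: a tab-separated header, and data rows where the first two values are separated by a space instead of a tab. Under that assumption, read_table() finds only three tab-separated fields per data row, so the last column is filled with NaN.

```python
import io

import pandas as pd

# Hypothetical file content (an assumption, not the real example.csv):
# the header uses tabs, but each data row has a space between the
# first two values and tabs between the rest.
raw = (
    "Time\tx\ty\tz\n"
    "   0 1\t2\t10\n"
    "   1 2\t2\t10\n"
    "   2 3\t2\t10\n"
)

# read_table() splits on tabs only, so each data row yields three fields
# while the header has four.
df_tab = pd.read_table(io.StringIO(raw))
print(df_tab)
print(df_tab["z"].isna().all())
```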

Solution

Pass the sep argument to read_csv() as follows.

`read_csv_2.py`


import pandas as pd
path = 'csv file path'
df = pd.read_csv(path, sep=r'\s+')  # raw string; split on one or more whitespace characters
print(df)

According to the pandas documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html), sep can be a regular expression, and \s+ matches one or more whitespace characters, so it handles fields separated by any run of tabs or spaces. It seems the cause was that the original data was separated by a mix of tabs and spaces ... Give me a break ... lol.
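The effect of the whitespace separator can be checked without the original file. The sketch below assumes a hypothetical layout (tab-separated header, data rows mixing spaces and tabs) and reads it from an in-memory string. The sample text is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical file content mixing tabs and spaces (an assumption,
# not the real example.csv).
raw = (
    "Time\tx\ty\tz\n"
    "   0 1\t2\t10\n"
    "   1 2\t2\t10\n"
    "   2 3\t2\t10\n"
)

# sep=r"\s+" treats any run of spaces and/or tabs as a single separator.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
print(df.dtypes)
```

Writing the pattern as a raw string (r"\s+") keeps Python from interpreting the backslash as a string escape.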

Summary

I was able to load the header and data into a DataFrame correctly by passing sep='\s+' to read_csv() for data separated by a mix of tabs and spaces.
