[PYTHON] "TypeError: Unrecognized value type: <class 'str'>" in pandas to_datetime

This error occurs when the data includes a string that cannot be recognized as a date. I got stuck on it, so I've summarized what I found in this article.

When a DataFrame is created, a column that has a non-date value mixed in among the date strings is treated as object dtype. (If every value is a string that can be interpreted as a date, the column can be handled as datetime.)

A common case in real data is a hyphen (-) appearing in place of a date.

import pandas as pd

temp = pd.DataFrame(["2020-04-09", "2020-04-10", "-", "2020-04-12"], columns=["date"])

#Contents of temp
         date
0  2020-04-09
1  2020-04-10
2           -
3  2020-04-12

If you try to convert this column to datetime type, an error occurs.

pd.to_datetime(temp.date)

#Error output
TypeError: Unrecognized value type: <class 'str'>
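
The exact exception type and message may differ between pandas versions, so code that needs to handle this failure should not rely on the message text. The following is a minimal sketch, assuming that the failure surfaces as either a TypeError or a ValueError (or a subclass of one of them); the variable name caught is only for illustration.

```python
import pandas as pd

temp = pd.DataFrame(["2020-04-09", "2020-04-10", "-", "2020-04-12"], columns=["date"])

caught = None
try:
    pd.to_datetime(temp["date"])
except (TypeError, ValueError) as exc:
    # Assumption: the parsing failure is a TypeError or ValueError (sub)class.
    caught = exc
    print(f"{type(exc).__name__}: {exc}")
```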

The countermeasures are as follows.

Force pd.to_datetime() to run

Pass errors="coerce" to ignore conversion errors and run anyway. Any value that cannot be converted becomes NaT.

pd.to_datetime(temp.date, errors="coerce")

#Conversion result
0   2020-04-09
1   2020-04-10
2          NaT
3   2020-04-12
Name: date, dtype: datetime64[ns]

I think this is fine if you know in advance that the only invalid values are hyphens and that they can safely be ignored, as in this example. However, the error is there to tell you that something is wrong with the data, so it may be better not to suppress it without checking.
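
One way to keep that safety check while still using errors="coerce" is to allow only the placeholders you already know about and raise on anything else. This is a sketch under stated assumptions, not a pandas feature: to_datetime_allowing and its placeholders parameter are hypothetical names introduced here for illustration.

```python
import pandas as pd

def to_datetime_allowing(series, placeholders=("-",)):
    """Hypothetical helper: coerce known placeholders to NaT, raise on anything else."""
    converted = pd.to_datetime(series, errors="coerce")
    # Values that were present in the input but failed to parse.
    failed = converted.isna() & series.notna()
    # Failed values that are not in the list of expected placeholders.
    unexpected = series[failed & ~series.isin(list(placeholders))]
    if not unexpected.empty:
        raise ValueError(f"Unexpected non-date values: {unexpected.unique().tolist()}")
    return converted

temp = pd.DataFrame(["2020-04-09", "2020-04-10", "-", "2020-04-12"], columns=["date"])
result = to_datetime_allowing(temp["date"])
print(result)
```

With this approach the hyphen is still turned into NaT, but a value such as "n/a" that was not anticipated raises a ValueError instead of silently disappearing.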

Correct the data before executing pd.to_datetime()

Replace or remove the unwanted strings before converting. Alternatively, fix them at the source, in the data file or the database.

For example, replace the hyphen with an empty string and then convert. (With an empty string, pd.to_datetime() completes without raising an exception.) As with the forced execution above, the value that could not be converted becomes NaT.

temp.date = temp.date.replace({"-":""})
pd.to_datetime(temp.date)

#Conversion result
0   2020-04-09
1   2020-04-10
2          NaT
3   2020-04-12
Name: date, dtype: datetime64[ns]
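
If the data comes from a CSV file, the same cleanup can be done at load time. The following is a minimal sketch, assuming the placeholder is always a hyphen; the CSV text and the value column are made up for illustration. It uses the na_values and parse_dates parameters of pd.read_csv().

```python
import io
import pandas as pd

# Made-up CSV content standing in for a real data file.
csv_text = "date,value\n2020-04-09,1\n2020-04-10,2\n-,3\n2020-04-12,4\n"

# na_values treats "-" as missing (in every column); parse_dates then
# converts the date column, with the missing entry becoming NaT.
df = pd.read_csv(io.StringIO(csv_text), na_values=["-"], parse_dates=["date"])
print(df.dtypes)
print(df)
```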

An easy way to find the invalid values

In the example above the data set is small, so the stray hyphen is easy to spot, but this gets harder as the amount of data grows. The code below is a simple way to check for invalid values: it prints each value before parsing it, and an exception is raised at the first value that cannot be parsed. To check everything, you repeat the check, fixing the offending value each time. (Note that this can become a lot of work if there are many different kinds of invalid values.)

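def check(x):
    # Print each value before parsing it, so the last value printed
    # is the one that made pd.to_datetime() raise.
    print(x)
    pd.to_datetime(x)

temp.date.map(check)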
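
As an alternative to fixing one value at a time, every value that fails to parse can be collected in a single pass. This is a sketch of a different technique from the check above: it converts with errors="coerce" and then selects the entries that were present in the input but came out as NaT. The sample Series s, including its "n/a" entry, is made up for illustration.

```python
import pandas as pd

s = pd.Series(["2020-04-09", "2020-04-10", "-", "2020-04-12", "n/a", "-"], name="date")

parsed = pd.to_datetime(s, errors="coerce")

# Keep only values that were not missing originally but failed to parse.
invalid = s[parsed.isna() & s.notna()]

# Show each distinct invalid value and how often it occurs.
print(invalid.value_counts())
```

The index of invalid gives the row positions of the bad values, which helps when the list of variations is long.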

That's all.
