[PYTHON] Check if the expected column exists in Pandas DataFrame

For example, suppose you have the following function that processes a DataFrame.

import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df

This function expects its DataFrame argument to contain the columns first_name and last_name, so you may want to check for them at the beginning of the function.
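For comparison, here is a sketch of what happens without the check. The name preprocess_unchecked is used only for illustration, and the sample row reuses the data from the example further below. pandas raises a KeyError at the line that reads the missing column. Because df["first_name"] is evaluated before df["last_name"], a frame lacking both columns would be reported as missing only first_name.

```python
import pandas as pd

def preprocess_unchecked(df: pd.DataFrame) -> pd.DataFrame:
    # Same logic as preprocess, with no up-front column check
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df

df = pd.DataFrame([{"first_name": "John", "age": 30}])
error = None
try:
    preprocess_unchecked(df)
except KeyError as e:
    # pandas raises KeyError for the first missing column the code reads
    error = e
print(repr(error))
```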

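This check is easy to write with set operations. required_columns <= set(df.columns) tests whether every required column is present, and required_columns - set(df.columns) gives the ones that are missing.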

import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    required_columns = {"first_name", "last_name"}
    if not required_columns <= set(df.columns):
        raise ValueError(f"missing columns: {required_columns - set(df.columns)}")
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df
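To see what the two set operators do on their own, here is a small sketch on a DataFrame that has first_name and age but no last_name. df.columns is a pandas Index, so it is converted with set() before comparing. The named methods issubset and difference are the standard Python equivalents of <= and -.

```python
import pandas as pd

# Example frame that has first_name and age but lacks last_name
df = pd.DataFrame([{"first_name": "John", "age": 30}])
required_columns = {"first_name", "last_name"}
present = set(df.columns)  # df.columns is a pandas Index; convert it to a set

# <= is the subset test: True only if every required name is present
is_complete = required_columns <= present
# - is set difference: required names that are absent from the frame
missing = required_columns - present

# Equivalent method forms of the same two operations
assert required_columns.issubset(present) == is_complete
assert required_columns.difference(present) == missing

print(is_complete)  # False
print(missing)      # {'last_name'}
```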

Written this way, the function raises a ValueError when any required column is missing.

df = pd.DataFrame([{"first_name": "John", "age": 30}])  # DataFrame missing the 'last_name' column
preprocess(df)  #=> ValueError: missing columns: {'last_name'}
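If several functions need the same check, one option is to factor it into a small helper. require_columns below is a hypothetical name, not a pandas API. It also sorts the missing names before building the message, because the iteration order of a set of strings is not guaranteed, so with more than one missing column the original message could list them in a different order from run to run.

```python
from collections.abc import Iterable

import pandas as pd

def require_columns(df: pd.DataFrame, columns: Iterable[str]) -> None:
    """Hypothetical helper: raise ValueError naming every column df lacks."""
    missing = set(columns) - set(df.columns)
    if missing:
        # Sort so the error message is stable regardless of set ordering
        raise ValueError(f"missing columns: {sorted(missing)}")

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    require_columns(df, ["first_name", "last_name"])
    df["full_name"] = df["first_name"] + " " + df["last_name"]
    return df
```

With this version, preprocess(pd.DataFrame([{"age": 30}])) raises ValueError: missing columns: ['first_name', 'last_name'].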
