[PYTHON] pandas SettingWithCopyWarning

Introduction

While building an app that manipulates CSV files with pandas, the output was edited as expected, but pandas issued a SettingWithCopyWarning.

I think SettingWithCopyWarning appears because the extracted data may be passed by reference rather than copied (I am not entirely sure what that means). When you extract part of the original data and then assign to part of that extract, pandas cannot tell whether you wanted to modify that part of the original data or to create new data with some changes.
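The two intents described above can each be written unambiguously. The following is a minimal sketch, assuming a small in-memory DataFrame with the same columns as the example later in this post; the variable name `big` and the `count > 5` filter are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'fru_name': ['apple', 'orange', 'banana'],
                   'count': [8, 15, 4]})

# Intent 1: modify part of the original data.
# A single .loc call on the original DataFrame selects and assigns in one step.
df.loc[df['fru_name'] == 'apple', 'count'] = 9

# Intent 2: create new data with some changes.
# .copy() makes the extracted rows independent of the original DataFrame.
big = df[df['count'] > 5].copy()
big.loc[big['fru_name'] == 'orange', 'count'] = 0

print(df)   # the original: only the apple row was changed
print(big)  # the new data: the orange row was changed here only
```

In both cases pandas has no reason to guess which object should receive the assignment.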

Feel free to leave a comment.

Solution

Specifying with copy() that the data is a copy, not a reference, resolved the SettingWithCopyWarning.

An app that puts your favorite fruits into a csv file

First, create a CSV file named data.csv in a directory (folder) called output_files.

Its contents are as follows.

data.csv


apple,8
orange,15
banana,4
apple,1
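If you would rather create this file from Python than by hand, a minimal sketch follows. The helper name `write_sample_csv` and its `base_dir` parameter are assumptions made for illustration; only the directory name, file name, and contents come from this post.

```python
from pathlib import Path


def write_sample_csv(base_dir='.'):
    """Create output_files/data.csv under base_dir and return its path."""
    out_dir = Path(base_dir) / 'output_files'
    # Create the directory if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / 'data.csv'
    # No header row: the scripts in this post pass column names to read_csv.
    csv_path.write_text('apple,8\norange,15\nbanana,4\napple,1\n',
                        encoding='utf-8')
    return csv_path

# Example call (writes ./output_files/data.csv):
# write_sample_csv()
```

Calling the helper again overwrites the file, which is handy for resetting the data between runs.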

The goal is to edit this CSV file so that the apple,1 row is removed and apple,8 is rewritten to apple,9:

data.csv


apple,9
orange,15
banana,4

check_data.py


import pandas as pd

def check():
    df = pd.read_csv('output_files/data.csv', names=['fru_name', 'count'])
    # Store the most frequent (duplicated) fruit in vc
    vc = df['fru_name'].value_counts().index[0]
    # Store its number of occurrences in fre
    fre = df['fru_name'].value_counts().iat[0]

    if fre > 1:
        # Drop the duplicated fruit rows and store the result in new_data
        new_data = df.drop_duplicates(subset='fru_name')

        # Store the original count of the duplicated fruit in dup_count
        # (.iloc[0] picks the single value; int() on a Series is deprecated)
        dup_count = int(new_data.loc[new_data['fru_name'] == vc, 'count'].iloc[0])
        # Add 1 to dup_count
        dup_count += 1
        new_data.loc[new_data['fru_name'] == vc, 'count'] = dup_count

        # Overwrite data.csv
        new_data.to_csv('output_files/data.csv', index=False, header=False)

check()

python

SettingWithCopyWarning

The CSV file was edited as intended:

data.csv


apple,9
orange,15
banana,4

However, a SettingWithCopyWarning was issued.

Improvement

To make it explicit with copy() that the data is a copy, not a reference, change

python


new_data = df.drop_duplicates(subset='fru_name')

to

python


# Use copy() to explicitly make a copy, not a reference
new_data = df.drop_duplicates(subset='fru_name').copy()

check_data2.py


import pandas as pd

def check():
    df = pd.read_csv('output_files/data.csv', names=['fru_name', 'count'])
    # Store the most frequent (duplicated) fruit in vc
    vc = df['fru_name'].value_counts().index[0]
    # Store its number of occurrences in fre
    fre = df['fru_name'].value_counts().iat[0]

    if fre > 1:
        # Drop the duplicated fruit rows and store an explicit copy in new_data
        new_data = df.drop_duplicates(subset='fru_name').copy()

        # Store the original count of the duplicated fruit in dup_count
        # (.iloc[0] picks the single value; int() on a Series is deprecated)
        dup_count = int(new_data.loc[new_data['fru_name'] == vc, 'count'].iloc[0])
        # Add 1 to dup_count
        dup_count += 1
        new_data.loc[new_data['fru_name'] == vc, 'count'] = dup_count

        # Overwrite data.csv
        new_data.to_csv('output_files/data.csv', index=False, header=False)

check()

With this change, the CSV file is edited as intended and no SettingWithCopyWarning is issued.
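To check the result without touching any files, the same steps can be run on in-memory data while recording warnings. This is only a sketch: the helper name `increment_duplicated` and the constant `CSV_TEXT` are hypothetical, and the warning is matched by class name because whether pandas emits it at all may depend on the installed version.

```python
import io
import warnings

import pandas as pd

CSV_TEXT = 'apple,8\norange,15\nbanana,4\napple,1\n'


def increment_duplicated(df):
    """Hypothetical helper: the same steps as check(), returning the result."""
    counts = df['fru_name'].value_counts()
    if counts.iat[0] <= 1:
        # Nothing is duplicated; return an independent copy unchanged.
        return df.copy()
    name = counts.index[0]
    # Explicit copy, as in check_data2.py.
    new_data = df.drop_duplicates(subset='fru_name').copy()
    new_data.loc[new_data['fru_name'] == name, 'count'] += 1
    return new_data


df = pd.read_csv(io.StringIO(CSV_TEXT), names=['fru_name', 'count'])

# Record every warning raised while the helper runs.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = increment_duplicated(df)

# Keep only the warnings whose class is named SettingWithCopyWarning.
copy_warnings = [w for w in caught
                 if w.category.__name__ == 'SettingWithCopyWarning']

print(result.to_csv(index=False, header=False))
print(len(copy_warnings))
```

With the explicit copy in place, the list of recorded SettingWithCopyWarning entries should be empty, and the original `df` still holds all four rows.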
