Pandas
import pandas as pd
df = pd.read_csv('data.csv')
pandas.DataFrame.describe — pandas 1.0.4 documentation
df.describe()
TeamId Score
count 4.709900e+04 47099.000000
mean 4.409698e+06 0.749839
std 9.901986e+05 0.099161
min 2.792400e+04 0.000000
25% 4.501446e+06 0.760760
50% 4.774358e+06 0.770330
75% 4.915774e+06 0.779900
max 5.051599e+06 1.000000
#Narrow down the output columns
df['Score'].describe()
count 47099.000000
mean 0.749839
std 0.099161
min 0.000000
25% 0.760760
50% 0.770330
75% 0.779900
max 1.000000
Name: Score, dtype: float64
Python Pandas: Boolean indexing on multiple columns - Stack Overflow
total_count = df['Score'].count() # 47099
partial_count = df[(0.6 < df['Score']) & (df['Score'] < 0.8)]['Score'].count() # 42893
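A minimal sketch of the boolean indexing above, assuming a small made-up Score column (the values are illustrative and not taken from data.csv). Each comparison is wrapped in parentheses before combining with &.

```python
import pandas as pd

# Hypothetical sample data; the real data.csv is not reproduced here
df = pd.DataFrame({'Score': [0.55, 0.61, 0.75, 0.79, 0.80, 0.95]})

# Each comparison yields a boolean Series; the parentheses are required
# because & binds more tightly than the comparison operators
mask = (0.6 < df['Score']) & (df['Score'] < 0.8)

total_count = df['Score'].count()
partial_count = df[mask]['Score'].count()

# Summing the boolean mask counts the True values directly
partial_count_from_mask = int(mask.sum())
```

Both bounds are strict here, so the row with Score 0.80 is excluded.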
pandas.Series.map — pandas 1.0.4 documentation
# Convert Embarked (C, Q, S) to numerical values (1, 2, 3)
df_train['Embarked'] = df_train['Embarked'].map({'C': 1, 'Q': 2, 'S': 3})
pandas.DataFrame.rename — pandas 1.0.4 documentation
# Convert Sex (female, male) to numerical values (0, 1) and rename the column to Male
df_train['Sex'] = df_train['Sex'].map({'female': 0, 'male': 1})
df_train = df_train.rename(columns={'Sex': 'Male'})
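The map and rename steps can be tried on a tiny frame. This is a sketch on hypothetical data standing in for the Titanic training set; it also shows that values missing from the mapping become NaN.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the Titanic training data
df_train = pd.DataFrame({'Sex': ['female', 'male', 'female'],
                         'Embarked': ['C', 'S', None]})

# Values that are not keys of the mapping (including missing values) become NaN
df_train['Embarked'] = df_train['Embarked'].map({'C': 1, 'Q': 2, 'S': 3})
df_train['Sex'] = df_train['Sex'].map({'female': 0, 'male': 1})

# rename returns a new DataFrame, so the result is assigned back
df_train = df_train.rename(columns={'Sex': 'Male'})
```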
pandas.isnull — pandas 1.0.4 documentation
pandas.DataFrame.sum — pandas 1.0.4 documentation
df_train.isnull().sum()
PassengerId 0
Survived 0
Pclass 0
Name 0
Male 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
#Exclude all rows containing missing values
df_train_dn = df_train.dropna()
#Exclude the column specified by name
df_train_dn = df_train_dn.drop('Cabin', axis='columns')
pandas.DataFrame.apply — pandas 1.0.4 documentation
import re
#Extract titles (the text between the last space before ". " and the period, e.g. "Mr")
def getTitle(row):
    name = row['Name']
    p = re.compile(r'.*\ (.*)\.\ .*')
    surname = p.search(name)
    return surname.group(1)
df_train['Title'] = df_train.apply(getTitle, axis=1)
df_train['FamilyName'] = df_train.apply(getFamilyName, axis=1)
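The getFamilyName function used above is not shown in these notes. The following is only a hypothetical sketch of what it might look like, assuming names are formatted like "Braund, Mr. Owen Harris" with the family name before the first comma.

```python
import re
import pandas as pd

# Hypothetical helper: the notes call getFamilyName without defining it.
# Assumes the family name is everything before the first comma.
def getFamilyName(row):
    match = re.match(r'([^,]+),', row['Name'])
    return match.group(1).strip() if match else None

# Made-up sample rows in the assumed "Family, Title. Given" format
df_sample = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                   'Heikkinen, Miss. Laina']})

# axis=1 passes each row to the function
df_sample['FamilyName'] = df_sample.apply(getFamilyName, axis=1)
```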
Indexing and selecting data — pandas 1.0.4 documentation
Get / change the value of any position with pandas at, iat, loc, iloc | note.nkmk.me
#Specify column label
df_train.loc[:, ['Title', 'FamilyName']].head()
# Title FamilyName
# 0 Mr Braund
# 1 Mrs Cumings
# 2 Miss Heikkinen
# 3 Mrs Futrelle
# 4 Mr Allen
How to use Pandas groupby - Qiita
#Find the average age and the number of rows for each title
s_age_mean_groupby_title = df_train.groupby('Title')['Age'].mean()
s_age_count_groupby_title = df_train.groupby('Title')['Age'].count()
df_age = pd.concat([s_age_mean_groupby_title, s_age_count_groupby_title], axis='columns')
#Assign the column names in one step instead of mutating columns.values
df_age.columns = ['AgeMean', 'AgeCount']
df_age.sort_values(by='AgeCount', ascending=False)
# AgeMean AgeCount
# Mr 32.368090 398
# Miss 21.773973 146
# Mrs 35.728972 107
# Master 4.574167 36
# Rev 43.166667 6
pandas.DataFrame.sort_values — pandas 1.0.5 documentation
Normally, sort_values() does not change the DataFrame it is called on; it returns a sorted copy.
If inplace=True is specified, the DataFrame itself is sorted and the return value is None.
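The difference between the two modes can be checked with a short sketch. The frame below is hypothetical and only borrows a few counts from the table above.

```python
import pandas as pd

# Hypothetical frame with three of the AgeCount values from the table above
df_age = pd.DataFrame({'AgeCount': [36, 398, 146]},
                      index=['Master', 'Mr', 'Miss'])

# Default: a sorted copy is returned and df_age keeps its original order
df_sorted = df_age.sort_values(by='AgeCount', ascending=False)
order_after_default_sort = df_age.index.tolist()

# inplace=True: df_age itself is reordered and the call returns None
result = df_age.sort_values(by='AgeCount', ascending=False, inplace=True)
order_after_inplace_sort = df_age.index.tolist()
```

Because the in-place call returns None, writing df_age = df_age.sort_values(..., inplace=True) would replace the DataFrame with None.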
pandas.unique — pandas 1.0.5 documentation
pandas.io.formats.style.Styler.apply — pandas 1.0.5 documentation
python - Pandas style function to highlight specific columns - Stack Overflow
Matplotlib
matplotlib.pyplot.axis — Matplotlib 3.2.1 documentation
import matplotlib.pyplot as plt
plt.axis([-0.005, 1.005, 0, 9000])  # [xmin, xmax, ymin, ymax]
matplotlib.axes.Axes.set_ylim — Matplotlib 3.2.2 documentation
It is also possible to set the range of each axis individually with set_xlim() and set_ylim().
#Set the upper limit of the y-axis to 100
plt.gca().set_ylim(top=100)
plt.gca().yaxis.set_label_position('right')
ylabel in a matplotlib graph - Stack Overflow
#Move the label to the right side, then shift it to the coordinates (x, y) = (1.25, 0.5)
#(it behaves as an offset of (1.25, 0.5) relative to the default coordinates for 'right')
plt.gca().yaxis.set_label_position('right')
plt.gca().yaxis.set_label_coords(1.25, 0.5)
#Hide x-axis labels
plt.gca().set_xticklabels([])
#Hide y-axis label
plt.gca().set_yticklabels([])
matplotlib.pyplot.text — Matplotlib 3.1.2 documentation
#When there are multiple graphs, write a common y-axis label (Response Time (s))
plt.gcf().text(
    plt.gcf().axes[0].get_position().x1 - 0.45,
    plt.gcf().axes[0].get_position().y1 - 0.5,
    'Response Time (s)',
    rotation=90
)
matplotlib.pyplot.tight_layout — Matplotlib 3.1.2 documentation
[Python] Introducing how to eliminate overlapping characters output by Matplotlib! │ Python beginner's memorandum
plt.tight_layout()
matplotlib.pyplot.legend — Matplotlib 3.1.2 documentation
plt.legend(["legend1", "legend2"])
Specify the font with prop.
How to easily display Japanese with Matplotlib (Windows) | Gammasoft Co., Ltd.
plt.legend(["Squared value"], prop={"family":"MS Gothic"})
Specify the position with bbox_to_anchor.
python - How to put the legend out of the plot - Stack Overflow
plt.legend(["Squared value"], prop={"family":"MS Gothic"}, bbox_to_anchor=(1.05, 1))
import numpy as np
#Calculate the slope of the least-squares straight-line fit
a = np.polyfit(x, y, 1)[0]
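As a quick check of why index 0 is the slope, here is a sketch with hypothetical points that lie exactly on y = 2x + 1. For degree 1, np.polyfit returns the coefficients with the highest power first.

```python
import numpy as np

# Hypothetical points lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Degree-1 fit returns [slope, intercept] (highest power first)
slope, intercept = np.polyfit(x, y, 1)
```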
plt.ticklabel_format(style='plain')
Draw numbers on axis labels separated by three digits (matplotlib) - Qiita
plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda x, loc: '{:,}'.format(int(x))))
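The formatter can be exercised on its own, without drawing a figure. In this sketch the function name thousands is made up for illustration; FuncFormatter simply calls the wrapped function with the tick value and its position.

```python
from matplotlib.ticker import FuncFormatter

# A tick formatter function receives the tick value and its position index
def thousands(x, pos):
    return '{:,}'.format(int(x))

formatter = FuncFormatter(thousands)

# Calling the formatter directly shows the label it would produce for a tick
label = formatter(1234567.0, 0)
```

The same formatter object can then be passed to plt.gca().xaxis.set_major_formatter(formatter).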
Legend guide — Matplotlib 3.2.2 documentation
python - How is order of items in matplotlib legend determined? - Stack Overflow
handles = []
for label in labels:
    handle = plt.scatter(..., label=label)
    handles.append(handle)
#Sort the (label, handle) pairs by label; the sort key is defined with a lambda
labels, handles = zip(*sorted(zip(labels, handles), key=lambda x: x[0]))
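The zip / sorted / zip(*...) idiom can be checked without Matplotlib. In this sketch, plain strings stand in for the plot handles and the label values are made up.

```python
# Hypothetical labels and stand-in handles (strings here, artists in a real plot)
labels = ['Mrs', 'Master', 'Miss']
handles = ['handle_mrs', 'handle_master', 'handle_miss']

# Pair each label with its handle and sort the pairs by label only
pairs = sorted(zip(labels, handles), key=lambda pair: pair[0])

# zip(*pairs) unzips the sorted pairs back into two tuples
sorted_labels, sorted_handles = zip(*pairs)
```

Sorting by pair[0] only means the handles themselves are never compared, which matters because plot artists do not define an ordering. The result would typically be passed on as plt.legend(sorted_handles, sorted_labels).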
matplotlib.pyplot.subplots_adjust — Matplotlib 3.2.2 documentation
plt.figure()
plt.subplot(121)
# ...
plt.subplot(122)
# ...
#Adjust the width between subplots
plt.subplots_adjust(wspace=1, right=3)
ggplot is a popular graphing tool in R.
Its distinguishing feature is that a graph is described as multiple layers drawn on top of one another.
What is R | ggplot2 | hanaori | note
plt.style.use('ggplot')
#Plot the gender of the survivors
df_train_survived = df_train_dn[df_train_dn.Survived == 1]
df_train_survived_age = df_train_survived.iloc[:, 3]
df_train_survived_male = df_train_survived.iloc[:, 2]
plt.scatter(
    df_train_survived_age,
    df_train_survived_male,
    color="#cc6699",
    alpha=0.5
)
#Plot the gender of the dead
df_train_dead = df_train_dn[df_train_dn.Survived == 0]
df_train_dead_age = df_train_dead.iloc[:, 3]
df_train_dead_male = df_train_dead.iloc[:, 2]
plt.scatter(
    df_train_dead_age,
    df_train_dead_male,
    color="#6699cc",
    alpha=0.5
)
plt.show()
9.4. decimal — Decimal fixed point and floating point arithmetic — Python 2.7.18 documentation
Specify the number of digits with the first argument of Decimal.quantize().
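from collections import Counter
from decimal import Decimal, ROUND_HALF_UP
#Round each score to 3 decimal places (half up) and count the occurrences
decile = lambda num: Decimal(num).quantize(Decimal('.001'), rounding=ROUND_HALF_UP)
histogram = Counter(decile(score) for score in df['Score'])
print(histogram.keys())
# dict_keys([Decimal('0.761'), Decimal('0.000'), Decimal('0.775'), ...])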
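A self-contained variant of the rounding step, as a sketch on made-up scores. It differs from the snippet above in one deliberate way: the float is converted through str first, because Decimal(float) captures the exact binary value, which may sit slightly off the printed digits and affect half-way cases. The helper name round_half_up is hypothetical.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical helper; exp sets the number of decimal places to keep
def round_half_up(num, exp='.001'):
    # str(num) gives the short printed form, e.g. '0.7605'
    return Decimal(str(num)).quantize(Decimal(exp), rounding=ROUND_HALF_UP)

# Made-up scores for illustration
scores = [0.7605, 0.7614, 0.0, 0.7751]
histogram = Counter(round_half_up(s) for s in scores)
```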
map()
Getting index of item while processing a list using map in python - Stack Overflow
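One common way to get the index inside map() is to map over enumerate(), which yields (index, item) pairs. This is a sketch with made-up items, not necessarily the exact approach from the linked answer.

```python
# Made-up items for illustration
items = ['a', 'b', 'c']

# enumerate yields (index, item) pairs, so the mapped function can see the index
labelled = list(map(lambda pair: '{}:{}'.format(pair[0], pair[1]),
                    enumerate(items)))
```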
float type
#Limit exponent notation to 3 digits after the decimal point
# e.g. float_number = 7.918330583e-06
'{:.3e}'.format(float_number)
# 7.918e-06
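The precision in the format spec is the number of digits after the decimal point of the mantissa. A short sketch, reusing the example value above, shows the same spec in str.format and in an f-string.

```python
float_number = 7.918330583e-06

# .3e keeps three digits after the decimal point in exponent notation
three_digits = '{:.3e}'.format(float_number)

# The same format spec works inside an f-string; .1e keeps one digit
one_digit = f'{float_number:.1e}'
```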