R code correspondence sheet for Python users

For readers who already understand Python data analysis code, this article summarizes the corresponding R code. It is updated from time to time. Only the R base packages are used in this article.

Many people ask, "If I write this in Python, how do I write it in R?"

Naming conventions in the document

Unless otherwise noted, module name aliases are as follows.

python


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

The variable names that appear below refer to objects like the following.

python


df = pd.DataFrame()

R


df <- data.frame()

How to write Python code in R

Data frame generation

pd.DataFrame() Creating a data frame

R


data.frame() # Create an empty data frame
data.frame(col1 = c(x1, x2, x3), col2 = c(y1, y2, y3)) # Specify the values column by column

pd.read_csv() Read CSV file (comma separated data)

R


read.csv(file_name) # file_name is the path to the CSV file, given as a string

pd.read_table() Read a TSV file (tab-delimited data)

R


read.table(file_name, header = TRUE, sep = "\t") # By default read.table() splits on whitespace and expects no header row

df.index = [row name 1, row name 2, ...]

Row name settings

R


rownames(df) <- c(row name 1, row name 2, ...)
print(rownames(df)) # Calling it without assignment returns the row names as a vector

df.columns = [column name 1, column name 2, ...]

Column name settings

R


colnames(df) <- c(column name 1, column name 2, ...)
print(colnames(df)) # Calling it without assignment returns the column names as a vector

Check the contents of the data frame

df.shape Get the number of rows and columns

R


dim(df)

len(df) Get the number of rows

R


nrow(df)

len(df.columns) Get the number of columns

R


ncol(df)
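As a quick check of the three size queries above, here is a minimal pandas sketch with the R counterparts noted in comments. The frame and its column names `a` and `b` are made up for illustration.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

n_rows, n_cols = df.shape      # R: dim(df)
row_count = len(df)            # R: nrow(df)
col_count = len(df.columns)    # R: ncol(df)

print(n_rows, n_cols, row_count, col_count)
```

One thing to watch when translating: `len(df)` counts rows in pandas, whereas `length(df)` in R counts columns, because a data frame is a list of columns.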

df.head() Output the first rows

R


head(df) # The number of rows to display can also be given as an argument

df.tail() Output the last rows

R


tail(df) # The number of rows to display can also be given as an argument

df.info() Display the number of rows and columns and the type of each column

R


str(df)

df.describe() Output basic statistics

R


summary(df) # Note that the standard deviation (std) is not included

# To get the standard deviation of each column (assuming all columns are numeric), for example:
sds <- sapply(df, sd) # Returns a vector named after the columns
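When comparing standard deviations between the two languages, the divisor matters. R's `sd()` divides by n - 1, and pandas uses the same convention by default, while NumPy's `np.std()` divides by n unless told otherwise. The sketch below uses a made-up frame to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame used only for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 10.0, 20.0, 20.0]})

sample_sd = df.std()                      # ddof=1 by default, same convention as R's sd()
described_sd = df.describe().loc["std"]   # the "std" row of describe() uses the same convention
population_sd = np.std(df["a"].to_numpy())  # NumPy default is ddof=0 (divide by n)

print(sample_sd["a"], population_sd)
```

If NumPy results need to agree with R, passing `ddof=1` to `np.std()` gives the n - 1 version.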

df.isna() Check for missing values (NA)

R


is.na(df)

df.isna().sum()

Check the number of missing values (NA) for each column

R


colSums(is.na(df))
# summary(df) also reports the number of NAs in each column, so you can check it there as well

df[df.isna().any(axis=1)] Extract rows that have at least one missing value (NA)

R


df[!complete.cases(df), ]
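The missing-value checks above can be tried on a small frame. This is only a sketch with made-up data; the R calls in the comments are the ones listed in this article, plus `na.omit()` for dropping incomplete rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

na_per_column = df.isna().sum()            # R: colSums(is.na(df))
rows_with_na = df[df.isna().any(axis=1)]   # R: df[!complete.cases(df), ]
complete_rows = df.dropna()                # R: na.omit(df)

print(na_per_column)
print(rows_with_na)
print(complete_rows)
```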

df.col.unique() Return the unique (non-duplicated) values that appear in a column

R


unique(df$col)

df.col.value_counts() Return the number of occurrences of each value that appears in a column

R


table(df$col)
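The two calls count the same thing but order the result differently: pandas sorts by count in descending order, while R's `table()` sorts by value. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical column of labels.
col = pd.Series(["b", "a", "b", "c", "b", "a"])

counts = col.value_counts()   # ordered by count, largest first
# R: sort(table(df$col), decreasing = TRUE) gives the same ordering;
#    plain table(df$col) would list a, b, c in value order.

print(counts)
```

Both functions leave out missing values by default.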

Data extraction

df.iloc[x1:x2, y1:y2] Select a range by row and column numbers

R


df[x1:x2, y1:y2] # Note that R indices start at 1 and that the end of the range is included
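Because Python slices are 0-based and exclude the stop position, while R ranges are 1-based and include both ends, the numbers do not carry over unchanged. The helper below is a hypothetical illustration of the conversion, not part of any library.

```python
from typing import Tuple


def iloc_slice_to_r_range(start: int, stop: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open Python slice start:stop
    into the 1-based, inclusive bounds used by R's first:last."""
    if not 0 <= start < stop:
        raise ValueError("expected 0 <= start < stop")
    # The first position shifts by one; the exclusive stop equals the inclusive last index.
    return start + 1, stop


first, last = iloc_slice_to_r_range(0, 3)
print(f"df.iloc[0:3] corresponds to df[{first}:{last}, ] in R")
```

The helper rejects empty slices on purpose: in R, a range such as `4:3` counts downward instead of being empty.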

df.iloc[[x1, x2, ...], [y1, y2, ...]] Specify a list using row and column numbers

R


df[c(x1, x2, ...), c(y1, y2, ...)]

df.loc[row name 1:row name 2, column name 1:column name 2]

Select a range by row and column names

R


# There does not seem to be a direct equivalent, so one way is to
# look up the positions (numbers) of the given row and column names and use them as the range.
x1 <- which(rownames(df) == row name 1)
x2 <- which(rownames(df) == row name 2)
y1 <- which(colnames(df) == column name 1)
y2 <- which(colnames(df) == column name 2)
df[x1:x2, y1:y2]
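On the pandas side, label slices with `loc` include both ends, unlike position slices with `iloc`, so they behave like the inclusive `x1:x2` range built in R. A minimal sketch with made-up row and column names:

```python
import pandas as pd

# Hypothetical frame with named rows and columns.
df = pd.DataFrame(
    {"c1": [1, 2, 3], "c2": [4, 5, 6], "c3": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r1":"r2", "c1":"c2"]   # both end labels are included
by_position = df.iloc[0:2, 0:2]           # stop position is excluded

print(by_label)
```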

df.loc[[row name 1, row name 2, ...], [column name 1, column name 2, ...]]

Select by lists of row and column names

R


df[c(row name 1, row name 2, ...), c(column name 1, column name 2, ...)]

df[df.col == x] Extract the rows that match a condition

R


df[df$col == x, ]
# Or
subset(df, col == x)
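The two R forms differ when the column contains missing values. In pandas, a comparison against a missing value is False, so those rows are silently dropped. In R, `df[df$col == x, ]` returns an all-NA row for each NA in the condition, whereas `subset()` drops them, which is closer to the pandas result. A sketch of the pandas side with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in the filtered column.
df = pd.DataFrame({"col": [1.0, np.nan, 1.0, 2.0], "other": ["a", "b", "c", "d"]})

matched = df[df["col"] == 1.0]   # the NaN row is not selected
# R: subset(df, col == 1) gives the same rows;
#    df[df$col == 1, ] would additionally return a row of NAs.

print(matched)
```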

Data processing

df[new_col] = x Add a new column to the data frame

R


df[, new_col] <- x

df.drop() Delete rows and columns

R


# Columns can be deleted by selecting them and assigning NULL (this does not work for rows).
df[c(y1, y2)] <- NULL # Delete columns

# A negative index returns the data frame without the given rows or columns, so you can also write:
df <- df[c(-1, -2), ] # Delete rows
df <- df[, c(-1, -2)] # Delete columns
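For comparison, here is a minimal sketch of `drop()` on the pandas side, using a made-up three-column frame. The R lines in the comments assume the same frame.

```python
import pandas as pd

# Hypothetical 3x3 frame used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

without_rows = df.drop(index=[0, 1])         # R: df[c(-1, -2), ]
without_cols = df.drop(columns=["a", "b"])   # R: df[, c(-1, -2), drop = FALSE]

print(without_rows)
print(without_cols)
```

`drop()` returns a new frame and leaves `df` unchanged. On the R side, when only one column remains, `df[, c(-1, -2)]` returns a plain vector unless `drop = FALSE` is added.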

df.fillna(x) Fill in missing values (NA)

R


df[is.na(df)] <- x

df.dropna() Delete rows that contain missing values (NA)

R


na.omit(df)

df.apply(func) Apply the function func to each column

R


sapply(df, FUN = func)
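By default, `df.apply(func)` passes whole columns to `func`, which is what `sapply(df, FUN = func)` does as well, while `apply` on a single column works value by value. The sketch below uses a hypothetical helper `value_range` and made-up data.

```python
import pandas as pd

# Hypothetical numeric frame.
df = pd.DataFrame({"a": [1, 5, 3], "b": [10, 40, 20]})


def value_range(column):
    """Return max - min of one column (receives a whole pandas Series)."""
    return column.max() - column.min()


per_column = df.apply(value_range)             # R: sapply(df, FUN = value_range)
per_element = df["a"].apply(lambda v: v * 2)   # R: sapply(df$a, FUN = function(v) v * 2)

print(per_column)
print(per_element)
```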

df.col.apply(func) Apply the function func to each element of the specified column

R


sapply(df$col, FUN = func)

df.T Transpose rows and columns

R


t(df) # Returns a matrix; wrap it in as.data.frame() to get a data frame back

pd.to_datetime() Convert to date type

R


as.Date(df$col) # Date only (e.g. '2020-01-01')
as.POSIXct(df$col) # Date and time (e.g. '2020-01-01 12:00:00')

Data aggregation

df.max(), df.min() Find the maximum and minimum values for each column

R


sapply(df, FUN = max)
sapply(df, FUN = min)

# The same can be done with apply
apply(df, MARGIN = 2, FUN = max) # With MARGIN = 1, the function (FUN) is applied row by row instead
apply(df, MARGIN = 2, FUN = min) # max(df) returns the maximum over all elements (same for min)

df.groupby([x1, x2, ...]).agg(func) Group and aggregate

R


aggregate(. ~ x1 + x2, df, FUN = sum) # "." aggregates all other columns
aggregate(x ~ x1 + x2, df, FUN = sum) # Aggregates only the column specified by "x"
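`aggregate()` returns the grouping variables as ordinary columns. To get the same layout from pandas, one option is `as_index=False`, as in this sketch. The column names `x1`, `x2`, `x`, `y` and the values are made up.

```python
import pandas as pd

# Hypothetical data with two grouping columns and two value columns.
df = pd.DataFrame({
    "x1": ["p", "p", "q", "q"],
    "x2": ["u", "u", "u", "v"],
    "x": [1, 2, 3, 4],
    "y": [10, 20, 30, 40],
})

# as_index=False keeps x1 and x2 as columns instead of moving them into the index.
totals = df.groupby(["x1", "x2"], as_index=False).agg("sum")
# R: aggregate(. ~ x1 + x2, df, FUN = sum)

print(totals)
```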

pd.pivot_table(df, index, columns, values) There does not appear to be a direct equivalent in the base packages.
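For reference, this is what the pandas call produces on a small made-up frame. The comment suggests base R's `tapply()` as one possible approximation. That is an assumption on my part rather than a confirmed equivalent, and it returns a matrix rather than a data frame.

```python
import pandas as pd

# Hypothetical long-format data; all column names are illustrative.
df = pd.DataFrame({
    "index_col": ["a", "a", "b", "b"],
    "columns_col": ["x", "y", "x", "y"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

pivot = pd.pivot_table(
    df, index="index_col", columns="columns_col", values="value", aggfunc="mean"
)
# Possible base R approximation (assumption):
#   tapply(df$value, list(df$index_col, df$columns_col), mean)

print(pivot)
```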
