R code cheat sheet for Python users

For readers who already understand Python data analysis code, this article summarizes the equivalent R code. It is updated from time to time, and only the R base packages are used.

Many people ask, "I know how to write this in Python, but how do I write it in R?"

Naming conventions in the document

Unless otherwise noted, the Python modules are imported under the following aliases.


import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

The variable names that appear below are assumed to be defined as follows (Python first, then R).


df = pd.DataFrame()


df <- data.frame()

How to write Python code in R

Data frame generation

pd.DataFrame() Create a data frame


data.frame() # Create an empty data frame
data.frame(col1=c(x1, x2, x3), col2=c(y1, y2, y3)) # Specify the values column by column
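
As a concrete illustration, here is a minimal sketch that builds a small data frame. The column names and values are made up for the example and do not come from any real data set.

```r
# Hypothetical sample data: the column names and values are illustrative only
df <- data.frame(
  name  = c("a", "b", "c"),
  score = c(10, 20, 30),
  stringsAsFactors = FALSE  # already the default since R 4.0; explicit for older versions
)
print(df)
```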

pd.read_csv() Read CSV file (comma separated data)


read.csv(file name)

pd.read_table() Read a TSV file (tab-delimited data)


read.table(file name, header=TRUE, sep="\t") # By default read.table splits on whitespace and assumes no header row
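
The following sketch shows both readers on a round trip through temporary files. The table contents and the variable names (sample_df, csv_path, tsv_path) are assumptions made for the example.

```r
# Write a small hypothetical table to temporary files, then read it back
sample_df <- data.frame(id = 1:3, label = c("x", "y", "z"))
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
write.csv(sample_df, csv_path, row.names = FALSE)
write.table(sample_df, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

from_csv <- read.csv(csv_path)                               # header = TRUE by default
from_tsv <- read.table(tsv_path, header = TRUE, sep = "\t")  # header must be requested
```

Unlike pd.read_csv(), neither function creates a separate index; the row names simply default to 1, 2, 3, and so on.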

df.index = [row name 1, row name 2, ...]

Set the row names


rownames(df) <- c(row name 1, row name 2, ...)
print(rownames(df)) # Called without an assignment, it returns the row names as a vector

df.columns = [column name 1, column name 2, ...]

Set the column names


colnames(df) <- c(column name 1, column name 2, ...)
print(colnames(df)) # Called without an assignment, it returns the column names as a vector
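
A short sketch of setting and reading back both kinds of names, using made-up data and labels:

```r
df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))  # hypothetical data
rownames(df) <- c("r1", "r2", "r3")  # set the row names
colnames(df) <- c("a", "b")          # set the column names
rownames(df)  # character vector of row names
colnames(df)  # character vector of column names
```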

Check the contents of the data frame

df.shape Get the number of rows and columns



len(df) Get the number of rows



len(df.columns) Get the number of columns
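
In base R these three size questions are usually answered with dim(), nrow() and ncol(). A minimal sketch with made-up data:

```r
df <- data.frame(x = 1:4, y = letters[1:4])  # hypothetical data
dim(df)   # c(rows, columns), like df.shape
nrow(df)  # number of rows, like len(df)
ncol(df)  # number of columns, like len(df.columns)
```

Note that length(df) returns the number of columns, not rows, because a data frame is a list of columns.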



df.head() Show the first rows


head(df) # The number of rows to show can be given as an argument, e.g. head(df, 10)

df.tail() Show the last rows


tail(df) # The number of rows to show can be given as an argument, e.g. tail(df, 10)

df.info() Display the number of values and the type of each column
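
The closest base R counterpart is probably str(). The sketch below, on made-up data, also shows how the type and non-missing count of each column could be obtained separately.

```r
df <- data.frame(x = 1:3, y = c("a", "b", NA))  # hypothetical data
str(df)              # row/column counts plus the type and first values of each column
sapply(df, class)    # just the type of each column
colSums(!is.na(df))  # non-missing count per column, similar to what df.info() reports
```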



df.describe() Output basic statistics


summary(df) # Note that the standard deviation (std) is not included

# To get the standard deviation of each column, for example:
sds <- NULL
for(col in colnames(df)){
  sds <- c(sds, sd(df[, col]))
}
names(sds) <- colnames(df)
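
The loop can also be condensed into a single sapply() call, which names the result automatically. A minimal sketch, assuming every column is numeric (the data is made up):

```r
df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))  # hypothetical numeric data
sds <- sapply(df, sd)  # named numeric vector, one standard deviation per column
sds
```

Like the default of pandas std(), sd() computes the sample standard deviation (dividing by n - 1).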

df.isna() Check for missing values (NA)




df.isna().sum() Check the number of missing values (NA) for each column


# summary(df) also reports the number of NAs in each column, so it can be checked there
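
For a direct answer, is.na() returns a logical matrix for a data frame, and colSums() over that matrix gives the NA count per column. A minimal sketch with made-up data:

```r
df <- data.frame(x = c(1, NA, 3), y = c(NA, NA, "c"))  # hypothetical data
is.na(df)           # logical matrix, TRUE where a value is missing (like df.isna())
colSums(is.na(df))  # number of NAs per column
```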

df[df.isna().any(axis=1)] Extract rows that have at least one missing value (NA)


df[!complete.cases(df), ]

df.col.unique() Returns the unique (distinct) values that appear in a column



df.col.value_counts() Returns how many times each value appears in a column
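
Both can be covered with unique() and table() from base R. A minimal sketch with made-up data:

```r
df <- data.frame(col = c("a", "b", "a", "c", "a"))  # hypothetical data
unique(df$col)                          # distinct values in order of first appearance
table(df$col)                           # counts per value, sorted by value
sort(table(df$col), decreasing = TRUE)  # counts sorted by frequency, as value_counts() does
```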



Data extraction

df.iloc[x1:x2, y1:y2] Specify the range using the row number and column number


df[x1:x2, y1:y2] # Note that R indices start at 1 and the end of the range is included

df.iloc[[x1, x2, ...], [y1, y2, ...]] Specify a list using row and column numbers


df[c(x1, x2, ...), c(y1, y2, ...)]

df.loc[row name 1:row name 2, column name 1:column name 2]

Specify the range using the row name and column name


# There does not seem to be a direct equivalent, so one approach is to
# look up the positions (numbers) of the given row and column names and use them as a range.
x1 <- which(rownames(df) == row name 1)
x2 <- which(rownames(df) == row name 2)
y1 <- which(colnames(df) == column name 1)
y2 <- which(colnames(df) == column name 2)
df[x1:x2, y1:y2]

df.loc[[row name 1, row name 2, ...], [column name 1, column name 2, ...]]

Specify a list using row and column names


df[c(row name 1, row name 2, ...), c(column name 1, column name 2, ...)]

df[df.col == x] Extract rows that match the conditions


df[df$col == x, ]
subset(df, col == x)
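
The extraction patterns in this section can be tried together on a small example. The data frame and its row and column names below are made up for illustration.

```r
df <- data.frame(a = 1:4, b = 5:8, c = 9:12,
                 row.names = c("r1", "r2", "r3", "r4"))  # hypothetical data
df[2:3, 1:2]                         # rows 2-3, columns 1-2 (1-based, end inclusive)
df[c("r1", "r4"), c("a", "c")]       # selection by row and column names
df[df$b > 6, ]                       # rows where the condition holds
subset(df, b > 6, select = c(a, b))  # same filter, keeping only columns a and b
```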

Data processing

df[new_col] = x Add a new column to the data frame


df[, new_col] <- x

df.drop() Delete rows and columns


# Columns can be deleted by selecting them and assigning NULL (this does not work for rows).
df[c(y1, y2)] <- NULL # Delete columns

# Because a negative index returns the data frame without those positions, you can also write:
df <- df[c(-1, -2), ] # Delete rows
df <- df[, c(-1, -2)] # Delete columns
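
One detail worth knowing: when only a single column remains, the column form collapses to a plain vector unless drop = FALSE is given. A minimal sketch with made-up data:

```r
df <- data.frame(a = 1:4, b = 5:8, c = 9:12)  # hypothetical data
rows_dropped <- df[-c(1, 2), ]                # drop rows 1 and 2
cols_dropped <- df[, -c(1, 2), drop = FALSE]  # drop columns a and b, stay a data frame
by_name <- df[, !(colnames(df) %in% c("a", "b")), drop = FALSE]  # drop columns by name
```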

df.fillna(x) Fill in missing values (NA)


df[is.na(df)] <- x

df.dropna() Delete rows that contain missing values (NA)
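
In base R, na.omit() plays this role, and the same rows can be selected with complete.cases(). A minimal sketch with made-up data:

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))  # hypothetical data
na.omit(df)               # keeps only the rows without any NA
df[complete.cases(df), ]  # equivalent selection with a logical index
```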



df.apply(func) Apply the function func to each column


sapply(df, FUN=func)

df.col.apply(func) Apply the function func to each element of the specified column


sapply(df$col, FUN=func)
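
The difference between the two calls is easiest to see on a small example. The data below is made up, and sum and sqrt merely stand in for func.

```r
df <- data.frame(x = c(1, 4, 9), y = c(16, 25, 36))  # hypothetical data
per_column  <- sapply(df, FUN = sum)     # one result per column: x = 14, y = 77
per_element <- sapply(df$x, FUN = sqrt)  # one result per element of column x: 1 2 3
```

For vectorized functions such as sqrt, calling sqrt(df$x) directly gives the same result without sapply().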

df.T Transpose the matrix
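
In base R the transpose is t(). Note that it returns a matrix rather than a data frame, so a conversion may be needed afterwards. A minimal sketch with made-up data:

```r
df <- data.frame(a = 1:2, b = 3:4, row.names = c("r1", "r2"))  # hypothetical data
m <- t(df)                # returns a matrix, not a data frame
df_t <- as.data.frame(m)  # convert back when a data frame is needed
```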



pd.to_datetime() Convert to date type


as.Date(df$col) # Date only (e.g. '2020-01-01')
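
For strings that are not in ISO format, or that include a time of day, a format string or as.POSIXct() can be used. A minimal sketch (the input strings are made up):

```r
d1 <- as.Date("2020-01-01")                          # ISO format is parsed by default
d2 <- as.Date("01/02/2020", format = "%d/%m/%Y")     # other layouts need a format string
ts <- as.POSIXct("2020-01-01 12:30:00", tz = "UTC")  # date and time
```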

Data aggregation

df.max(), df.min() Find the maximum and minimum values for each column


sapply(df, FUN =max)
sapply(df, FUN =min)

# The same can be done with apply
apply(df, MARGIN=2, FUN=max) # With MARGIN=1, the function (FUN) is applied row by row instead
apply(df, MARGIN=2, FUN=min) # max(df) returns the maximum over all elements (same for min)

df.groupby([x1, x2, ...]).agg(func) Group and aggregate


aggregate(. ~ x1+x2, df, FUN=sum) # "." aggregates all of the remaining columns
aggregate(x ~ x1+x2, df, FUN=sum) # Aggregates only the column specified by "x"
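
A small worked example of the formula interface, with made-up grouping columns g1 and g2 and a value column x:

```r
df <- data.frame(g1 = c("a", "a", "b", "b"),
                 g2 = c("u", "u", "u", "v"),
                 x  = c(1, 2, 3, 4))  # hypothetical data
res <- aggregate(x ~ g1 + g2, data = df, FUN = sum)
res  # one row per (g1, g2) combination that occurs in the data
```

The result is an ordinary data frame with the grouping columns first, roughly what groupby(...).agg(...).reset_index() returns in pandas.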

pd.pivot_table(df, index, columns, values) There seems to be no direct equivalent in the base packages.
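
Although there is no function of the same name, a similar cross table can be put together with tapply(). This is only a sketch of one possible approach; the column names idx, col and val are made up, and mean is used because it is the default aggregation of pd.pivot_table.

```r
df <- data.frame(idx = c("a", "a", "b", "b"),
                 col = c("u", "v", "u", "u"),
                 val = c(1, 2, 3, 4))  # hypothetical data
pivot <- tapply(df$val, list(df$idx, df$col), FUN = mean)
pivot  # matrix with idx values as rows and col values as columns; empty cells are NA
```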

