Easily exchange data between Python, R and Julia using the Feather format

Python, R, and Julia each have their own strengths, so there are often situations where you want to combine them. It is possible to call one language directly from another, but in data analysis it is often enough to let separate scripts handle different steps without coupling them so tightly. For example, it is easy to imagine a workflow in which data is scraped in Python, analysis is run with multithreading in Julia, and statistical analysis and visualization are done in R.

Why Feather?

In such a workflow, data saved with pickle in Python obviously cannot be loaded in other programming languages. Saving as CSV, on the other hand, is slow, and the data has to be reparsed when it is read back, which is a nuisance. In this article I briefly introduce the Feather format, which solves these workflow problems, and show how to use it. Feather is a lightweight format for storing data: it has a simple API, moves freely between programming languages, and is fast to read and write.
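To make the CSV reparsing problem concrete, here is a minimal sketch that uses only Python's standard csv module. The column names and values are made up for illustration. It shows that every value comes back as a string and has to be converted by hand.

```python
import csv
import io
from datetime import date

# Hypothetical records with an integer, a float, and a date column.
rows = [
    {"id": 1, "price": 9.5, "day": date(2020, 1, 31)},
    {"id": 2, "price": 12.0, "day": date(2020, 2, 1)},
]

# Write the records as CSV into an in-memory buffer instead of a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "price", "day"])
writer.writeheader()
writer.writerows(rows)

# Read them back: every value is now a plain string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

# The reader has to restore each column's type by hand.
parsed = [
    {
        "id": int(row["id"]),
        "price": float(row["price"]),
        "day": date.fromisoformat(row["day"]),
    }
    for row in loaded
]

print(type(loaded[0]["id"]).__name__, type(parsed[0]["id"]).__name__)
```

A format that stores column types alongside the data avoids this manual conversion step.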

According to a comparison article, Feather performs very well in terms of both speed and memory consumption. Actual performance depends on the kind of data you store, but the format is easy to use, so it is well worth a try.

Caution

**The Feather format does not support row labels. Therefore, if you use row labels in pandas, you need to call df.reset_index() beforehand.** I don't think row labels are used much in R in the first place, and some say they are not recommended.
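As a small sketch of the caution above, the following turns row labels into an ordinary column before saving and restores them afterwards. The DataFrame contents and the "name" label are hypothetical, and the actual Feather write/read step is left out so that only pandas is required.

```python
import pandas as pd

# Hypothetical data: a DataFrame whose index carries meaningful row labels.
df = pd.DataFrame(
    {"score": [90, 75, 82]},
    index=pd.Index(["alice", "bob", "carol"], name="name"),
)

# Move the row labels into a regular column; this is the frame you would
# write to Feather.
flat = df.reset_index()
print(flat.columns.tolist())

# After reading the file back (in Python), the labels can be restored
# from that column.
restored = flat.set_index("name")
```

In R or Julia the former row labels simply show up as a normal "name" column.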

Feather format read / write code

Python

python.py


import pandas as pd
import feather

# read
df = feather.read_dataframe("foobar.feather")

# write
feather.write_dataframe(df, "foobar.feather")

R

r.r


library(feather)

# read
df <- read_feather("foobar.feather")

# write
write_feather(df, "foobar.feather")

Julia

julia.jl


using DataFrames
using Feather

# read
df = Feather.read("foobar.feather")

# write
Feather.write("foobar.feather", df)

That's all there is to it. I think it is easier than CSV because you can read and write in any of these languages without worrying about types and headers.

Postscript: Feather V2 was recently released. I have not found a corresponding Julia package yet, so I do not cover it here. The content of this article applies to Feather V1.
