Generate Japanese test data with Python faker

What is faker


faker is a library that generates dummy data (test data). PHP and Ruby have libraries of the same name, and it feels like something of a de facto standard.

In this post I introduce it, having added support for generating Japanese address data.

What kind of data can be generated

What kind of data can faker generate? Let's write a simple example first.

from faker import Factory

# Create a generator with the default (English) locale
f = Factory.create()
print(f.address())
print(f.phone_number())

Execution result

Jennie Homenick
Petramouth, WI 21918-9349

It generates plausible data, but the default output is in English-language notation. Data in other languages can also be generated by passing a locale as the argument to Factory.create.

About Japanese support

As for Japanese support, thanks to a commit by @ta2xeo about a month ago, names and phone numbers can now be generated in Japanese.

This time, I added support for generating addresses as well. Let's look at them together.

from faker import Factory

# Pass the locale to get Japanese data
f = Factory.create('ja_JP')
print(f.phone_number())
print(f.address())
print(f.address())
print(f.zipcode())
print(f.prefecture())
print(f.chome())
print(f.ban())
print(f.gou())
print(f.building_name())

Execution result

Akiko Matsumoto
11-4-20 Hanakawado, Tsurumi-ku, Yokohama-shi, Fukushima Corp Minowa 553
31-24-20 Ujiie Shinden, Sammu City, Toyama Prefecture
Koganei City
11th Street
No. 8
No. 13

As you can see, for better or worse, almost none of these are real addresses. The parts of a generated address may not be consistent with one another, and the various address formats used in Japan are not all supported, but for the time being it is better than English notation.

How to use it

~~It seems that the Japanese version has not been released to PyPI yet. If you want to use it, please install it from the GitHub repository.~~

It has since been released in v0.5.1, so the steps in this section are no longer necessary.

Creating a data masking tool

Libraries such as faker can generate test data, but there are cases where purely dummy data is not enough. In those cases I usually want to take data from the production environment and mask parts of it, so I created a tool for that. Naturally, it uses faker.

It is a tool called Hermes that masks only specific columns in a CSV file. It is still rough, but I plan to improve it steadily.
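The idea of masking only certain columns can be sketched roughly as follows. This is not the Hermes implementation, which is not shown in this post; `mask_csv`, its parameters, and the column names are all hypothetical. In practice the generators would be faker methods such as `Factory.create('ja_JP').name`; a fixed stand-in is used here so that the sketch runs with the standard library alone.

```python
import csv
import io


def mask_csv(text, maskers):
    """Return CSV text with the selected columns replaced.

    `maskers` maps a column name to a zero-argument callable that
    produces the replacement value (for example a faker method).
    Hypothetical helper; not the Hermes implementation.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        for column, make_value in maskers.items():
            if column in row:
                # Overwrite only the columns that were asked for.
                row[column] = make_value()
        writer.writerow(row)
    return out.getvalue()


source = 'id,name,plan\n1,Real Person,gold\n2,Another Person,free\n'
# Stand-in generator; a faker method could be passed here instead.
masked = mask_csv(source, {'name': lambda: 'MASKED'})
print(masked)
```

Columns that are not listed in `maskers` pass through unchanged, which keeps the shape and distribution of the production data apart from the sensitive fields.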
