Create realistic-looking test data with Python (Part 1)

I heard on Twitter about Faker, a PHP library that generates dummy data. When I searched for a Python version, there was one, so I installed it.

Installation

% pip install faker
% faker --version
faker 4.0.2

Trial data creation

make_fake_data.py


from faker import Faker

Faker.seed(0)          # fix the seed so that each run produces the same data
fake = Faker("ja_JP")  # generator for the Japanese locale

print(
    fake.csv(
        header=None,
        data_columns=("{{name}}", "{{zipcode}}", "{{address}}", "{{phone_number}}"),
        num_rows=10,
        include_row_ids=False,
    )
)

I ran it in VS Code debug mode.

 %  env PTVSD_LAUNCHER_PORT=53546 /usr/local/opt/python/bin/python3.7 /Users/nandymak/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/launcher /Users/nandymak/dev/fake-data/make_fake_data.py 
"Naoko Fujimoto","265-8376","23-19-7 Misuji, Nishi-ku, Yokohama-shi, Oita Gomigaya Corp. 948","92-4115-7815"
"Ryosuke Nagisa","989-9052","36-3-1 Tomihisa-cho, Shirako-cho, Chosei-gun, Saga Platinum Urban 097","53-5139-3328"
"Sotaro Ito","520-8016","35-7-20 Kudanminami, Mizuho-cho, Nishitama-gun, Kyoto Prefecture","090-4719-6593"
"Kenichi Kato","627-4260","3-27-3 Kaminarimon, Niijima Village, Akita Prefecture Senzoku Court 684","090-3396-9477"
"Yoko Watanabe","812-5855","13-8-1, Marunouchi JP Tower, Edogawa-ku, Nara Prefecture","090-1352-5601"
"Kumiko Yamagishi","836-9402","8-21-7 Nagahata, Kokubunji City, Miyazaki Prefecture Hitotsubashi Park 510","090-3217-3008"
"Shota Inoue","226-1179","3-20-4 Gomigaya, Sakae-cho, Inba-gun, Ishikawa Prefecture Shibaura Urban 792","090-3022-5841"
"Kana Sasada","482-6715","25-27-9, Rokubancho, Seya-ku, Yokohama-shi, Nagasaki Heights Konan 150","090-2375-9459"
"Mai Nakatsugawa","732-5083","13-23-11 Maeyaroku Corp. 960, Higashikurume City, Nagano Prefecture","080-9602-7142"
"Ryosuke Yamada","618-0001","27-7-18 Hirasuka, Chiyoda-ku, Mie Court Marunouchi JP Tower 206","65-0300-8913"

This is the kind of data it produces. It looks so realistic that, if you use it at work without saying up front that it is dummy data generated by Faker, people may assume personal information has leaked and it could cause a fuss.

Besides CSV

Besides CSV, there are also TSV, DSV, and so on. I still need to sort out which functions are available (TODO).

faker.py


fake.tsv(header=None, data_columns=('{{name}}', '{{address}}'), num_rows=10, include_row_ids=False)

What I noticed

Items that can be specified

I tried to summarize them in a table, but gave up because there seem to be more than 200. Reference: How to generate test data using Faker in Python

For now, here are the ones you are likely to use often. You can probably find the full list in the official documentation under Docs » Locales » Language ja_JP.

Addresses (faker.providers.address)

| Method name | Meaning | Sample |
| --- | --- | --- |
| address | Street address | 161 Chizuka Palace, 38-9-5 Hirasuka, Hachijo-cho, Hachijojima, Kumamoto |
| ban | Address | No. 6 |
| building_name | Building name | Park |
| building_number | Building number? | 263 |
| chome | Chome | 1-chome |
| city | Municipalities | Komae City |
| city_suffix | Municipalities (fixed value?) | Ville |
| country | Country | New Caledonia |
| gou | No. | No. 15 |
| postcode | Postal code | 288-2290 |
| prefecture | Name of prefectures | Tochigi Prefecture |
| street_address | Address | 215 Kimura Street |
| street_name | Street name | Sasaki Street |
| street_suffix | Street suffix (※1) | Street |
| town | Town name | Odaiba |
| zipcode | Postal code | 149-3866 |

Personal names (faker.providers.person)

| Method name | Meaning | Sample |
| --- | --- | --- |
| name | Full name (kanji) | Yui Aoyama |
| last_name | Last name (kanji) | Takahashi |
| first_name | First name (kanji) | Yumiko |
| name_female | Female full name (kanji) | Tomomi Tanabe |
| name_male | Male full name (kanji) | Yoichi Fujimoto |
| last_name_male | Male last name (kanji)? | Nishinoen |
| first_name_male | Male first name (kanji) | Atsushi |
| last_name_female | Female last name (kanji)? | Yoshida |
| first_name_female | Female first name (kanji) | Tomomi |
| romanized_name | Full name (romaji) | Akira Sasada |
| last_romanized_name | Last name (romaji) | Ogaki |
| first_romanized_name | First name (romaji) | Naoki |
| first_romanized_name_male | Male first name (romaji) | Manabu |
| first_romanized_name_female | Female first name (romaji) | Rei |
| kana_name | Full name (kana) | Takahashi Miki |
| last_kana_name | Last name (kana) | Saito |
| first_kana_name | First name (kana) | Yoichi |
| first_kana_name_male | Male first name (kana) | Naoto |
| first_kana_name_female | Female first name (kana) | My |

Check item length

To prepare for loading the generated data into an RDB, I checked the length of each item. I ran the check in Colaboratory.

# !pip install faker  # run only the first time
import numpy as np
from faker import Faker

fake = Faker("ja_JP")
x = 1000000  # number of samples

# To time this in Colaboratory, put %%time on the first line of the cell.
# Specify the item you want to measure (fake.xxxxx())
test_data = [len(fake.address()) for _ in range(x)]

a = np.mean(test_data)
b = np.max(test_data)
print('mean={}、max={}'.format(a, b))

Over 1 million samples, the maximum is about 50 characters.

mean=26.205511、max=53

I ran it several times and the maximum was 55, so allowing about 64 characters seems reasonable. **Please note that this is the number of characters, not the number of bytes.**
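To make the difference between characters and bytes concrete, here is a short sketch using only the standard library. The sample string is one I picked for illustration, not Faker output.

```python
sample = "東京都千代田区"  # illustrative value, 7 characters

char_count = len(sample)                      # number of characters
utf8_bytes = len(sample.encode("utf-8"))      # 3 bytes per character here
sjis_bytes = len(sample.encode("shift_jis"))  # 2 bytes per character here

print(char_count, utf8_bytes, sjis_bytes)  # 7 21 14
```

So a column sized in bytes would need roughly two to three times the character count for Japanese text, depending on the encoding.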

Generate SQL for Insert (TODO)

As homework for myself, I would like to write a wrapper that generates a "CREATE TABLE" statement and "INSERT" statements, so that I can create a table in the RDB just by specifying the required items. ~~Note: you need to find out the length and attributes of the values that Faker produces for each method.~~
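One possible shape for such a wrapper is sketched below. This is not a finished design: the function names, the table name `fake_person`, and the column lengths are all assumptions of mine, with 64 taken from the length check above. It uses the standard library `sqlite3` module and two rows from the earlier output instead of calling Faker.

```python
import sqlite3

# Hypothetical column spec: (column name, length in characters).
COLUMNS = [("name", 64), ("zipcode", 8), ("address", 64), ("phone_number", 16)]


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement with one VARCHAR column per item."""
    cols = ", ".join(f"{name} VARCHAR({size})" for name, size in columns)
    return f"CREATE TABLE {table} ({cols})"


def insert_sql(table, columns):
    """Build a parameterized INSERT statement; values are bound at execution."""
    names = ", ".join(name for name, _ in columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({names}) VALUES ({marks})"


rows = [
    ("Sotaro Ito", "520-8016",
     "35-7-20 Kudanminami, Mizuho-cho, Nishitama-gun, Kyoto Prefecture",
     "090-4719-6593"),
    ("Yoko Watanabe", "812-5855",
     "13-8-1, Marunouchi JP Tower, Edogawa-ku, Nara Prefecture",
     "090-1352-5601"),
]

conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("fake_person", COLUMNS))
conn.executemany(insert_sql("fake_person", COLUMNS), rows)
row_count = conn.execute("SELECT COUNT(*) FROM fake_person").fetchone()[0]
print(row_count)
```

Binding the values with `?` placeholders avoids having to escape quotes in generated names and addresses. The placeholder syntax differs between database drivers, so this part would need adjusting for another RDB.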


That's enough for today; I'll leave it here for now.
