I tried creating various kinds of "dummy data" with Python's faker

Introduction

If you work on machine learning, you need training data. Ideally you would use real data, but it is often hard to obtain, or there is simply not enough of it. In those cases, one option is to generate dummy data to increase the volume.

This time, I will create various kinds of dummy data using Python's faker library.

Preparation

I used Google Colaboratory as the environment. The Python version is shown below.

import platform
print("python " + platform.python_version())
# python 3.6.9

You also need to install faker, the library that generates the dummy data, beforehand.

pip install faker

Let's create dummy data

Now let's write the code.

First, import faker, the library that generates the dummy data. Pass the 'ja_JP' locale so that Japanese data is generated.

from faker import Faker
fake = Faker('ja_JP')

Street address

First, let's create dummy address data. Here I display five entries.

[fake.address() for _ in range(5)]
# ['824 Palace Shibadaimon, 36-21-10 Koiri, Takatsu-ku, Kawasaki City, Miyagi Prefecture',
#  '17-11-18 Nishikanda, Nakano-ku, Kagawa Prefecture Otagaya Crest 528',
#  '6-10-4 Tokorono, Katsushika-ku, Hiroshima',
#  '24-17-14 Chouka, Saiwai-ku, Kawasaki-shi, Kumamoto Heights Yugu 667',
#  '34-12-7 Kuramae, Inba-mura, Inba-gun, Oita Prefecture Corp. Momura 228']

You can create address data with fake.address().

You can also generate the individual parts of an address.

#Prefecture
[fake.prefecture() for _ in range(5)]
# ['Okinawa Prefecture', 'Kyoto', 'Tochigi Prefecture', 'Saga Prefecture', 'Hiroshima Prefecture']

#Municipality
[fake.city() for _ in range(5)]
# ['Naka-ku, Yokohama', 'Hamura City', 'Toshima Village', 'Mitaka City', 'Miyakejima Miyake Village']

#Area name
[fake.town() for _ in range(5)]
# ['Satte', 'Tsurugaoka', 'Nishikawa', 'Iriya', 'Haneoricho']

#Building name
[fake.building_name() for _ in range(5)]
# ['Sharm', 'Court', 'Sharm', 'Park', 'Urban']

Full name

Next, let's create dummy name data. Names can be generated in kanji, katakana, and romaji.

Name (Kanji)

First, let's create kanji name data.

#Name (Kanji)
[fake.name() for _ in range(5)]
# ['Chiyo Nakatsugawa', 'Yuta Wakamatsu', 'Kaori Kudo', 'Kana Uno', 'Yoko Hirokawa']

#Name (Kanji, male)
[fake.name_male() for _ in range(5)]
# ['Ryohei Sasaki', 'Atsushi Sato', 'Shota Sasaki', 'Kenichi Kato', 'Ryohei Aoyama']

#Name (Kanji, female)
[fake.name_female() for _ in range(5)]
# ['Akemi Inoue', 'Kaori Matsumoto', 'Tomomi Wakamatsu', 'Haruka Takahashi', 'Hanako Sugiyama']

#Surname (Kanji)
[fake.last_name() for _ in range(5)]
# ['Matsumoto', 'Kondo', 'Fujimoto', 'Murayama', 'Kato']

#First name (kanji)
[fake.first_name() for _ in range(5)]
# ['Minoru', 'Rei', 'Hanako', 'Ryosuke', 'Kaori']

#First name (Kanji, male)
[fake.first_name_male() for _ in range(5)]
# ['Hiroki', 'Naoto', 'Atsushi', 'Naoki', 'Akira']

#First name (kanji, female)
[fake.first_name_female() for _ in range(5)]
# ['Mai', 'Mikako', 'Tomomi', 'Akemi', 'Akemi']

Name (Katakana)

Next, let's create katakana name data.

#Name (Katakana)
[fake.kana_name() for _ in range(5)]
# ['Ogaki Yui', 'Harada Takuma', 'Nakamura Tsubasa', 'Yamada Sayuri', 'Tsuchiya Sotaro']

#Surname (Katakana)
[fake.last_kana_name() for _ in range(5)]
# ['Miyake', 'Kanou', 'Kudo', 'Harada', 'Aota']

#First name (katakana)
[fake.first_kana_name() for _ in range(5)]
# ['Maaya', 'Naoko', 'Miki', 'Kenichi', 'Yasuhiro']

#First name (katakana, male)
[fake.first_kana_name_male() for _ in range(5)]
# ['Manabu', 'Manabu', 'Yasuhiro', 'Kenichi', 'Atsushi']

#First name (katakana, female)
[fake.first_kana_name_female() for _ in range(5)]
# ['Tomomi', 'Sayuri', 'Asuka', 'Tsubasa', 'Yui']

Name (in romaji)

Finally, let's create names written in romaji.

#Name (in romaji)
[fake.romanized_name() for _ in range(5)]
# ['Akira Nakamura', 'Ryosuke Yamada', 'Yui Takahashi', 'Maaya Ogaki', 'Mituru Fujimoto']

#Surname (romaji)
[fake.last_romanized_name() for _ in range(5)]
# ['Tsuda', 'Tsuchiya', 'Yamada', 'Nakatsugawa', 'Nakamura']

#First name (romaji)
[fake.first_romanized_name() for _ in range(5)]
# ['Mai', 'Manabu', 'Nanaka', 'Kenichi', 'Taro']

#First name (Romaji, male)
[fake.first_romanized_name_male() for _ in range(5)]
# ['Tomoya', 'Hiroshi', 'Taichi', 'Mituru', 'Manabu']

#First name (romaji, female)
[fake.first_romanized_name_female() for _ in range(5)]
# ['Haruka', 'Maaya', 'Kaori', 'Kumiko', 'Yoko']

Other

So far we have created address and name data, but faker can generate other kinds of data as well. Here are a few examples.

#Company name
[fake.company() for _ in range(5)]
# ['Harada Gas Co., Ltd.', 'Sasada Mining Co., Ltd.', 'Miyake Gas Co., Ltd.', 'Kudo Construction Co., Ltd.', 'Kobayashi Fisheries Co., Ltd.']

#Industry
[fake.company_category() for _ in range(5)]
# ['gas', 'printing', 'Bank', 'Food', 'insurance']

#Profession
[fake.job() for _ in range(5)]
# ['Bus guide', 'Esthetician', 'Wedding planner', 'fortune teller', 'pharmacist']

#Word
[fake.word() for _ in range(5)]
# ['weave', 'College', 'Performer', 'today', 'To modernize']

Summary

This time, I used Python's faker to create various kinds of dummy data.

When preparing data for machine learning, real data alone is often not enough. In such cases, dummy data can be useful.

faker can generate many kinds of dummy data beyond those introduced here. For details, see the following page: https://faker.readthedocs.io/en/master/locales/ja_JP.html
