Code reading of faker, a library that generates test data in Python

Introduction

Recently, @TakesxiSximada posted an article, "Code reading of Safe, a library for checking password strength in Python", so I decided to try a code reading myself.

I picked faker, a library I had been wanting to use for a while.

faker generates realistic dummy data for tests. It is the Python counterpart of the faker libraries you often see in other languages.

https://pypi.python.org/pypi/fake-factory/0.5.3 https://github.com/joke2k/faker

Install

You can install it with pip.

$ pip install fake-factory

Try the README sample code

>>> from faker import Factory

Then all you need to do is create a generator that produces the test data.

>>> fake = Factory.create()

After that, each method call returns test data like this.

>>> fake.name()
'Anfernee Reichel'
>>> fake.address()
'084 Tiney Fork Suite 757\nPort Earl, MI 20240-1776'
>>> fake.text()
'Facilis non eligendi qui deleniti ullam est. Ab minus est non et occaecati laborum sequi. Vero consectetur repellendus dicta velit. Quisquam omnis alias error sed totam.'

It also supports multiple locales: pass a locale as an argument to Factory.create().

>>> fake = Factory.create('ja_JP')
>>> fake.name()
'Yumiko Tsuda'
>>> fake.address()
'32-22-3 Shiba Park, Chuo-ku, Gunma Prefecture Kamihiroya Heights 400'
>>> fake.text()
'Non ut in unde ipsa fugiat excepturi voluptate. Enim molestias voluptatem aperiam. Est fuga distinctio sit officia qui velit numquam sint.'

No Japanese data is provided for text(), so the default en_US data is returned.

By the way, fake is an instance of faker.generator.Generator.

>>> type(fake)
<class 'faker.generator.Generator'>

Code reading

Now let's read the code. Before that, it helps to understand what a Provider is in faker, so I will explain Providers first.

Provider - Provides test data

Each Provider lives under faker/faker/providers.

├── providers
│   ├── __init__.py
│   ├── __pycache__
│   ├── address
│   ├── barcode
│   ├── color
│   ├── company
│   ├── credit_card
│   ├── currency
│   ├── date_time
│   ├── file
│   ├── internet
│   ├── job
│   ├── lorem
│   ├── misc
│   ├── person
│   ├── phone_number
│   ├── profile
│   ├── python
│   ├── ssn
│   └── user_agent

Each category, such as address and barcode, contains one directory per locale plus a base Provider for that category.

Here we will focus on person and follow the source. The contents of person look like this.

├── providers
│   ├── __init__.py
│   ├── person
│   │   ├── __init__.py
│   │   ├── bg_BG
│   │   ├── cs_CZ
│   │   ├── de_AT
│   │   ├── de_DE
│   │   ├── dk_DK
│   │   ├── el_GR
│   │   ├── en
│   │   ├── en_US
│   │   ├── es_ES
│   │   ├── es_MX
│   │   ├── fa_IR
│   │   ├── fi_FI
│   │   ├── fr_FR
│   │   ├── hi_IN
│   │   ├── hr_HR
│   │   ├── it_IT
│   │   ├── ja_JP
│   │   ├── ko_KR
│   │   ├── lt_LT
│   │   ├── lv_LV
│   │   ├── ne_NP
│   │   ├── nl_NL
│   │   ├── no_NO
│   │   ├── pl_PL
│   │   ├── pt_BR
│   │   ├── pt_PT
│   │   ├── ru_RU
│   │   ├── sl_SI
│   │   ├── sv_SE
│   │   ├── tr_TR
│   │   ├── uk_UA
│   │   ├── zh_CN
│   │   └── zh_TW

Next, look at __init__.py directly under faker/providers/person.

from .. import BaseProvider


class Provider(BaseProvider):
    formats = ['{{first_name}} {{last_name}}', ]

    first_names = ['John', 'Jane']

    last_names = ['Doe', ]

    def name(self):
        """
        :example 'John Doe'
        """
        pattern = self.random_element(self.formats)
        return self.generator.parse(pattern)

    @classmethod
    def first_name(cls):
        return cls.random_element(cls.first_names)

    @classmethod
    def last_name(cls):
        return cls.random_element(cls.last_names)
        
    # Omitted below

This is the base Provider that the person Providers of each locale build on. You can see that it inherits from BaseProvider, which implements classmethods such as random_element() for picking data at random.
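To make the pattern concrete, here is a minimal sketch that does not import faker at all. BaseProvider and PersonProvider below are stand-ins I wrote for illustration, and I am assuming that random_element() is essentially a uniform random pick. The library's real BaseProvider has many more helpers.

```python
import random


class BaseProvider:
    """Stand-in for faker's BaseProvider; only the random pick is sketched."""

    def __init__(self, generator=None):
        # A provider keeps a reference to the generator it is attached to.
        self.generator = generator

    @classmethod
    def random_element(cls, elements):
        # Assumption: a uniform random choice from a sequence.
        return random.choice(list(elements))


class PersonProvider(BaseProvider):
    # Data lives on the class, so the classmethods below can read it via cls.
    first_names = ['John', 'Jane']
    last_names = ['Doe']

    @classmethod
    def first_name(cls):
        return cls.random_element(cls.first_names)

    @classmethod
    def last_name(cls):
        return cls.random_element(cls.last_names)


print(PersonProvider.first_name(), PersonProvider.last_name())
```

Because the data and the pickers are both class-level, no instance is needed to call first_name() or last_name().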

The Providers for each locale inherit from this Provider and add new attributes and methods, or override existing ones. For the Japanese person Provider, see the following.
https://github.com/joke2k/faker/blob/master/faker/providers/person/ja_JP/init.py
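The inheritance idea can be sketched as follows. The class names and the name lists here are placeholders of my own, not the library's actual data. The point is only that a locale subclass can swap in its own lists and reuse the inherited classmethods unchanged.

```python
import random


class PersonProvider:
    """Hypothetical base class, shaped like the person Provider above."""
    first_names = ['John', 'Jane']
    last_names = ['Doe']

    @classmethod
    def first_name(cls):
        # cls is whichever class the call came through, so a subclass
        # that overrides first_names is picked up automatically.
        return random.choice(cls.first_names)

    @classmethod
    def last_name(cls):
        return random.choice(cls.last_names)


class JaJPPersonProvider(PersonProvider):
    # Only the data is overridden; the inherited classmethods are reused.
    # These values are placeholders, not the library's actual lists.
    first_names = ['Yumiko', 'Akira']
    last_names = ['Tsuda', 'Sato']


print(PersonProvider.last_name())       # the base list has a single entry
print(JaJPPersonProvider.first_name())  # picked from the overridden list
```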

Factory.create() - Create a Generator

https://github.com/joke2k/faker/blob/master/faker/factory.py#L14-L44

This method creates an instance of <class 'faker.generator.Generator'> and returns it.

In the loop below, each Provider is registered on faker, an instance of <class 'faker.generator.Generator'>, based on the locale passed as an argument to Factory.create(). (If there is no Provider for the specified locale, the one for DEFAULT_LOCALE, en_US, is used.)

for prov_name in providers:
    if prov_name == 'faker.providers':
        continue
        continue

    prov_cls, lang_found = cls._get_provider_class(prov_name, locale)
    provider = prov_cls(faker)
    provider.__provider__ = prov_name
    provider.__lang__ = lang_found
    faker.add_provider(provider)
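The locale fallback described above can be sketched like this. I am assuming a plain dict as the registry and plain strings in place of provider classes. As far as I can tell the library resolves modules instead, so REGISTRY and get_provider_class() are hypothetical names, not faker's API.

```python
DEFAULT_LOCALE = 'en_US'

# Hypothetical registry: provider name -> {locale: provider}.
# Strings stand in for the provider classes to keep the sketch small.
REGISTRY = {
    'person': {'en_US': 'PersonEnUS', 'ja_JP': 'PersonJaJP'},
    'lorem': {'en_US': 'LoremEnUS'},  # no ja_JP data in this sketch
}


def get_provider_class(prov_name, locale):
    """Return (provider, locale_found), falling back to DEFAULT_LOCALE."""
    by_locale = REGISTRY[prov_name]
    if locale in by_locale:
        return by_locale[locale], locale
    # Nothing for the requested locale: use the default one instead.
    return by_locale[DEFAULT_LOCALE], DEFAULT_LOCALE


print(get_provider_class('person', 'ja_JP'))  # ('PersonJaJP', 'ja_JP')
print(get_provider_class('lorem', 'ja_JP'))   # ('LoremEnUS', 'en_US')
```

This matches what we saw earlier: with ja_JP, name() came back in the Japanese locale while text() fell back to en_US.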

Next, let's take a look at add_provider(provider), which appeared in the loop above.

Generator.add_provider() - Add formats to the Generator

https://github.com/joke2k/faker/blob/master/faker/generator.py#L22-L39

The public methods defined by the Provider passed as an argument (e.g. faker.providers.person.ja_JP.Provider) are added to the generator as formats.

Generator.set_formatter() - Wrapper function for setattr()

https://github.com/joke2k/faker/blob/master/faker/generator.py#L70-L75

The word "format" appeared abruptly in Generator.add_provider(), but set_formatter() simply calls setattr() on the Generator instance.
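Put together, the two methods can be sketched roughly as below. MiniGenerator and GreetingProvider are stand-ins I made up, and I am assuming that "public" means "does not start with an underscore". The linked source is the authority on the details.

```python
class MiniGenerator:
    """Hypothetical, simplified stand-in for faker.generator.Generator."""

    def __init__(self):
        self.providers = []

    def add_provider(self, provider):
        self.providers.append(provider)
        for method_name in dir(provider):
            # Assumption: names starting with '_' are treated as private.
            if method_name.startswith('_'):
                continue
            candidate = getattr(provider, method_name)
            if callable(candidate):
                self.set_formatter(method_name, candidate)

    def set_formatter(self, name, method):
        # Nothing more than setattr() on the generator instance.
        setattr(self, name, method)


class GreetingProvider:
    """Made-up provider with one public and one private method."""

    def greeting(self):
        return 'hello'

    def _secret(self):
        return 'hidden'


gen = MiniGenerator()
gen.add_provider(GreetingProvider())
print(gen.greeting())            # hello
print(hasattr(gen, '_secret'))   # False
```

The attribute stored on the generator is the provider's bound method, so calling gen.greeting() runs the provider's code.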

Summary

As we have seen, calling Factory.create() gives you an instance of <class 'faker.generator.Generator'> whose attributes include all the public methods defined by the Providers of the chosen locale. Thanks to this, simply calling fake.method_name() as shown below runs the method_name() implemented in that locale's Provider and returns random test data.

>>> fake.name()
'Anfernee Reichel'
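One more piece ties this back to the person Provider: name() passes a pattern such as '{{first_name}} {{last_name}}' to generator.parse(). I did not read parse() in this article, so the following is only my guess at the idea: each {{token}} is replaced by calling the generator attribute of the same name. MiniGenerator and TOKEN_RE are hypothetical names.

```python
import re

# Matches tokens such as {{first_name}} and captures the bare name.
TOKEN_RE = re.compile(r'\{\{\s*(\w+)\s*\}\}')


class MiniGenerator:
    """Hypothetical stand-in; only the token substitution is sketched."""

    def set_formatter(self, name, method):
        setattr(self, name, method)

    def parse(self, text):
        # Replace each {{token}} with the result of calling the
        # generator attribute that has the same name.
        return TOKEN_RE.sub(
            lambda match: str(getattr(self, match.group(1))()), text)


gen = MiniGenerator()
gen.set_formatter('first_name', lambda: 'John')
gen.set_formatter('last_name', lambda: 'Doe')
print(gen.parse('{{first_name}} {{last_name}}'))  # John Doe
```

Under this reading, the methods registered by add_provider() serve both direct calls like fake.name() and the tokens inside format patterns.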

Conclusion

I ran out of steam and only followed the Factory.create() part, but once you understand how the Generator is built, you will also see other ways to use this library. I recommend code reading with a small library like this one: it was easy to get into and fun!

Afterword

While writing this article, I ran into a problem: because the ja_JP PersonProvider (https://github.com/joke2k/faker/blob/master/faker/providers/person/ja_JP/init.py) keeps the format of name() in Japanese, user_name() and domain_word() were not displayed properly. https://github.com/joke2k/faker/blob/master/faker/providers/internet/init.py#L27-L32 https://github.com/joke2k/faker/blob/master/faker/providers/internet/init.py#L90-L95

I submitted a PR to fix this problem, and it was merged. https://github.com/joke2k/faker/pull/300
