[PYTHON] How to create sample CSV data with hypothesis

hypothesis is a library for property-based testing: instead of hand-picking individual test cases, you describe the kind of input a test accepts and the library generates the values.

That lets a test try a wide range of values. This time, though, I want to see whether hypothesis's data generation can also be used to create sample data easily.

Task

I want to generate data that stays within fixed ranges and limits, in a given file format (CSV).

I'll try!

First, I want to define the data. In hypothesis, the data you want to generate is defined with a [strategy](https://hypothesis.readthedocs.io/en/latest/data.html#core-strategies).

This time I use csv from the Python standard library. Its DictWriter writes a dict out as a row, so it is easiest if each row is generated as a dict. There is a strategy for building dicts called [fixed_dictionaries](https://hypothesis.readthedocs.io/en/latest/data.html#hypothesis.strategies.fixed_dictionaries), so that looks like the way to go.

You pass it a dict whose keys are the keys of the dict you want to create and whose values are the strategies that generate each value, and the definition is done!

from hypothesis import strategies as st

DictRowDataModel = st.fixed_dictionaries({
    'k_id': st.none(),
    'w_id': st.none(),
    'Item 1': st.integers(min_value=1, max_value=7),
    'Item 2': st.integers(min_value=1, max_value=5),
    'Item 3': st.integers(min_value=1, max_value=16)
})
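As a quick check of why a dict per row is convenient, here is a minimal sketch of how `csv.DictWriter` handles such a dict. The row below is a hand-written, hypothetical value shaped like one draw from the strategy above, and the CSV is written to an in-memory buffer instead of a file. Note that `None` ends up as an empty field.

```python
import csv
import io

# Hypothetical row, hand-written to look like one draw from the strategy above.
row = {'k_id': None, 'w_id': None, 'Item 1': 3, 'Item 2': 5, 'Item 3': 16}

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row.keys()))
writer.writeheader()   # column names, taken from fieldnames
writer.writerow(row)   # None values are written as empty fields

csv_text = buffer.getvalue()
print(csv_text)
# k_id,w_id,Item 1,Item 2,Item 3
# ,,3,5,16
```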

The next thing that was hard to figure out was how to actually generate data from this. I could not find an example of this kind of use, probably because strategies are normally used inside unit tests.

Example of the usual use in a test case:

from hypothesis import given
import hypothesis.strategies as st

@given(st.integers(), st.integers())
def test_ints_are_commutative(x, y):
    assert x + y == y + x
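The same decorator also accepts a `fixed_dictionaries` strategy. The following is a sketch, not something from the hypothesis documentation: the names `row_strategy` and `test_rows_stay_in_range` are made up for illustration, and the `settings` values are just one reasonable choice to keep the run short.

```python
from hypothesis import given, settings
from hypothesis import strategies as st

# Hypothetical name; same row definition as in this article.
row_strategy = st.fixed_dictionaries({
    'k_id': st.none(),
    'w_id': st.none(),
    'Item 1': st.integers(min_value=1, max_value=7),
    'Item 2': st.integers(min_value=1, max_value=5),
    'Item 3': st.integers(min_value=1, max_value=16),
})


@settings(max_examples=50, database=None)  # small run, no example database on disk
@given(row_strategy)
def test_rows_stay_in_range(row):
    # Every generated dict has exactly the keys given to fixed_dictionaries.
    assert set(row) == {'k_id', 'w_id', 'Item 1', 'Item 2', 'Item 3'}
    assert row['k_id'] is None and row['w_id'] is None
    assert 1 <= row['Item 1'] <= 7
    assert 1 <= row['Item 2'] <= 5
    assert 1 <= row['Item 3'] <= 16


# A @given test takes no arguments when called; hypothesis supplies the rows.
test_rows_stay_in_range()
```

This confirms that the strategy respects the ranges, but the generated rows live only inside the test, which is why it does not directly help with writing a file.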

But after some searching, it turns out that a strategy has an `example()` method that can be used for this:


import csv
from hypothesis import strategies as st

d = {
    'k_id': st.none(),
    'w_id': st.none(),
    'Item 1': st.integers(min_value=1, max_value=7),
    'Item 2': st.integers(min_value=1, max_value=5),
    'Item 3': st.integers(min_value=1, max_value=16)
}

DictRowDataModel = st.fixed_dictionaries(d)

samples = 3
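# newline='' is what the csv module documentation recommends for files passed to a writer
with open('sample.csv', 'w', encoding='utf8', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=tuple(d.keys()))
    for _ in range(samples):
        # each call to example() draws one dict that satisfies the strategy
        sample = DictRowDataModel.example()
        writer.writerow(sample)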

I didn't have to write any code to generate values within the ranges. Happy.
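To confirm that the file really respects the ranges, it can be read back with `csv.DictReader`. This is a sketch under a few assumptions that are not in the code above: it writes a header row, uses a temporary directory instead of the current one, and the names `columns`, `row_strategy` and `path` are only illustrative.

```python
import csv
import os
import tempfile

from hypothesis import strategies as st

# Hypothetical names; same column definitions as in this article.
columns = {
    'k_id': st.none(),
    'w_id': st.none(),
    'Item 1': st.integers(min_value=1, max_value=7),
    'Item 2': st.integers(min_value=1, max_value=5),
    'Item 3': st.integers(min_value=1, max_value=16),
}
row_strategy = st.fixed_dictionaries(columns)

# Assumption: a temporary directory, so the sketch leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')

with open(path, 'w', encoding='utf8', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=list(columns))
    writer.writeheader()  # assumption: include the column names as the first line
    for _ in range(3):
        writer.writerow(row_strategy.example())

# Read the file back; every value comes back as a string.
with open(path, encoding='utf8', newline='') as src:
    rows = list(csv.DictReader(src))

for row in rows:
    assert row['k_id'] == '' and row['w_id'] == ''  # None was written as an empty field
    assert 1 <= int(row['Item 1']) <= 7
    assert 1 <= int(row['Item 2']) <= 5
    assert 1 <= int(row['Item 3']) <= 16
```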

Conclusion

Using a strategy's `.example()` made it easy to create CSV data :tada:

The following warning is issued. It is a caution about things such as test performance, and here I am only creating data, so I ignore it for now:

NonInteractiveExampleWarning: The `.example()` method is good for exploring strategies, but should only be used interactively.  We recommend using `@given` for tests - it performs better, saves and replays failures to avoid flakiness, and reports minimal examples. (strategy: fixed_dictionaries(...),
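If the warning is too noisy in a data-generation script, one option is to silence only that category with the standard `warnings` module. This is a sketch of one possible approach, not a recommendation from the hypothesis project; `row_strategy` is a hypothetical name, and the warning class is imported from `hypothesis.errors`.

```python
import warnings

from hypothesis import strategies as st
from hypothesis.errors import NonInteractiveExampleWarning

# Hypothetical name; same row definition as in this article.
row_strategy = st.fixed_dictionaries({
    'k_id': st.none(),
    'w_id': st.none(),
    'Item 1': st.integers(min_value=1, max_value=7),
    'Item 2': st.integers(min_value=1, max_value=5),
    'Item 3': st.integers(min_value=1, max_value=16),
})

with warnings.catch_warnings():
    # Ignore only this one warning category, and only inside this block.
    warnings.simplefilter('ignore', category=NonInteractiveExampleWarning)
    rows = [row_strategy.example() for _ in range(3)]

print(rows)
```

Keeping the filter inside `catch_warnings()` means other warnings, and any later use of `.example()` elsewhere, still show up as usual.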
