[PYTHON] Create machine learning projects at explosive speed using templates

Do you give every machine learning project a different structure? Do you spend time deciding what to put where?

If so, here is some good news: a machine learning project can be created with **one command**. A project like the one below takes only seconds to generate.

**Directory structure**

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.testrun.org
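
The notebook naming convention described in the tree above (a number for ordering, the creator's initials, and a short `-` delimited description) can be checked mechanically. The following is only a minimal sketch: the regular expression and the helper name `follows_convention` are assumptions made for illustration and are not part of the template.

```python
import re

# Assumed pattern for "<number>-<initials>-<short-description>", for example
# "1.0-jqp-initial-data-exploration". The template states the convention only
# in prose, so the exact rules encoded here are a guess.
NOTEBOOK_NAME = re.compile(r"\d+(\.\d+)*-[a-z]+-[a-z0-9]+(-[a-z0-9]+)*")


def follows_convention(stem: str) -> bool:
    """Return True if a notebook file name (without .ipynb) matches the pattern."""
    return NOTEBOOK_NAME.fullmatch(stem) is not None


print(follows_convention("1.0-jqp-initial-data-exploration"))
print(follows_convention("Untitled"))
```

A check like this could run over the `notebooks` directory to flag files that were saved under a default name.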

A standard project structure like this has several benefits. New members can get up to speed quickly because they know what is where. It also helps me personally: when I come back to a project a few months later, I don't have to hunt for things, so I can get to work right away.
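
Because the layout is fixed, code under `src` can locate data by convention instead of hard-coding paths. Here is a minimal sketch, assuming the directory tree shown earlier. The helper names `project_root_from` and `data_dirs` are hypothetical and are not generated by the template.

```python
from pathlib import Path


def project_root_from(module_file):
    """Return the project root for a script stored two levels below src.

    For src/data/make_dataset.py the parents are src/data, src, and then the
    project root, so the root is parents[2].
    """
    return Path(module_file).resolve().parents[2]


def data_dirs(project_root):
    """Map each data stage from the tree to its directory under data/."""
    data = Path(project_root) / "data"
    stages = ("external", "interim", "processed", "raw")
    return {stage: data / stage for stage in stages}


# Inside src/data/make_dataset.py this could be used as:
#     dirs = data_dirs(project_root_from(__file__))
#     raw_files = sorted(dirs["raw"].glob("*.csv"))
```

Every script then reads from `data/raw` and writes to `data/interim` or `data/processed` in the same way, which is what makes the structure easy to pick up for a newcomer.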

Creating this kind of project structure is as easy as creating a Django or Rails project. This article shows how.

Getting started

First, install **Cookiecutter**, a library for generating directory structures. Then use it to create a project.

Install Cookiecutter

First, a few words about Cookiecutter.

Cookiecutter is **a Python library for creating projects from project templates**. With it, you can easily generate a new project from an existing template. This article uses a template for machine learning, but you can pick whichever template suits the project you want to create.

You can install Cookiecutter using pip as follows:

$ pip install cookiecutter
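
To confirm that the installation succeeded, the installed version can be looked up from Python. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8 or later). The helper name `installed_version` is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if cookiecutter is installed, otherwise None.
print(installed_version("cookiecutter"))
```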

After the installation is complete, let's actually create a project.

Creating a project

Use the `cookiecutter` command you just installed to create a new project. The command takes an existing project template as its argument. Here we specify Cookiecutter Data Science, a template for machine learning projects. Run the following command:

$ cookiecutter https://github.com/drivendata/cookiecutter-data-science

After you run the command, Cookiecutter asks for the project name, the author's name, and a few other settings. Answer each question and the new project is created.

project_name [project_name]: machine-learning
repo_name [machine-learning]: 
author_name [Your name (or your organization/company/team)]: Hironsan
description [A short description of the project.]: Machine learning project
Select open_source_license:
1 - MIT
2 - BSD
3 - Not open source
Choose from 1, 2, 3 [1]: 1
s3_bucket [[OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')]: 
Select python_interpreter:
1 - python
2 - python3
Choose from 1, 2 [1]: 2
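
The same answers can also be supplied without the interactive prompts through Cookiecutter's Python API. The sketch below assumes that the template's variable names are the ones shown in the prompts above and that `MIT` and `python3` are valid choice values for the template version in use. The function names `build_context` and `create_project` are hypothetical.

```python
TEMPLATE = "https://github.com/drivendata/cookiecutter-data-science"


def build_context():
    """Answers from the prompt session above, keyed by the prompted variable names."""
    return {
        "project_name": "machine-learning",
        "author_name": "Hironsan",
        "description": "Machine learning project",
        # Choice variables: assumed to match options defined by the template.
        "open_source_license": "MIT",
        "python_interpreter": "python3",
    }


def create_project(output_dir="."):
    """Generate the project non-interactively (needs cookiecutter and network access)."""
    # Imported here so that build_context() works even without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        TEMPLATE,
        no_input=True,  # skip the prompts and use defaults plus extra_context
        extra_context=build_context(),
        output_dir=output_dir,
    )
```

Calling `create_project()` downloads the template and writes the project into the current directory. Variables left out of the context, such as `repo_name` and `s3_bucket`, fall back to the template defaults, as they did when the prompts were answered with an empty line.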

In conclusion

Determining the project structure is a surprisingly time-consuming task. I hope this article will help you in creating your project.

I also tweet about machine learning and natural language processing, so please follow me: @Hironsan
