Create a new Python numerical calculation project

This article is the Day 18 entry of the Next Co., Ltd. (Lifull) Advent Calendar 2016.

Hello, this is Ninomiya from the Digital Marketing Unit.

Recently, several departments within the company have created projects that use Python for statistical processing and numerical calculations.

Until now, R has been the main language, used chiefly by @wakuteka in the same group, and our documentation and know-how were organized around it. It seems there was also a need to do this kind of work in Python.

(Of course, this is not meant to disparage R. For interactive analysis and visualization with the tidyverse family of libraries, R is easier to use, so we choose between the two tools depending on the task.)

I use Python as a hobby, and I had the chance to take part in several of these projects by giving reviews and advice (though that makes it sound grander than it was).

I would like to summarize the knowledge gained in that process and the articles that I referred to.

However, we proceeded by trial and error, so there may well be better approaches, and this article does not cover every aspect of development. If you notice anything like that, I would be grateful if you could let me know in the comments.

Prepare a Python environment

The article "Development environment created with the intention of writing Python seriously" was helpful.

Anaconda, a well-known Python distribution that also bundles statistics libraries, is not used in our production environment. (I do use it in development and analysis environments.)

The reason is the problem described in the article "Flowchart whether pyenv is needed":

If you add Anaconda's /bin to PATH, the tools bundled with Anaconda (openssl, curl, python) shadow the ones provided by the OS. It also assumes bash too heavily, and if you use zsh it will not work without various fixes.

I was worried about this kind of behavior in production operation.
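One way to see whether this kind of shadowing is happening on a given machine is to check which executable each command name resolves to first on PATH. The following is only a minimal sketch using the standard library; the helper name `resolve_tools` is made up for illustration.

```python
import shutil


def resolve_tools(names):
    """Map each command name to the executable found first on PATH.

    Args:
        names (list): Command names to look up.

    Returns:
        dict: Command name to resolved path, or None when not found.
    """
    return {name: shutil.which(name) for name in names}


# If these resolve to paths inside an Anaconda directory, the Anaconda
# copies are the ones that will actually run, not the OS-provided ones.
for name, path in resolve_tools(["openssl", "curl", "python"]).items():
    print("{}: {}".format(name, path))
```

Running this once with and once without Anaconda's /bin on PATH shows which commands change.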

Development

Because I had to review the code, I relearned the coding conventions and how to write docstrings.

Split out into a package (if necessary)

Department-specific numerical calculation code and the like is kept in the group's git repository so that it can be installed with pip.

At that time, I referred to these articles.
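As a rough sketch of what installing from a git repository looks like: pip accepts a `git+` URL, optionally followed by `@` and a tag, branch, or commit, provided the repository contains packaging metadata such as a setup.py. The helper name, repository URL, and tag below are all hypothetical.

```python
import sys


def pip_install_command(repo_url, ref=None):
    """Build a ``pip install`` command for a package hosted in a git repository.

    Args:
        repo_url (str): HTTPS URL of the git repository.
        ref (str): Optional tag, branch, or commit to pin.

    Returns:
        list: Arguments that could be passed to ``subprocess.run``.
    """
    target = "git+" + repo_url
    if ref:
        target += "@" + ref
    return [sys.executable, "-m", "pip", "install", target]


# Hypothetical internal repository and tag, for illustration only.
command = pip_install_command(
    "https://git.example.com/analytics/market-calc.git", ref="v0.1.0"
)
print(" ".join(command[1:]))
```

Pinning a tag or commit keeps every project that depends on the package on the same version until someone deliberately updates it.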

Coding conventions

I think the references to follow are PEP 8 and the Google Python Style Guide.

However, checking conformance to a coding standard by eye is difficult, so I also use flake8 and autopep8 as appropriate. That said, PEP 8 is a fairly strict standard, so we discuss how far to apply it as we go.

Here, I referred to the following article.
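To give a feel for what these tools deal with, here is a small made-up example. The commented-out version is valid Python, but a checker such as flake8 would be expected to report its spacing (for example E231, missing whitespace after a comma, and E225, missing whitespace around an operator). The version below it is roughly what it looks like once cleaned up, whether by hand or by a formatter.

```python
# Before (valid Python, but not PEP 8 style):
#
#     def mean(values,default=0.0):
#         total=sum(values)
#         return total/len(values) if values else default
#
# After: same behavior, with the whitespace PEP 8 asks for.
def mean(values, default=0.0):
    """Return the mean of values, or default if values is empty."""
    total = sum(values)
    return total / len(values) if values else default


print(mean([1, 2, 3]))
```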

Writing docstrings

I also ask for Google-style docstrings so that the inputs and outputs of functions and methods are easier to understand. There is also a NumPy style.

When using type annotations with Python 3.5 or later, it looks like this:

def function_with_pep484_type_annotations(param1: int, param2: str) -> bool:
    """Example function with PEP 484 type annotations.
 
    Args:
        param1: The first parameter.
        param2: The second parameter.
 
    Returns:
        The return value. True for success, False otherwise.
  
    """

And this is how it looks when type annotations are not used:

def function_with_types_in_docstring(param1, param2):
    """Example function with types documented in the docstring.
 
    Args:
        param1 (int): The first parameter.
        param2 (str): The second parameter.
 
    Returns:
        bool: The return value. True for success, False otherwise.
 
    """

The code I reviewed also included a function that returned multiple values as a tuple, but (as far as I could find) Google-style docstrings do not seem to provide a notation for multiple return values in the Returns section. Based on this Stack Overflow answer, I had it written as follows.

import pandas as pd


def _postprocess_data(output_data, market):
    """Format into data for alert and file output.

    Args:
        output_data (pd.DataFrame): Data frame after calculation.
        market (str): Real estate market name.

    Returns:
        tuple: Returns the following values as multiple values.
            - output_data (pd.DataFrame): Output data
            - monthly_data (pd.DataFrame): Monthly data

    """

I have not yet tried type annotations together with static analysis using mypy, but I would like to find an opportunity to try them.
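As a sketch of where that could lead, a function that returns multiple values can state the shape of its result in the annotation itself using `typing.Tuple`, which a checker such as mypy can then verify at call sites. The function below is hypothetical and is not taken from the reviewed code.

```python
from typing import List, Tuple


def split_by_threshold(
    values: List[float], threshold: float
) -> Tuple[List[float], List[float]]:
    """Split values into those below the threshold and the rest.

    Args:
        values: Numbers to split.
        threshold: Boundary value.

    Returns:
        A tuple (below, at_or_above) of two lists.
    """
    below = [v for v in values if v < threshold]
    at_or_above = [v for v in values if v >= threshold]
    return below, at_or_above


below, at_or_above = split_by_threshold([1.0, 5.0, 3.0], 3.0)
```

With the types in the signature, the docstring no longer needs to repeat them, as in the annotated example earlier.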

Writing test code

It was a small project, so I wrote a few simple tests with unittest.
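A simple test of that kind might look like the following. This is a minimal sketch: `monthly_average` is a made-up stand-in for a numerical function, not code from the actual project.

```python
import unittest


def monthly_average(values):
    """Return the average of values (hypothetical function under test)."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


class MonthlyAverageTest(unittest.TestCase):
    def test_average_of_known_values(self):
        # assertAlmostEqual avoids spurious failures from float rounding.
        self.assertAlmostEqual(monthly_average([10, 20, 30]), 20.0)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            monthly_average([])
```

Saved in a file, tests like these can be run with `python -m unittest` followed by the module name.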

There seem to be several frameworks besides unittest, so I would like to find an opportunity to try those as well.

Right now my workflow is roughly "paste a function that I worked out by trial and error in Jupyter Notebook into an editor." I would like to be able to switch to something TDD-like when the situation calls for it.

About pandas dataframes

When doing data analysis in Python, I think you will use pandas, which brings an R-like data frame type.

In R, libraries such as dplyr and tidyr let you express the flow of data processing concisely with the pipe operator, but doing the same with pandas seems to take some getting used to. (Also, unlike R, where everything is expressed as a data frame, I am still working out by trial and error when to use a data frame and when to use a dict.)

That said, this article shows a good way of writing pandas code, so please read it if you are just starting out.
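As a rough illustration of a pipeline-like style in pandas, method chaining lets each step read top to bottom, somewhat like a dplyr pipeline. This is only a sketch: the data and column names are invented, and it is not necessarily the style the linked article recommends.

```python
import pandas as pd

# Hypothetical sample data; column names are made up for illustration.
listings = pd.DataFrame({
    "market": ["tokyo", "tokyo", "osaka", "osaka", "osaka"],
    "rent": [100000, 120000, 70000, 90000, 40000],
    "area_m2": [25.0, 30.0, 28.0, 30.0, 10.0],
})

rent_per_m2_by_market = (
    listings
    # Add a derived column (similar in spirit to dplyr's mutate).
    .assign(rent_per_m2=lambda df: df["rent"] / df["area_m2"])
    # Keep only rows matching a condition (similar to filter).
    .query("area_m2 >= 20")
    # Aggregate per group (similar to group_by + summarise).
    .groupby("market")["rent_per_m2"]
    .mean()
    .sort_values(ascending=False)
)

print(rent_per_m2_by_market)
```

Wrapping the chain in parentheses is what allows one method call per line without backslashes.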

Summary

This was a quick summary of what I have learned so far through trial and error (which is still ongoing) in Python projects that use data analysis libraries. I hope it helps someone reading this.

This article does not cover every aspect of development, and there may be better approaches. If you notice any, I would be grateful if you could let me know in the comments.

Please also keep an eye on the rest of our Advent Calendar.
