Detailed Python techniques required for data shaping (1)

Format the data for analysis

In my experience, the attention people pay to data preparation, computation, and visualization is split roughly 1:4:5, while the actual effort is closer to 6:2:2. This is only a rule of thumb, but it seems fair to say that most of an analysis goes into getting the data into a computable state. So today I'll write up a series of techniques that I often use in Python, a language valued for its usability as glue.

Loading and dumping JSON objects

The content isn't particularly new, but JSON format conversion is a common preprocessing step. I'm leaving this as a memo so that I don't forget it.

In Python, JSON-formatted data can be handled with the standard json module (import json). A loaded JSON object becomes a dictionary (a hash in other languages), and a dictionary can in turn be written back out as JSON.

Loading JSON data from a CSV file

Assume a CSV file in which each line holds a key and a value separated by a comma, and suppose that the value part contains JSON text serialized by another system. To load this, split each line on the delimiter as follows and load the JSON as a dictionary with the json.loads function.

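import json

# This fragment is assumed to run inside a method of a class that stores the file name
with open(self.filename, 'r') as file:
    for line in file:
        # Split on the first comma only, because the JSON value itself contains commas
        key, value = line.rstrip().split(",", 1)
        dic = json.loads(value)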
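
The loop above can also be sketched as a small self-contained function. This is only a sketch under assumptions: the function name parse_key_json_lines and the sample data are hypothetical, and it assumes the key itself never contains a comma, so splitting on the first comma leaves the commas inside the JSON value intact.

```python
import io
import json


def parse_key_json_lines(lines):
    """Yield (key, dict) pairs from lines shaped like 'key,{json}'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # maxsplit=1: split only at the first comma, because the JSON
        # text itself usually contains commas.
        key, value = line.split(",", 1)
        yield key, json.loads(value)


# Hypothetical sample data standing in for a real file.
sample = io.StringIO(
    'user1,{"age": 30, "tags": ["a", "b"]}\n'
    'user2,{"age": 25, "tags": []}\n'
)
records = dict(parse_key_json_lines(sample))
print(records["user1"]["tags"])
```

Passing any iterable of lines (an open file, a list, or io.StringIO) keeps the parsing logic easy to test without touching the disk.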

Write a JSON object to a file

On the other hand, when writing a dictionary type object to a file or standard output in JSON format, use json.dumps as follows.

import json

json_obj = json.dumps(dic)
print(json_obj)

It's easy.
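
As a slightly fuller sketch, the following shows json.dumps with formatting options, json.dump writing to a file-like object, and a round trip back to a dictionary. The data is hypothetical, and io.StringIO is used here only as a stand-in for a real file.

```python
import io
import json

dic = {"name": "sample", "values": [1, 2, 3]}  # hypothetical data

# json.dumps returns a string; indent and sort_keys only affect formatting.
pretty = json.dumps(dic, indent=2, sort_keys=True)
print(pretty)

# json.dump writes straight to a file-like object instead of returning a string.
# io.StringIO stands in for a real file opened with open(path, 'w').
buffer = io.StringIO()
json.dump(dic, buffer)

# Round trip: loading the written text gives back an equal dictionary.
restored = json.loads(buffer.getvalue())
print(restored == dic)
```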

Use of arguments and instance variables

The arguments passed to the Python script are stored in sys.argv.

import sys

if __name__ == '__main__':
    argsmin = 1
    if len(sys.argv) > argsmin:
        some_instance = SomeClass(sys.argv)
        some_instance.some_method()

If you pass sys.argv when creating the instance, you can store the argument in an instance variable and then use it from any method.

class SomeClass:
    def __init__(self, args):
        self.filename = args[1]
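
Putting the two fragments together gives something like the sketch below. The body of some_method, the main function, and the usage message are assumptions added for illustration; the original only shows that the argument is stored in self.filename.

```python
import sys


class SomeClass:
    def __init__(self, args):
        # args[0] is the script name; args[1] is the first real argument.
        self.filename = args[1]

    def some_method(self):
        # Placeholder body (assumption): report which file would be processed.
        return "processing " + self.filename


def main(argv):
    argsmin = 1
    if len(argv) > argsmin:
        return SomeClass(argv).some_method()
    # Hypothetical usage message for the case where no argument was given.
    return "usage: python script.py FILENAME"


if __name__ == '__main__':
    print(main(sys.argv))
```

Taking the argument list as a parameter of main, rather than reading sys.argv inside the class, makes it possible to call the same code from a test with a hand-written list.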

Private method

By convention, a method whose name starts with a single _ is treated as private in Python.

    def _some_method(self):
        ...

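Of course, some_instance.some_method() will not find it, but you can still call it explicitly as some_instance._some_method(). The privacy is a convention, not something the language enforces.

Prefix the name with two underscores (__) to have the language itself block such a call. Python rewrites the name internally (name mangling), so some_instance.__some_method() raises AttributeError.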

    def __some_method(self):
        ...

However, even then there is a roundabout way to call it (through the mangled name _SomeClass__some_method), and the private method becomes harder to test, so I don't much recommend it. Put another way, being able to call a private method makes testing easier.
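
The difference between the two prefixes can be checked with a short sketch. The method names _single and __double and their return values are hypothetical, chosen only for this illustration.

```python
class SomeClass:
    def _single(self):
        return "single underscore"

    def __double(self):
        # Inside the class body, Python stores this as _SomeClass__double.
        return "double underscore"


some_instance = SomeClass()

# A single underscore is only a convention, so the call simply works.
print(some_instance._single())

# The double-underscore name is not reachable as written from outside.
try:
    some_instance.__double()
    blocked = False
except AttributeError:
    blocked = True
print(blocked)

# The mangled name still works, which is the roundabout way to call it.
print(some_instance._SomeClass__double())
```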

Testing framework

There are many different Python testing frameworks, but nose is relatively easy to use. First, consider a function that behaves as follows.

import factorial
factorial.factorial(10)
#=> 3628800

Code to be tested

Let's say this Python code is saved as factorial.py and the function is implemented like this.

def factorial(n):
    # Treat 0 and 1 as the base case so that factorial(0) also terminates
    if n <= 1:
        return 1
    else:
        return n * factorial(n - 1)

Test code

To test this, create a file named test_factorial.py and write the test code as follows:

from nose.tools import eq_       # Load the assertion helper from the testing framework
from factorial import factorial  # Load the code under test

def test_factorial():  # Test for the factorial function
    i = 10
    e = 3628800
    eq_(e, factorial(i))  # Verify that the result equals the expected value

The above is equivalent to calling eq_(3628800, factorial(10)). The eq_ function verifies that the two values are equal.

Run the test

After implementing the test code, issue the nosetests command from the shell.

$ nosetests
.
----------------------------------------------------------------------
Ran 1 test in 0.008s

OK
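
Beyond the single case above, a few more checks can be written in the same style. This is a sketch with assumptions: factorial is repeated here (with 0 and 1 both treated as the base case) only so that the block runs on its own, the test names are hypothetical, and plain assert statements are used instead of eq_ so that the functions do not depend on nose being installed. Functions named test_* in a test_*.py file are still picked up by nosetests.

```python
def factorial(n):
    # Copy of the function under test, repeated so this block is self-contained.
    # Assumption: 0 and 1 are both treated as the base case.
    if n <= 1:
        return 1
    return n * factorial(n - 1)


def test_factorial_of_one():
    assert factorial(1) == 1


def test_factorial_small_values():
    # Each pair is (input, expected result).
    for n, expected in [(2, 2), (3, 6), (5, 120)]:
        assert factorial(n) == expected


def test_factorial_of_ten():
    assert factorial(10) == 3628800


# Running the functions directly also works without any test runner.
test_factorial_of_one()
test_factorial_small_values()
test_factorial_of_ten()
```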

Summary

It's easy to spend a lot of time on the unglamorous work of preparing data. This time, I have summarized, as a memorandum, the techniques I often use in that work.
