Effective Python Learning Memorandum Day 8 [8/100]

Introduction

The other day I learned about 100 Days Of Code, which was popular on Twitter for a while. The purpose of this article is to keep a record and output how much I, as a beginner, can grow through 100 days of study. I think there are many mistakes and difficult to read. I would appreciate it if you could point out!

Teaching materials to be learned this time

Today's progress

--Progress: Pages 64-69 --Chapter 3: Classes and Inheritance ――I will write down what I often forget or didn't know about what I learned today.

@classmethod Build objects generically using polymorphism

--Polymorphism is one of the methods in which multiple classes in a certain hierarchy implement each version of a certain method.

For example, suppose you're writing an implementation of MapReduce and want a common class that represents your input data. Define a common class with a read method that needs to be defined in a subclass as follows: Here, MapReduce is a program for distributed processing of a large amount of divided data in a cluster.

class InputData(object):
    def read(self):
        raise NotImplementedError

Inherit the InputData class and define subclasses

class PathInputData(InputData):
    def __init__(self, path):
        super().__init__()       #Is of InputData__init__()Should run, but in InputData__init__I don't have it, but is it necessary?...?
        self.path = path
    
    def read(self):
        return open(self.path).read()

Next, define a MapReduce Worker that uses the input data.

class Worker(object):
    def __init_(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

Next, we inherit the Worker class and define a line feed counter.

class LineCountWorker(Worker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result

Consider how to integrate the classes you have defined so far. The first is to use helper functions to build objects and work together manually.

import os
from threading import Thread

#List the contents of the directory and generate a PathInputData instance for each file it contains
def generate_inputs(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))

# generate_Create a LineCountWorker instance using the inputData instance returned by inputs
def create_workers(input_list):
    workers = []
    for input_data in input_list:
        workers.append(LineCountWorker(input_data))
    return workers

def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads: thread.start()
    for thread in threads: thread.join()

    first, rest = workers[0], workers[1:]
    for worker in rest:
        first.reduce(worker)
        return first.result

#Finally, put them together into a function that performs each step
def mapreduce(data_dir):
    inputs = generate_inputs(data_dir)
    workers = create_workers(inputs)
    return execute(workers)

The problem with this integration method is that if you write other subclasses such as InputData or Worker, you will have to rewrite generate_inputs and create_workers so that the mapreduce function will support them. The way to solve this problem is to use @classmethod. This applies to the entire class, not to Kochi objects. The code that applies this method to the MapReduce class is as follows.

import os
from threading import Thread

class InputData(object):
    def read(self):
        raise NotImplementedError

class PathInputData(InputData):
    def __init__(self, path):
        super().__init__()
        self.path = path
    
    def read(self):
        return open(self.path).read()

class Worker(object):
    def __init_(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

class LineCountWorker(Worker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result

def generate_inputs(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))

def create_workers(input_list):
    workers = []
    for input_data in input_list:
        workers.append(LineCountWorker(input_data))
    return workers

def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads: thread.start()
    for thread in threads: thread.join()

    first, rest = workers[0], workers[1:]
    for worker in rest:
        first.reduce(worker)
        return first.result

class GenericInputData(object):
    def rad(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError

class PathInputData(GenericInputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        with open(self.path) as f:
            return f.read()

    @classmethod
    def generate_inputs(cls, config):
        data_dir = config['data_dir']
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))

class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        workers = []
        for input_data in input_class.generate_inputs(config):
            workers.append(cls(input_data))
        return workers

class LineCountWorker(GenericWorker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result      

def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)

This code outputs the same result as the previous implementation. This way of writing eliminates the need to rewrite the relevant code when you change the GenericInputData or GenericWorker subclass.

Recommended Posts

Effective Python Learning Memorandum Day 15 [15/100]
Effective Python Learning Memorandum Day 6 [6/100]
Effective Python Learning Memorandum Day 12 [12/100]
Effective Python Learning Memorandum Day 8 [8/100]
Effective Python Learning Memorandum Day 14 [14/100]
Effective Python Learning Memorandum Day 1 [1/100]
Effective Python Learning Memorandum Day 13 [13/100]
Effective Python Learning Memorandum Day 3 [3/100]
Effective Python Learning Memorandum Day 5 [5/100]
Effective Python Learning Memorandum Day 4 [4/100]
Effective Python Learning Memorandum Day 7 [7/100]
Effective Python Learning Memorandum Day 2 [2/100]
Python learning day 4
Python memorandum
Python Memorandum 2
Python memorandum
python learning
python memorandum
python memorandum
Python day 1
Python memorandum
python memorandum
Python memorandum
Python basics memorandum
[Python] Learning Note 1
Python learning notes
Python memorandum (algorithm)
python learning output
Deep Learning Memorandum
Python learning site
Python Deep Learning
Python learning (supplement)
Deep learning × Python
Python memorandum [links]
python learning notes
Python study day 1
Machine learning starting with Python Personal memorandum Part2
Machine learning starting with Python Personal memorandum Part1
Python memorandum numbering variables
Python class (Python learning memo ⑦)
Learning Python with ChemTHEATER 03
"Object-oriented" learning with python
Python module (Python learning memo ④)
Reinforcement learning 1 Python installation
Learning Python with ChemTHEATER 05-1
Python: Deep Learning Practices
python memorandum (sequential update)
Python: Unsupervised Learning: Basics
Learning record 4 (8th day)
Learning record 9 (13th day)
[1day1lang AdventCalender] day4 Python
Learning record 3 (7th day)
Learning record 5 (9th day)
Learning record 6 (10th day)
Python memorandum (personal bookmark)
Programming learning record day 2
Learning record 8 (12th day)
Learning record 1 (4th day)
Learning record 7 (11th day)
Private Python learning procedure
Learning Python with ChemTHEATER 02