[PYTHON] EP 24 Use `@ classmethod` Polymorphism to Construct Objects Generically

  • Python only supports a single constructor per class, the __init__ method

Effective Python

Write MapReduce

Bad example. Though make base class for InputData and Woker, code are sticked to each other in create_worker and generate_inputs. If you want to use other class for mapreduce, you have to rewrite your code, which is not ideal.

import os
from threading import Thread

class InputData:
    def read(self):
        raise NotImplementedError

class PathInputData(InputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        return open(self.path).read()

class Worker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self):
        raise NotImplementedError

class LineCountWoker(Worker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result

def generate_inputs(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))

def create_worker(input_list):
    workers = []
    for input_data in input_list:
        workers.append(LineCountWoker(input_data))
    return workers


def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads: thread.start()
    for thread in threads: thread.join()

    first, rest = workers[0], workers[1]
    for workers in rest:
        first.reduce(worker)
    return first.result

def mapreduce(data_dir):
    inputs = generate_inputs(data_dir)
    workers = create_worker(inputs)
    return execute(workers)


with TemporaryDirectory() as tmpdir:
    write_test_files(tmpdir)
    result = mapreduce(tmpdir)


Good practice

The main difference here is using @classmethod for common constructor interface. It looks like factory pattern. A mapreduce function accepts classes and instantiate them by calling @classmethod, which enable modulable and untidy code.

import os
from threading import Thread

class GenericInputData:
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError

class PathInputData(GenericInputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        return open(self.path).read()

    @classmethod
    def generate_inputs(cls, config):
        data_dir = config['data_dir']
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))

class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        workers = []
        for input_data in input_class.generate_inputs(config):
            workers.append(cls(input_data))
        return workers

class LineCountWoker(GenericWorker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count('\n')

    def reduce(self, other):
        self.result += other.result

def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)

def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads: thread.start()
    for thread in threads: thread.join()

    first, rest = workers[0], workers[1]
    for workers in rest:
        first.reduce(worker)
    return first.result


with TemporaryDirectory() as tmpdir:
    write_test_files(tmpdir)
    config = {'data_dir': tmpdir}
    result = mapreduce(LineCountWoker, PathInputData, config)


Polymorphism

http://stackoverflow.com/questions/1031273/what-is-polymorphism-what-is-it-for-and-how-is-it-used

If you think about the Greek roots of the term, it should become obvious.

Poly = many: polygon = many-sided, polystyrene = many styrenes (a), polyglot = many languages, and so on. Morph = change or form: morphology = study of biological form, Morpheus = the Greek god of dreams able to take any form. So polymorphism is the ability (in programming) to present the same interface for differing underlying forms (data types).

For example, integers and floats are implicitly polymorphic since you can add, subtract, multiply and so on, irrespective of the fact that the types are different. They're rarely considered as objects in the usual term.

But, in that same way, a class like BigDecimal or Rational or Imaginary can also provide those operations, even though they operate on different data types.

The classic example is the Shape class and all the classes that can inherit from it (square, circle, dodecahedron, irregular polygon, splat and so on).

With polymorphism, each of these classes will have different underlying data. A point shape needs only two co-ordinates (assuming it's in a two-dimensional space of course). A circle needs a center and radius. A square or rectangle needs two co-ordinates for the top left and bottom right corners (and possibly) a rotation. An irregular polygon needs a series of lines.

And, by making the class responsible for its code as well as its data, you can achieve polymorphism. In this example, every class would have its own Draw() function and the client code could simply do:

shape.Draw()

to get the correct behavior for any shape.

This is in contrast to the old way of doing things in which the code was separate from the data, and you would have had functions such as drawSquare() and drawCircle().

Object orientation, polymorphism and inheritance are all closely-related concepts and they're vital to know. There have been many "silver bullets" during my long career which basically just fizzled out but the OO paradigm has turned out to be a good one. Learn it, understand it, love it - you'll be glad you did :-)

(a) I originally wrote that as a joke but it turned out to be correct and, therefore, not that funny. The momomer styrene happens to be made from carbon and hydrogen, C8H8, and polystyrene is made from groups of that, (C8H8)n.

Perhaps I should have stated that a polyp was many occurrences of the letter p although, now that I've had to explain the joke, even that doesn't seem funny either.

Recommended Posts

EP 24 Use `@ classmethod` Polymorphism to Construct Objects Generically
EP 11 Use `zip` to Process Iterators in Parallel