Introduction

・ This is my first post, so it may be difficult to see. please forgive me.

・ There may be some mistakes compared to the strict genetic algorithm. (Well, learning something? Forgive me because it seems like something is going well ...)

-I'm sorry if the source code is difficult to read or redundant because I haven't shown it to others. (I would be grateful if you could tell me the parts that were difficult to read and the parts that were useless.)

What is a genetic algorithm?

There are many kinds of wisdom of excellent Google teachers. https://www.sist.ac.jp/~suganuma/kougi/other_lecture/SE/opt/GA/GA.htm There are also definitions of words and this area, so please refer to them.

https://www.youtube.com/watch?v=w1MF0Iz0p40 Youtube is still amazing. You can easily see such an interesting and easy-to-understand thing.

The problem I want to solve this time

For the time being, as a sample problem, let's find the coordinates where the distance from the origin is 0 in the three-dimensional space (x, y, z). Obviously, (0,0,0) should be the answer ... After that, try changing the distance from the origin or making it about 20 dimensions.

Program flow

Since I recently studied the class, I tried to implement the general flow of GA in a class. To be honest, the advantages and disadvantages of putting them together in a class didn't come to my mind ...

For the time being, I will explain the general role of the functions in the program first.

`Function type`


class GA:
    #Get the parameters on GA from the Setting received as an argument (number of generations, etc.)
    def __init__(self, Setting):

    #Display parameters on GA held in class
    def get_parameter(self, flag=0, out_path="./"):

    #Most of the GA processing is written in this(Something like the main function)
    def Start_GA(self):

    #Randomly create genes for each individual as an initial population
    def make_genes(self):

    #A function that evaluates genes (calculates fitness), and at the same time, sorts in descending order of evaluation and retains data
    def evaluate(self, genes, goal):

    #Output genes to an external file along with fitness
    def Out_data(self, genes, evaluated_data, generation):

    #Save high-ranking individuals with high fitness
    def Save_elite(self, genes, evaluated_data):

    #A function that selects parents to have children of the next generation.
    def select(self, method, genes, evaluated_data):

    #A function that multiplies (crosses) selected parents to create a new individual
    def crossover(self, method, genes, Select_id):

    #The greater the stagnation, the more randomness is added to expand the solution search area (mutation).
    def mutation(self, genes, Stagnation):

In this, other functions are incorporated in the StartGA function, so if you execute this, the program will run.

Actual program

`GA.py`


import numpy as np
import random
import os
import copy

class GA:
    # Setting=[Name, MaxGeneration, Population, EliteCount, MaxStagnationGeneration, GenesDimension ...]
    def __init__(self, Setting):
        self.Name = Setting[0]
        self.MaxGeneration = Setting[1]
        self.Population = Setting[2]
        self.EliteCount = Setting[3]
        self.MaxStagnationGeneration = Setting[4]
        self.GenesDimension = Setting[5]

        os.makedirs("./Out/", exist_ok=True)

        print("\n GA start!! \n")


        # flag==0, print output flag==1, file description
    def get_parameter(self, flag=0, out_path="./"):
        #Path for file output
        out_path = out_path + "GA_parameter.txt"
        #Output contents
        contents = [
                     f'GA_parameter!!\n',
                     f'Name: {self.Name}',
                     f'MaxGeneration: {self.MaxGeneration}',
                     f'Population: {self.Population}',
                     f'EliteCount: {self.EliteCount}',
                     f'GenesDimension: {self.GenesDimension}',
                                                                ]
        #Change the output destination with flag
        if flag==0:
            for sentense in contents:
                print(sentense)
        elif flag==1:
            with open(out_path, mode="w") as file:
                for sentense in contents:
                    print(sentense, file=file)


    #Function to start GA (or rather, it's like the main function because most of the processing is written in this?)
    def Start_GA(self):
        #Current generation
        generation = 0
        #Generation stagnation value
        Stagnation = 0
        #Target value
        goal = 0
        #Gene trajectory with the best fitness
        top_genes = list()
        #Genes are randomly created as an initial population.
        genes = self.make_genes()
        while(True):
            #Ends when you reach the maximum generation
            if generation >= self.MaxGeneration:
                break
            else:
                #Fitness assessment
                evaluated_data = self.evaluate(genes, goal)
                #Output gene file and fitness to the outside
                self.Out_data(genes, evaluated_data, generation)
                #Save high-ranking individuals with high fitness
                elite_genes = copy.deepcopy(self.Save_elite(genes, evaluated_data))
                #Select people with high fitness
                Select_id = self.select(0, genes, evaluated_data)
                #Crossing to create new genes
                genes = self.crossover(0, genes, Select_id)
                #Give a mutation to expand the solution search area
                genes = self.mutation(genes, Stagnation)
                #Add elite gene
                genes[len(genes):len(genes)] = elite_genes
                #Save the most fit individual
                top_genes.append(elite_genes[0])
                #After the second generation, if the fitness stagnation (the best value of fitness is not updated) exceeds the maximum number of stagnation, the program ends.
                if len(top_genes)>=2:
                    if top_genes[-1]==top_genes[-2]:
                        #The algorithm ends when the maximum number of stagnation is exceeded
                        if Stagnation >= self.MaxStagnationGeneration:
                            exit()
                        else:
                            Stagnation += 1
                    else:
                        Stagnation = 0
                #Advance generations
                generation +=1



    #Create genes randomly. Self for one individual.Expand the dimension by the number of Genes Dimension
    def make_genes(self):
        genes = list()
        tmp = list()
        for child in range(self.Population):
            for _ in range(self.GenesDimension):
                tmp.append(random.randint(-30,30))
            genes.append(tmp)
            tmp = list()
        # genes.shape=(self.Population, self.GenesDimension)
        return genes



    #Gene evaluation
    def evaluate(self, genes, goal):
        #The evaluated data is stored in a dictionary type.(key, value)=(child_id, fitness)
        evaluated_data = dict()
        for child in range(self.Population):
            #The higher the fitness, the better (adjusted to a maximum of 1).)
            fitness = 1/(self.eval_func(genes[child], goal) + 1)
            evaluated_data[child] = fitness
        #Descending sort according to evaluation value
        evaluated_data = dict(sorted(evaluated_data.items(),reverse=True, key=lambda x:x[1]))
        return evaluated_data

    #Auxiliary function of evaluate
    #Calculate the difference from the target value
    def eval_func(self, gene, goal):
        sum_value = 0
        # self.Calculate the Euclidean distance from the origin in GenesDimension space.
        for id in range(len(gene)):
            sum_value += gene[id]**2
        return abs(np.sqrt(sum_value) - goal)   # goal(Target value)Calculate the deviation from.


    #Output genes data and fitness to an external file
    def Out_data(self, genes, evaluated_data, generation):
        #Gene output(Before sorting),Same comma delimited as csv
        gene_path = "./Out/" + str(generation) + ".gene"
        with open(gene_path, "w") as file:
            for child in range(len(genes)):
                for id in range(len(genes[child])):
                    if id==len(genes[child])-1:
                        print(str(genes[child][id]), file=file)
                    else:
                        print(str(genes[child][id]), end=",", file=file)
        #Fitness output (after sorting),Same comma delimited as csv
        fitness_path = "./Out/" + str(generation) + ".fit"
        with open(fitness_path, "w") as file:
            for id, fitness in evaluated_data.items():
                print(str(id) +","+ str(fitness), file=file)


    #Save high-ranking individuals with high fitness
    def Save_elite(self, genes, evaluated_data):
        elite_id = list()
        elite_genes = list()
        # elite_Get id from dictionary
        for key in evaluated_data.keys():
            elite_id.append(key)
        #Extract from the top as many as the number of elite
        for elite in range(self.EliteCount):
            elite_genes.append(genes[elite_id[elite]])

        return elite_genes




    # method==0: Expected value selection
    # method==1: Ranking selection
    # method==2: Tournament selection
    def select(self, method, genes, evaluated_data):
        #Store id and fitness separately in a list
        id_list = list()
        fitness_list = list()
        Select_id = list()
        for id, fitness in evaluated_data.items():
            id_list.append(id)
            fitness_list.append(fitness)

        #Expected value selection
        if method==0:
            roulette = 360
            tmp_list = list()
            sum_fitness = sum(fitness_list) #Get the total value of fitness
            cumulative_fitness = np.cumsum(roulette*np.array(fitness_list)/sum_fitness) #Create a wide 360-degree table with one rotation according to the high fitness
            for child in range(self.Population - self.EliteCount):    #Population-Elite Population to create non-elite individuals
                for _ in range(2):                                    #Repeat to combine the two genes.
                    Fate_dice = random.uniform(0,360)                 # 0~Roll the dice of fate between 360
                    Hit_id = 0
                    while True:
                        if cumulative_fitness[Hit_id] >= Fate_dice:
                            break
                        else:
                            Hit_id += 1
                    tmp_list.append(id_list[Hit_id])
                Select_id.append(tmp_list)
                tmp_list = list()
        return Select_id



    # method==0: Uniform crossover
    # method==1: Multipoint crossover
    # method==2: Partial coincidence crossover
    def crossover(self, method, genes, Select_id):
        new_genes = list()
        #Select the next generation genes_id[child][0]Created using the id number of
        for child in range(self.Population - self.EliteCount):
            new_genes.append(genes[Select_id[child][0]])

        if method==0:
            probability = 0.4
            for child in range(self.Population - self.EliteCount):
                for dim in range(len(genes[0])):
                    Fate_random = random.uniform(0,1)
                    if Fate_random <= probability:
                        new_genes[child][dim] = genes[Select_id[child][1]][dim]
        return new_genes



    def mutation(self, genes, Stagnation):
        probability = 0.4                          #Probability of mutation
        serious_rate = Stagnation/(self.MaxGeneration - self.EliteCount)   #The greater the stagnation, the more likely it is to mutate.
        serious_count = int(len(genes)*serious_rate)        #By multiplying the size of the genes sequence, it is possible to adjust how much of the sequence is mutated.
        for child in range(serious_count):
            for dim in range(len(genes[0])):
                Fate_random = random.uniform(0,1)      # 0~Get a random number of 1
                if Fate_random <= probability:
                    genes[child][dim] += random.randint(-10,10) # -10~Give 10 random numbers (fixed)
        return genes



# Setting=[Name, MaxGeneration, Population, EliteCount, MaxStagnationGeneration, GenesDimension ...]
Setting = ["Sample", 150, 30, 2, 100, 3]
GA = GA(Setting)
GA.Start_GA()

In make_genes, genes are created in the range of -30 to 30 for one axis (x axis, etc.). Also, for mutations, values are added in the range of -10 to 10.

Analysis of results

Under the above conditions, I calculated 30 individuals per generation for about 150 generations. The program to see the result is as follows.

`analysis_log.py`



import pandas as pd
import numpy as np
import matplotlib.pyplot as plt



#Calculate distance from data frame
def calc_distance(dataframe):
    distance = 0
    for col in range(len(dataframe.columns)):   #Repeat as many times as there are columns
        distance += dataframe[col]**2
    distance = np.sqrt(distance) #Calculate Euclidean distance
    return distance

#Combine data frames
def merge_dataframe(origin, add):
    origin = pd.concat([origin, add], axis=1)
    return origin

#Create a graph as time series data
def plot_log(distance, MaxGeneration):
    y_mean = np.array(distance.mean())      #Get the average value of the data
    y_median = np.array(distance.median())  #Get median
    y_min = np.array(distance.min()) #Get the minimum value
    y_max = np.array(distance.max()) #Get maximum
    x = np.arange(len(y_mean))
    xicks = np.arange(0, MaxGeneration+1, MaxGeneration/10)   #To display Max Generation in memory as well+1
    plt.plot(x, y_mean, label="mean", color="green")
    plt.plot(x, y_median,label="median", color="blue")
    plt.plot(x, y_min, label="min", color="red")
    plt.plot(x, y_max, label="max", color="cyan")
    plt.xlabel("$Generation$", fontsize=15)
    plt.ylabel("$distance$", fontsize=15)
    plt.xlim(xmin=0)
    plt.ylim(ymin=0)
    plt.grid(True)
    plt.xticks(xicks)
    plt.legend()
    plt.savefig("analysis_gene.jpg ")



MaxGeneration = 150

for G in range(MaxGeneration):
    if G==0:
        gene_path = "./Out/" + str(G) + ".gene"
        dataframe_0 = pd.read_csv(gene_path, header=None)  #Reading data
        distance_0 = calc_distance(dataframe_0)            #Calculate distance from data frame
    else:
        gene_path = "./Out/" + str(G) + ".gene"
        dataframe = pd.read_csv(gene_path, header=None)
        distance = calc_distance(dataframe)
        distance_0 = merge_dataframe(distance_0, distance)

plot_log(distance_0, MaxGeneration)

The horizontal axis is the number of generations, and the vertical axis is the distance from the origin (0 is the target value this time). In addition, mean, median, min, and max are the average value, median value, minimum value, and maximum value in each generation, respectively. The search seems to be proceeding smoothly. I'm happy.

Finally, with the target value set to 100, 100 individuals per generation are calculated for about 500 generations. Then, the target value was returned to 0, the number of gene axes (number of dimensions) was set to 20, and the calculation was performed for 150 individuals and 2000 generations. analysis_gene_goal=100.jpg

[Impression] Somehow, I feel that it would be nice if the values were updated ... I wrote a lot about it, but if there is a fierce man who has read it to the end, I would be very happy if you comment.

I tried to implement GA (genetic algorithm) in Python