・ This is my first post, so it may be difficult to see. please forgive me.
・ There may be some mistakes compared to the strict genetic algorithm. (Well, learning something? Forgive me because it seems like something is going well ...)
-I'm sorry if the source code is difficult to read or redundant because I haven't shown it to others. (I would be grateful if you could tell me the parts that were difficult to read and the parts that were useless.)
There are many kinds of wisdom of excellent Google teachers. https://www.sist.ac.jp/~suganuma/kougi/other_lecture/SE/opt/GA/GA.htm There are also definitions of words and this area, so please refer to them.
https://www.youtube.com/watch?v=w1MF0Iz0p40 Youtube is still amazing. You can easily see such an interesting and easy-to-understand thing.
For the time being, as a sample problem, let's find the coordinates where the distance from the origin is 0 in the three-dimensional space (x, y, z). Obviously, (0,0,0) should be the answer ... After that, try changing the distance from the origin or making it about 20 dimensions.
Since I recently studied the class, I tried to implement the general flow of GA in a class. To be honest, the advantages and disadvantages of putting them together in a class didn't come to my mind ...
For the time being, I will explain the general role of the functions in the program first.
Function type
class GA:
#Get the parameters on GA from the Setting received as an argument (number of generations, etc.)
def __init__(self, Setting):
#Display parameters on GA held in class
def get_parameter(self, flag=0, out_path="./"):
#Most of the GA processing is written in this(Something like the main function)
def Start_GA(self):
#Randomly create genes for each individual as an initial population
def make_genes(self):
#A function that evaluates genes (calculates fitness), and at the same time, sorts in descending order of evaluation and retains data
def evaluate(self, genes, goal):
#Output genes to an external file along with fitness
def Out_data(self, genes, evaluated_data, generation):
#Save high-ranking individuals with high fitness
def Save_elite(self, genes, evaluated_data):
#A function that selects parents to have children of the next generation.
def select(self, method, genes, evaluated_data):
#A function that multiplies (crosses) selected parents to create a new individual
def crossover(self, method, genes, Select_id):
#The greater the stagnation, the more randomness is added to expand the solution search area (mutation).
def mutation(self, genes, Stagnation):
In this, other functions are incorporated in the StartGA function, so if you execute this, the program will run.
GA.py
import numpy as np
import random
import os
import copy
class GA:
# Setting=[Name, MaxGeneration, Population, EliteCount, MaxStagnationGeneration, GenesDimension ...]
def __init__(self, Setting):
self.Name = Setting[0]
self.MaxGeneration = Setting[1]
self.Population = Setting[2]
self.EliteCount = Setting[3]
self.MaxStagnationGeneration = Setting[4]
self.GenesDimension = Setting[5]
os.makedirs("./Out/", exist_ok=True)
print("\n GA start!! \n")
# flag==0, print output flag==1, file description
def get_parameter(self, flag=0, out_path="./"):
#Path for file output
out_path = out_path + "GA_parameter.txt"
#Output contents
contents = [
f'GA_parameter!!\n',
f'Name: {self.Name}',
f'MaxGeneration: {self.MaxGeneration}',
f'Population: {self.Population}',
f'EliteCount: {self.EliteCount}',
f'GenesDimension: {self.GenesDimension}',
]
#Change the output destination with flag
if flag==0:
for sentense in contents:
print(sentense)
elif flag==1:
with open(out_path, mode="w") as file:
for sentense in contents:
print(sentense, file=file)
#Function to start GA (or rather, it's like the main function because most of the processing is written in this?)
def Start_GA(self):
#Current generation
generation = 0
#Generation stagnation value
Stagnation = 0
#Target value
goal = 0
#Gene trajectory with the best fitness
top_genes = list()
#Genes are randomly created as an initial population.
genes = self.make_genes()
while(True):
#Ends when you reach the maximum generation
if generation >= self.MaxGeneration:
break
else:
#Fitness assessment
evaluated_data = self.evaluate(genes, goal)
#Output gene file and fitness to the outside
self.Out_data(genes, evaluated_data, generation)
#Save high-ranking individuals with high fitness
elite_genes = copy.deepcopy(self.Save_elite(genes, evaluated_data))
#Select people with high fitness
Select_id = self.select(0, genes, evaluated_data)
#Crossing to create new genes
genes = self.crossover(0, genes, Select_id)
#Give a mutation to expand the solution search area
genes = self.mutation(genes, Stagnation)
#Add elite gene
genes[len(genes):len(genes)] = elite_genes
#Save the most fit individual
top_genes.append(elite_genes[0])
#After the second generation, if the fitness stagnation (the best value of fitness is not updated) exceeds the maximum number of stagnation, the program ends.
if len(top_genes)>=2:
if top_genes[-1]==top_genes[-2]:
#The algorithm ends when the maximum number of stagnation is exceeded
if Stagnation >= self.MaxStagnationGeneration:
exit()
else:
Stagnation += 1
else:
Stagnation = 0
#Advance generations
generation +=1
#Create genes randomly. Self for one individual.Expand the dimension by the number of Genes Dimension
def make_genes(self):
genes = list()
tmp = list()
for child in range(self.Population):
for _ in range(self.GenesDimension):
tmp.append(random.randint(-30,30))
genes.append(tmp)
tmp = list()
# genes.shape=(self.Population, self.GenesDimension)
return genes
#Gene evaluation
def evaluate(self, genes, goal):
#The evaluated data is stored in a dictionary type.(key, value)=(child_id, fitness)
evaluated_data = dict()
for child in range(self.Population):
#The higher the fitness, the better (adjusted to a maximum of 1).)
fitness = 1/(self.eval_func(genes[child], goal) + 1)
evaluated_data[child] = fitness
#Descending sort according to evaluation value
evaluated_data = dict(sorted(evaluated_data.items(),reverse=True, key=lambda x:x[1]))
return evaluated_data
#Auxiliary function of evaluate
#Calculate the difference from the target value
def eval_func(self, gene, goal):
sum_value = 0
# self.Calculate the Euclidean distance from the origin in GenesDimension space.
for id in range(len(gene)):
sum_value += gene[id]**2
return abs(np.sqrt(sum_value) - goal) # goal(Target value)Calculate the deviation from.
#Output genes data and fitness to an external file
def Out_data(self, genes, evaluated_data, generation):
#Gene output(Before sorting),Same comma delimited as csv
gene_path = "./Out/" + str(generation) + ".gene"
with open(gene_path, "w") as file:
for child in range(len(genes)):
for id in range(len(genes[child])):
if id==len(genes[child])-1:
print(str(genes[child][id]), file=file)
else:
print(str(genes[child][id]), end=",", file=file)
#Fitness output (after sorting),Same comma delimited as csv
fitness_path = "./Out/" + str(generation) + ".fit"
with open(fitness_path, "w") as file:
for id, fitness in evaluated_data.items():
print(str(id) +","+ str(fitness), file=file)
#Save high-ranking individuals with high fitness
def Save_elite(self, genes, evaluated_data):
elite_id = list()
elite_genes = list()
# elite_Get id from dictionary
for key in evaluated_data.keys():
elite_id.append(key)
#Extract from the top as many as the number of elite
for elite in range(self.EliteCount):
elite_genes.append(genes[elite_id[elite]])
return elite_genes
# method==0: Expected value selection
# method==1: Ranking selection
# method==2: Tournament selection
def select(self, method, genes, evaluated_data):
#Store id and fitness separately in a list
id_list = list()
fitness_list = list()
Select_id = list()
for id, fitness in evaluated_data.items():
id_list.append(id)
fitness_list.append(fitness)
#Expected value selection
if method==0:
roulette = 360
tmp_list = list()
sum_fitness = sum(fitness_list) #Get the total value of fitness
cumulative_fitness = np.cumsum(roulette*np.array(fitness_list)/sum_fitness) #Create a wide 360-degree table with one rotation according to the high fitness
for child in range(self.Population - self.EliteCount): #Population-Elite Population to create non-elite individuals
for _ in range(2): #Repeat to combine the two genes.
Fate_dice = random.uniform(0,360) # 0~Roll the dice of fate between 360
Hit_id = 0
while True:
if cumulative_fitness[Hit_id] >= Fate_dice:
break
else:
Hit_id += 1
tmp_list.append(id_list[Hit_id])
Select_id.append(tmp_list)
tmp_list = list()
return Select_id
# method==0: Uniform crossover
# method==1: Multipoint crossover
# method==2: Partial coincidence crossover
def crossover(self, method, genes, Select_id):
new_genes = list()
#Select the next generation genes_id[child][0]Created using the id number of
for child in range(self.Population - self.EliteCount):
new_genes.append(genes[Select_id[child][0]])
if method==0:
probability = 0.4
for child in range(self.Population - self.EliteCount):
for dim in range(len(genes[0])):
Fate_random = random.uniform(0,1)
if Fate_random <= probability:
new_genes[child][dim] = genes[Select_id[child][1]][dim]
return new_genes
def mutation(self, genes, Stagnation):
probability = 0.4 #Probability of mutation
serious_rate = Stagnation/(self.MaxGeneration - self.EliteCount) #The greater the stagnation, the more likely it is to mutate.
serious_count = int(len(genes)*serious_rate) #By multiplying the size of the genes sequence, it is possible to adjust how much of the sequence is mutated.
for child in range(serious_count):
for dim in range(len(genes[0])):
Fate_random = random.uniform(0,1) # 0~Get a random number of 1
if Fate_random <= probability:
genes[child][dim] += random.randint(-10,10) # -10~Give 10 random numbers (fixed)
return genes
# Setting=[Name, MaxGeneration, Population, EliteCount, MaxStagnationGeneration, GenesDimension ...]
Setting = ["Sample", 150, 30, 2, 100, 3]
GA = GA(Setting)
GA.Start_GA()
In make_genes, genes are created in the range of -30 to 30 for one axis (x axis, etc.). Also, for mutations, values are added in the range of -10 to 10.
Under the above conditions, I calculated 30 individuals per generation for about 150 generations. The program to see the result is as follows.
analysis_log.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Calculate distance from data frame
def calc_distance(dataframe):
distance = 0
for col in range(len(dataframe.columns)): #Repeat as many times as there are columns
distance += dataframe[col]**2
distance = np.sqrt(distance) #Calculate Euclidean distance
return distance
#Combine data frames
def merge_dataframe(origin, add):
origin = pd.concat([origin, add], axis=1)
return origin
#Create a graph as time series data
def plot_log(distance, MaxGeneration):
y_mean = np.array(distance.mean()) #Get the average value of the data
y_median = np.array(distance.median()) #Get median
y_min = np.array(distance.min()) #Get the minimum value
y_max = np.array(distance.max()) #Get maximum
x = np.arange(len(y_mean))
xicks = np.arange(0, MaxGeneration+1, MaxGeneration/10) #To display Max Generation in memory as well+1
plt.plot(x, y_mean, label="mean", color="green")
plt.plot(x, y_median,label="median", color="blue")
plt.plot(x, y_min, label="min", color="red")
plt.plot(x, y_max, label="max", color="cyan")
plt.xlabel("$Generation$", fontsize=15)
plt.ylabel("$distance$", fontsize=15)
plt.xlim(xmin=0)
plt.ylim(ymin=0)
plt.grid(True)
plt.xticks(xicks)
plt.legend()
plt.savefig("analysis_gene.jpg ")
MaxGeneration = 150
for G in range(MaxGeneration):
if G==0:
gene_path = "./Out/" + str(G) + ".gene"
dataframe_0 = pd.read_csv(gene_path, header=None) #Reading data
distance_0 = calc_distance(dataframe_0) #Calculate distance from data frame
else:
gene_path = "./Out/" + str(G) + ".gene"
dataframe = pd.read_csv(gene_path, header=None)
distance = calc_distance(dataframe)
distance_0 = merge_dataframe(distance_0, distance)
plot_log(distance_0, MaxGeneration)
The horizontal axis is the number of generations, and the vertical axis is the distance from the origin (0 is the target value this time). In addition, mean, median, min, and max are the average value, median value, minimum value, and maximum value in each generation, respectively. The search seems to be proceeding smoothly. I'm happy.
Finally, with the target value set to 100, 100 individuals per generation are calculated for about 500 generations. Then, the target value was returned to 0, the number of gene axes (number of dimensions) was set to 20, and the calculation was performed for 150 individuals and 2000 generations.
[Impression] Somehow, I feel that it would be nice if the values were updated ... I wrote a lot about it, but if there is a fierce man who has read it to the end, I would be very happy if you comment.
Recommended Posts