How to teach good and evil to a computer

As the title suggests, I would like to establish a method for **"teaching good and evil"** to computers. If this is realized, machines will be able to make ethically correct decisions automatically in place of unethical humans, which could contribute greatly to improving society, for example by reducing crime and deterring terrorism. Because my knowledge is limited, I am using a library that provides simple naive Bayes classification for now, but eventually I would like to build a framework that can be applied to machine learning in general.

What are the good, evil, and ethics we want to teach computers in the first place?

First, let us clarify what kind of ethics we want to teach the computer. It is very difficult for a computer to understand, encyclopedia-style, every human action and every philosophical and cultural notion of good and bad (although some people may be seriously attempting it). So, as a practical and applied starting point, I would like to consider a "goodness and correctness" that is universally applicable.

Deontological ethics is suitable for machine learning

In conclusion, the best fit for training a computer is the **deontological ethics** of [German philosopher Immanuel Kant](https://ja.wikipedia.org/wiki/%E3%82%A4%E3%83%9E%E3%83%8C%E3%82%A8%E3%83%AB%E3%83%BB%E3%82%AB%E3%83%B3%E3%83%88). In deontological ethics, every sentence that serves as an "axiom", that is, a standard of good and evil, is expressed with **"should"** or **"must"**: in English, "I must ...", "We must not ...". For now, let us call text that asserts such ethical correctness an **"ethical statement"**, clearly distinguished from a **"factual statement"** such as "in fact, ... is the case". It is text containing these ethical statements that we want the classifier to learn.
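As a point of comparison for the classifier trained later, the distinction between factual and ethical statements can be approximated with a plain keyword rule. The following is only a minimal sketch: the function name `is_ethical_statement` and the marker list are assumptions made for illustration ("ought to" in particular does not appear in this article), and a rule like this knows nothing about context.

```python
import re

# Assumed list of deontic markers. "must", "should" and "have to" appear in
# the article; "ought to" and "has to" are added here for illustration only.
DEONTIC_MARKERS = re.compile(
    r"\b(must|should|ought to|have to|has to)\b", re.IGNORECASE
)


def is_ethical_statement(sentence: str) -> bool:
    """Return True if the sentence contains one of the deontic markers."""
    return DEONTIC_MARKERS.search(sentence) is not None


for sentence in ["We are trying to be ethical", "We must be trying to be ethical"]:
    label = "etc" if is_ethical_statement(sentence) else "act"
    print(sentence, "->", label)
```

A rule of this kind cannot generalize beyond its keyword list, which is one reason to let a classifier learn the distinction from labelled examples instead.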

Classification of ethical statements in deontological ethics

There are various theories, but ethical statements can be broadly divided into two categories: the **categorical imperative**, such as "I must never lie, even if I am stabbed with a knife and killed", and the **hypothetical imperative**, such as "If you give me money, I will help people". The text "Never lie" is, in other words, a statement that **"lying is prohibited unconditionally"**, and (whether or not it is correct) it is an ethical statement. On the other hand, the text "If you give me money, I will help people" states that **"I will help people on the condition that I am paid"**, so it is an ethical statement with a specific premise. Again there are various theories, but deontological ethics regards hypothetical ethical statements as non-ethical, or even as actively unethical.
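The categorical/hypothetical split could be roughed out in code by looking for words that introduce a condition. This is a hedged sketch, not a linguistic analysis: `imperative_kind`, the cue list, and the special handling of "even if" (a concession that strengthens an unconditional rule rather than adding a condition) are all assumptions made for illustration.

```python
import re

# Assumed cue words that introduce a condition; far from a complete list.
CONDITION_CUES = re.compile(
    r"\b(if|unless|provided that|as long as|in order to)\b", re.IGNORECASE
)
# "even if" / "even though" concede a circumstance without making the rule
# conditional, so they are removed before looking for condition cues.
CONCESSIVE = re.compile(r"\beven (if|though)\b", re.IGNORECASE)


def imperative_kind(statement: str) -> str:
    """Label a statement 'hypothetical' if it states a condition, else 'categorical'."""
    without_concessions = CONCESSIVE.sub(" ", statement)
    if CONDITION_CUES.search(without_concessions):
        return "hypothetical"
    return "categorical"


examples = [
    "Never lie",
    "I must never lie, even if I am stabbed with a knife",
    "If you give me money, I will help people",
]
for text in examples:
    print(imperative_kind(text), "<-", text)
```

The first two examples come out as categorical and the third as hypothetical. Conditions expressed without one of the cue words would slip through, so the output is at best a first filter.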

In addition to the categorical/hypothetical distinction, there are criteria for "which categorical ethical statements are universally valid". Roughly speaking, the three main points are:

- The decision is made autonomously, not on the orders of others **(can make decisions independently)**
- The statement is based on deontological ethics **(takes the form of an illocutionary act / imperative mood such as "must", "have to", "should")**
- The subject and target of the statement are universal **(consistent)**

Train the classifier on whether a sentence contains "must"

To simplify the problem, let us first examine whether rudimentary ethical judgments can be made at all. To begin with, we need to be able to distinguish **"whether the entered text is a factual statement or an ethical statement"**. So we use a library that implements a naive Bayes classifier to classify text as one or the other. As a first step, we make it possible to tell 10 lines of indicative-mood English sentences apart from nearly identical sentences whose mood has been changed to "must".

    import nltk
    nltk.download('punkt')  # tokenizer data used by TextBlob
    from textblob.classifiers import NaiveBayesClassifier

    # Factual (indicative) statements, one per line
    with open("analysed/actual_statement_pos.txt", "r", encoding="utf8", errors="ignore") as f:
        stm = f.read().splitlines()
    print("Text lines =", len(stm))

    # Ethical ("must") statements, one per line
    with open("analysed/ethical_statement_pos.txt", "r", encoding="utf8", errors="ignore") as f:
        stm2 = f.read().splitlines()
    print("Text lines =", len(stm2))

Output:

    Text lines = 10
    Text lines = 10

Next, train the classifier on the labelled lines and classify two test sentences:

    # Label every factual line "act" and every ethical line "etc"
    train = []
    for fact, ethic in zip(stm, stm2):
        train.append((fact, "act"))
        train.append((ethic, "etc"))
    cl = NaiveBayesClassifier(train)

    state1 = "We are trying to be ethical"
    state2 = "We must be trying to be ethical"
    state1_class = cl.classify(state1)
    state2_class = cl.classify(state2)
    print(state1, "<<- Text | Classified ->> ", state1_class)
    print(state2, "<<- Text | Classified ->> ", state2_class)

Output:

    We are trying to be ethical <<- Text | Classified ->>  act
    We must be trying to be ethical <<- Text | Classified ->>  etc

The sentence "We are ..." was judged to be a factual statement, while the sentence "We must be ..." was judged to be an ethical statement. The classifier can now at least **distinguish between factual and ethical statements at a rudimentary level**.

Judge whether an ethical statement matches or resembles the learned ethical statements

Next, we use the classifier to verify whether the ethical statement we want to judge is "correct". This is my own theory, but to judge ethics correctly it is necessary to understand **the polarity of the sentence (affirmative or negative)** in addition to its content. For example, even if you teach the classifier a statement that prohibits suicide, such as *"We must not kill ourselves"*, four of its five words also appear in the text *"We must kill ourselves"*, a word match rate of 80%. **There is therefore a risk that the classifier misjudges this suicide-encouraging sentence as "80% ethically correct"**. To prevent this, the polarity of the sentence is added to the classification label alongside the statement itself. In other words, **a statement that closely matches a correct ethical statement but differs in polarity should be regarded as a non-ethical statement**. Besides polarity, whether the decision is made autonomously also matters, but at this stage it is noted only for reference.
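The 80% figure and the polarity check can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions, separate from the classifier itself: `word_match_rate`, `polarity`, `agrees_with`, the negation word list, and the 0.8 threshold are all hypothetical names and values chosen for the example, and double negation is not handled.

```python
import re

# Assumed negation words; contractions ending in "n't" are also treated as negative.
NEGATIONS = {"not", "never", "no"}


def tokens(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def word_match_rate(learned: str, candidate: str) -> float:
    """Share of the learned statement's words that also occur in the candidate."""
    learned_words = tokens(learned)
    candidate_words = set(tokens(candidate))
    matched = sum(word in candidate_words for word in learned_words)
    return matched / len(learned_words)


def polarity(sentence: str) -> str:
    """Return 'neg' if the sentence contains a negation word, else 'pos'."""
    negated = any(w in NEGATIONS or w.endswith("n't") for w in tokens(sentence))
    return "neg" if negated else "pos"


def agrees_with(learned: str, candidate: str, threshold: float = 0.8) -> bool:
    """Accept the candidate only if the wording matches and the polarity is the same."""
    close_enough = word_match_rate(learned, candidate) >= threshold
    return close_enough and polarity(learned) == polarity(candidate)


learned = "We must not kill ourselves"
candidate = "We must kill ourselves"
print(word_match_rate(learned, candidate))  # 0.8 (four of five words match)
print(agrees_with(learned, candidate))      # False (polarity differs)
```

Word overlap alone would accept the candidate sentence; requiring the polarity to agree as well is what rejects it.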

At this stage I had planned to use TextBlob's sentiment analysis, but after testing I concluded that it could not be used here, so the polarity is instead encoded in the labels etc_pos and etc_neg as a nominal feature. For information on sentiment analysis using TextBlob, see the article here.

    # Affirmative ethical statements ("must ..."), one per line
    with open("analysed/ethical_statement_pos.txt", "r", encoding="utf8", errors="ignore") as f:
        stm_pos = f.read().splitlines()
    print("Text lines =", len(stm_pos))

    # Negated ethical statements ("must not ..."), one per line
    with open("analysed/ethical_statement_neg.txt", "r", encoding="utf8", errors="ignore") as f:
        stm_neg = f.read().splitlines()
    print("Text lines =", len(stm_neg))

    # Label every affirmative line "etc_pos" and every negated line "etc_neg"
    train2 = []
    for pos, neg in zip(stm_pos, stm_neg):
        train2.append((pos, "etc_pos"))
        train2.append((neg, "etc_neg"))
    cl2 = NaiveBayesClassifier(train2)

    state3 = "We must be trying to be ethical"
    state4 = "We must not be trying to be ethical"
    state3_class = cl2.classify(state3)
    state4_class = cl2.classify(state4)
    print(state3, "<<- Text | Classified ->> ", state3_class)
    print(state4, "<<- Text | Classified ->> ", state4_class)

Output:

    We must be trying to be ethical <<- Text | Classified ->>  etc_pos
    We must not be trying to be ethical <<- Text | Classified ->>  etc_neg

The statement "We must be trying to be ethical" was judged to be a "correct" statement (etc_pos), while "We must not be trying to be ethical" was judged to be an "incorrect" statement (etc_neg). It is no exaggeration to say that this classifier has learned **"what is right and what is not"** from as few as 10 lines of sentences, at least at this rudimentary level.

(GitHub for this article is here)

Future outlook

- The TextBlob classifier used for the judgments is a black box, so I would like to write my own naive Bayes classifier so that the degree of matching and the features can be visualized.
- As I wrote above, I would like to improve accuracy by increasing the categories of data that can be classified. At the very least, I would like the classifier to learn polarity as an ordinal feature.
- Of course, I do not really want to judge right from wrong from only 10 lines of sentences, so I would like to increase the number of factual and ethical statements the classifier learns.
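As a possible starting point for the hand-written classifier mentioned in the outlook, here is a minimal multinomial naive Bayes using only the standard library. It is a sketch, not a reimplementation of TextBlob: the class `TinyNaiveBayes`, its method names, and the six training sentences are hypothetical and are not the data files used in this article. It uses add-one (Laplace) smoothing and ignores words it never saw in training.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, labelled_texts):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of texts
        for text, label in labelled_texts:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def word_log_prob(self, word: str, label: str) -> float:
        """Smoothed log P(word | label)."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        return math.log((counts[word] + 1) / (total + len(self.vocab)))

    def log_scores(self, text: str) -> dict:
        """Unnormalized log posterior per label; unseen words are skipped."""
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]
        scores = {}
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)
            score += sum(self.word_log_prob(w, label) for w in words)
            scores[label] = score
        return scores

    def classify(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

    def evidence(self, word: str, label_a: str, label_b: str) -> float:
        """Log-likelihood ratio: positive values favour label_a over label_b."""
        return self.word_log_prob(word, label_a) - self.word_log_prob(word, label_b)


# Hypothetical training sentences, invented for this example.
train = [
    ("We are helping our neighbours", "act"),
    ("We are telling the truth", "act"),
    ("We are keeping our promises", "act"),
    ("We must be helping our neighbours", "etc"),
    ("We must be telling the truth", "etc"),
    ("We must be keeping our promises", "etc"),
]
cl = TinyNaiveBayes(train)

for sentence in ["We are trying to be ethical", "We must be trying to be ethical"]:
    print(sentence, "->", cl.classify(sentence), cl.log_scores(sentence))

# Which words push a sentence toward "etc"? Positive means evidence for "etc".
for word in ["we", "are", "must", "be"]:
    print(word, round(cl.evidence(word, "etc", "act"), 3))
```

Because `log_scores` and `evidence` expose the intermediate numbers, it becomes possible to see which words drive a decision, which is exactly what a black-box classifier hides.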

[PYTHON] Teaching machines good and evil with a naive Bayes classifier (draft stage)