[PYTHON] What is a rational decision that maximizes the chances of encountering an "ideal home"?

■ 1. Introduction

Recently, the influence of Corona has increased the amount of time I spend at home.

Even so, the house is a place where you spend a lot of time in your life, and you want to make it an ideal and cozy place as much as possible.

Are you very satisfied with the house you live in?

According to a survey, when the satisfaction level of one's house was evaluated from 0 to 100 points, only about 9.8% of the respondents answered "90 points or more".

One of the biggest expenses in life is spending on "housing," despite paying tens of thousands, hundreds of thousands (or millions) each month.

(What about readers?)

Why can't I just say "Orenya has 100 points! (At least 90 points)"? ??

It's just an individual hypothesis, but I don't feel that Moyamoya (or made a decision to get rid of that moyamoya) said, "In fact, there might have been a property that was cheaper than this ..." ) Is related.

Therefore, in this article, in order to dispel such anxiety, I will introduce / explain a decision-making method that maximizes the probability of encountering an "ideal house" using mathematical probability theory.

In fact, it is ** possible ** to make decisions that maximize this probability using a mathematical probabilistic approach.

By making such a decision, you will get a ** feeling that you have made a rational decision **, which may increase your satisfaction with your living life.

By the way, as you may have noticed, it is possible to apply conditions similar to "choosing a house" to decision-making (although the constraints are slightly different). For example, choosing an ideal fiancé / boyfriend / girlfriend / choosing a company to work for, etc.

■ 2. Rough flow of this article

1. What kind of decision should I make?
Explanation / Derivation of mathematical model
1. Stochastic simulation with Python
Mathematically tinker with mathematical models

(If you want to know only the conclusion, you can read "1". It may be for a little analyst after 2.)

■ 3. What kind of decision should I make? **

How to make a decision

I will write from the conclusion,

"36.8% of the properties to be confirmed will be unconditionally forgotten, and if you come across a property that is" this is the most ideal thing you've ever had! "

That is the decision-making method that maximizes the probability.

What kind of thing is this? I think it feels like that, so let's look at a concrete example.

Concrete example

For example, you are about to see 50 properties.

In that case, please check the first 18 cases (≈36.8%) and skip them unconditionally.

Then, if the property that seems to be "the best!" Appears among the 19th and subsequent properties (including the 18 properties that we first saw), we will adopt it.

It maximizes the probability that the property will be number one out of a total of 50, rather than choosing it bluntly. あつこ画像.jpg

■ 4. Explanation / Derivation of mathematical model

Why is the probability of choosing the ideal home maximized in this way?

Of course, if you can check all the properties in the world with unlimited time, you don't need such a method.

However, in reality, such a house selection is impossible, and we are required to make the most ideal selection among a certain number of properties within a finite time.

Not only that, but there are also times when you get a contract with someone who just thinks, "Oh my god!"

Let's consider various conditions for choosing a house as "constraints" and consider it as a problem of maximizing the probability of encountering an "ideal house".

Clarification of the problem

Satisfy the following constraints and maximize the probability of choosing the most "ideal home" out of checking up to n properties.

Constraints

Be sure to select one property.
The number of candidate properties n is predetermined.
The number of candidate properties is ranked, and multiple candidate properties will not have the same ranking.
Check one by one in a random order. Which property to check next is always the same probability.
Immediately after confirmation, decide whether to contract (if it is not decided until the end, unconditionally select the last candidate property).
You cannot select a property that has been rejected retroactively.
Do not refuse to select the candidate property itself.

What is the "ideal"?

From the above problem setting / constraint conditions, we will decide the ranking of the "ideal house" by ourselves and define it as the property with the highest ranking.

Problem-solving approach

We will treat it as the best selection problem that maximizes the probability of getting the first place (no information type), not the ranking minimization problem that makes the top candidates come as much as possible (no information type).

In addition, from constraint condition 6 "Cannot select properties that have been rejected retroactively.", The general policy is "Check up to r properties and decide the provisional 1st place among them (r +). 1) We would like to pursue the policy of "finding r that maximizes the probability that a property that exceeds the provisional first place will appear among the subsequent cases" (1 <r ≤ n).

First, check a concrete example, and then derive a generalized stochastic model.

The probability model is differentiated to find the maximum value, and the parameter r at that time is found.

Confirmation of specific example: When n = 10 and r = 5

First, out of a maximum of 10 properties, up to 5 properties are unconditionally rejected, the 6th property is selected, and the probability is that it is the "best" out of 10 properties. Let's ask for.

\underbrace{{9\over 10}\times{8\over 9}\times{7\over 8}\times{6\over 7}\times{5\over 6}}_{reject}\times{1 \over 5}

The formula is as above and the result is

{1 \over 10}

It will be.

This is no different from the probability of choosing Transcendental Texto, and it's not working.

So, let's think about ** "How to make use of the first 5 pieces of information observed" **.

In other words, ** "There are properties (= 1st place in the whole) that include the provisional 1st place in the 5 cases observed first and exceed the provisional 1st place in the 6th and subsequent cases (jth). Let's calculate the probability P "** to do.

That is,

P= \sum_{j=6}^{10} {5 \over j-1}\cdot{1 \over 10}

If you calculate the above, it's OK.

If you actually calculate it, you can see that it is about 37%.

P= \sum_{j=6}^{10} {5 \over j-1}\cdot{1 \over 10}={5 \over 10}\sum_{j=6}^{10} {1 \over j-1}

={1 \over 2}({1 \over 5}+{1 \over 6}+{1 \over 7}+{1 \over 8}+{1 \over 9})\approx0.3728174603

The probability that you can choose the 1st place is much higher than when you chose it as a texto.

How does the probability change depending on the number of send-offs r?

By the way, in the above, we considered the case where half of the total (5 out of 10 cases) was unconditionally forgotten, the provisional 1st place was decided, and the 1st place out of all 10 cases appeared after the 6th case.

As for my next interest, is it more likely that I will unconditionally see off up to the 6th case, or will the probability increase if I see off up to the 7th case, and how many cases is the correct answer?

There is constraint 6 "You can't select a property that was rejected retroactively." So it's not good to see it off too much.

To confirm this concretely, I tried a simple simulation.

Vertical axis (Probability): Probability that the selected property is ranked first
Horizontal axis (% of candidates rejected): Percentage representing "What percentage of the total number of properties should be forgotten?"

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
    %matplotlib inline
    
    def choose_candidate(n, reject=np.e):
        candidates = np.arange(1, n+1)
        np.random.shuffle(candidates)
        
        if reject == np.e:
            stop = int(round(n/reject))
        else:
            stop = int(round(reject*n/100))
    
        best_from_rejected = np.min(candidates[:stop])
        rest = candidates[stop:]
        
        try:
            return rest[rest < best_from_rejected][0] 
        except IndexError:
            return candidates[-1] 
        
    best_candidate = []
    for r in range(5, 101, 5):
        sim = np.array([choose_candidate(n=100, reject=r) for i in range(100000)])
        best_candidate.append(np.histogram(sim, bins=100)[0][0]/100000)
    
    plt.figure(figsize=(10, 6))
    plt.scatter(range(5, 101, 5), best_candidate)
    plt.xlim(0, 100)
    plt.xticks(np.arange(0, 101, 10))
    plt.ylim(0, 0.4)
    plt.xlabel('% of candidates rejected')
    plt.ylabel('Probability')
    plt.grid(False)
    plt.axvline(100/np.e, ls='--', c='orange')
    plt.show()

Looking at this, you can see that the probability is maximized by seeing off about 37% of the total number of properties.

Why is the probability maximized at about 37%?

By the way, as far as this simulation is concerned, about 37% of the total number of properties is forgotten, and if there is a property that exceeds the provisional first place among the properties after that, if you make a contract with it, the selected property is all properties. It seems that the probability of becoming the first place in the above is maximized.

Why is this happening?

Let's clarify it mathematically.

First of all, as for confirmation, the probability of "when the total number of properties is 10 and up to 5 is forgotten" is expressed by the following formula.

\sum_{j=6}^{10} {5 \over j-1}\cdot{1 \over 10}

The probability of generalizing this to "the total number of properties is n and r is forgotten" is

\sum_{j=r+1}^{n} {r \over j-1}\cdot{1 \over n}

Can be represented by.

For this, we will find the maximum value and the parameters at that time.

\sum_{j=r+1}^{n} {r \over j-1}\cdot{1 \over n}={r \over n}\sum_{j=r+1}^{n} {1 \over j-1}

={r \over n}\sum_{j=r+1}^{n} {n \over j-1}\cdot{1 \over n}={r \over n}\sum_{j=r+1}^{n} ({j-1 \over n})^{-1}\cdot{1 \over n}

Here, set t = j / n. That is, since tn = j

j = r+1 \Leftrightarrow tn = r+1

t = {r+1 \over n}

Is. Also,

j = n\Leftrightarrow tn = n

t=1

Therefore, the original formula is

{r \over n}\sum_{j=r+1}^{n} ({j-1 \over n})^{-1}\cdot{1 \over n}={r \over n}\sum_{t={r+1 \over n}}^{1} ({tn-1 \over n})^{-1}\cdot{1 \over n}={r \over n}\sum_{t={r+1 \over n}}^{1} (t-{1 \over n})^{-1}\cdot{1 \over n}

It can be expressed like this.

Here, when n → ∞, the limit value of (r / n) is set.

\lim_{n→∞}{r \over n}=x

If you put

\lim_{n→∞}{r \over n}\sum_{t={r+1 \over n}}^{1} (t-{1 \over n})^{-1}\cdot{1 \over n}＝\lim_{n→∞}（{r \over n}）\lim_{n→∞}(\sum_{t={r+1 \over n}}^{1} (t-{1 \over n})^{-1}\cdot{1 \over n})

＝x\lim_{n→∞}(\sum_{t={r+1 \over n}}^{1} (t-{1 \over n})^{-1}\cdot{1 \over n})

It is assumed that the ratio of n and r is always constant.

Also, from L'Hôpital's theorem (* 2)

\lim_{n→∞}{r+1 \over n}=x+0=x, \lim_{n→∞}(t-{1 \over n})^{-1}=t^{-1}

is not it.

Also, considering n → ∞ and Δx → 0,

x\lim_{n→∞}(\sum_{t={r+1 \over n}}^{1} (t-{1 \over n})^{-1}\cdot{1 \over n})=x\int_{x}^{1}{t^{-1}} \, dt=x\int_{x}^{1}{1 \over t} \, dt

Can be approximated to.

I'm finally here.

All you have to do is solve this.

x\int_{x}^{1}{1 \over t} \, dt=x[\log{t}]_{x}^{1}=x(\log{1}-\log{x})=-x\log{x}

The formula for the probability of success you want to find

-x\log{x}

And ** could be placed in a very simple form **.

Now, let's find x when maximizing this.

Differentiate the formula of success probability,

{df(x) \over dx}=-1logx-1

When this becomes 0,

-1logx-1=0

-logx=1

x={1 \over e}

Because the natural logarithm e ≈ 2.718281828459

x={1 \over e}\approx0.3678794412

From the above, it was shown that by overlooking about 37%, the probability of finding the first place in the whole is maximized.

■ 5. at the end

Currently, I am an analyst at LIFULL Co., Ltd., so I tried to make Qiita's first article home x mathematics (analysis) material.

It's interesting to use mathematics to model and analyze phenomena in the world, and the facts that deviate from intuition emerge.

As introduced below, Professor Hamada of Tohoku University "The problem, the mathematical model will solve" -redirect? _encoding = UTF8 & btkr = 1) I have referred / referenced / quoted the contents.

Personally, I think that the number of books in the mathematical model area has increased recently, but among them, it is written in an interactive format that anyone can easily understand, and it is a pretty good book for those who want to know about mathematical models. thought.

If you want to know more about the above contents or want to know more mathematical model material, please buy it (link is also described in the reference section).

■ Remaining issues

・ The approach to the truly rational decision-making problem should be brushed up not only from the mathematical probabilistic standpoint as described above, but also from the behavioral science and psychological viewpoints. You will need it. This article doesn't mention that, so if I have time, I'd like to read the papers and books around it.

・ I would like to try what happens if I treat it as a ranking minimization problem with the highest ranking possible. I will write it again when I feel like it.

・ In order to make it mathematically easier to handle, we adopted the constraint that "you cannot select a property that was rejected retroactively." However, the reality is not completely so, so there is room for improvement in this area.

・ In my first Qiita, I wasn't sure how much power I had. Let's make it lighter next time.

■ Reference / Citation

That problem, the mathematical model solves

https://www.amazon.co.jp/dp/B07RGWJQFQ/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1