Introduction

Let me start with a question: do you invest? If so, what kind of investments are you making?

There are many investment targets and many investment methods, and the world is full of information about them; it might be more accurate to say it is flooded with it. Books alone range from light introductory titles that lure readers into investing with sweet words to specialized financial texts. Blogs and social media are also important sources of information, and investment channels on YouTube seem to be gaining popularity these days.

Yet despite this variety of information sources, only a handful of investors succeed. The survey is a little dated, but according to Nomura Securities' 2015 survey of individual investors, only 9.3% of individual investors were making a profit overall. Why do investors end up in this situation? Here are some examples of the approaches that beginners typically take.

- Jump on trendy theme stocks after they have soared, and end up buying at the top
- Imitate Buffett and try to pick cheap growth stocks from the Shikiho (the quarterly company handbook)
- Screen stocks by PER or ROE based on secondhand knowledge
- Hold stocks for their dividends
- Buy the stocks recommended by magazine analysts
- Trade on announcements of corporate earnings and economic indicators
- Buy the stock of trendy ESG companies
- Use technical analysis to predict whether the chart will rise or fall

Some people who swallow such methods whole may find themselves disappointed by their unrealized losses every time they check their positions during the lunch break at work. Again, why do we end up in this situation? It is because **beginners don't know which trading styles are inherently superior** in the first place.

Purpose of this article

In this article, my aim is, above all, to give beginners a completely new perspective on investing. Trading is a science. Scientific trading means, in other words, quantitative and empirical trading.

Using statistical thinking, this article shows which trading styles have an advantage. I have chosen terms that are as easy as possible for beginners to follow and have omitted difficult mathematical formulas and technical considerations wherever I could.

It also covers specific techniques for putting scientific trading into practice. Programming is essential for quantitative and empirical trading, so in this article I wrote programs in Python that perform the minimum verification required. Many people may be put off by programming, but this is a good opportunity to start learning. Working with a clear purpose is the fastest way to improve.

Writer's investment performance

Before going further, I would like to introduce my own investment performance so far. You might find it distasteful to bring up money so abruptly, but advice from someone who has not produced results is unconvincing. One thing to add: the most important thing in investing is managing the process, not the results. As we will see later, investment performance depends heavily on luck, and judging by results alone often leads to false conclusions.

I originally invested on the side while working as an engineer, but in 2014 I began full-scale asset management. My starting capital at the time was 50 million yen. In 2016, I developed the trading system that is currently my mainstay. This flagship system has earned about 140 million yen from the start of operation to the present. It targets the large-cap Japanese stocks of the TOPIX500, and the average yield over the last four years has been around 40%.

Problem presentation

Now I will explain why the beginner approaches listed at the beginning cannot succeed and where the problem lies. Success here means obtaining the desired yield continuously and stably, without setbacks, over an investment period of at least five years.

First, let's consider whether the methods listed at the beginning make a profit. As an example, consider investing in stocks regarded as cheap and of high quality. When selecting individual stocks, you will most likely screen them with your brokerage's tools. Let's look at the performance of holding stocks with a PER of 10 or less and an ROE of 10% or more for one month.

The figure is a histogram (distribution) of the profit earned when the screened stocks are bought at the beginning of the month (to be exact, at the closing price of the previous month) and sold at the closing price at the end of the month. The aggregation period is April 2017 to October 2020. The market return is subtracted from the return of each stock. Without this adjustment, the results would be affected by the sharp rise in the Nikkei average, and you could not tell whether the profit came from your stock selection skill or from simply riding a rising market.
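The market adjustment can be sketched in a few lines of pandas. This is only an illustration: the prices are made-up numbers, and `excess_return` is a hypothetical helper, not the code used to produce the figure.

```python
import pandas as pd

# Hypothetical month-end closing prices (illustrative values, not real data)
prices = pd.DataFrame(
    {
        "stock": [1000.0, 1100.0, 1045.0],
        "market": [20000.0, 21000.0, 21000.0],
    },
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
)


def excess_return(prices: pd.DataFrame, stock: str, market: str) -> pd.Series:
    """Monthly return of the stock minus the market return over the same month."""
    returns = prices.pct_change().dropna()
    return returns[stock] - returns[market]


excess = excess_return(prices, "stock", "market")
print(excess)
```

In the first month the stock gains 10% while the market gains 5%, so only the remaining 5 points count as the result of stock selection.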

Well, is this investment method profitable?

The conclusion is that it may or may not be profitable. The stock you invested in may sit at the far right of the distribution shown above (a stock that made a profit), or at the far left (a stock that lost money). Every stock in the distribution is a "cheap and high quality" stock by your own standards. It follows that being "cheap and high quality" is not what determines the final profit or loss; what determines it is the luck of which stock you happened to pick from the many candidates at that moment. Perhaps you happened to like the company's name, it happened to appear at the top of the screening results, or you read a magazine and decided to invest on a whim.

Investment is like a gacha

The outcome of your investment depends on luck; in fact, it is mostly determined by luck. Luck here means a probability distribution like the one in the figure above. As the example shows, if you look only at the result of an investment, you can never judge whether the investment itself was sound. You cannot tell whether the outcome of a trade was a product of your own skill or a coincidence.

Being unable to judge results correctly means that you do not know whether you can rely on the method. In other words, you cannot run the kind of improvement process used in business. The approaches listed at the beginning lack any notion of such an improvement cycle. To run one, you need a mechanism that feeds back quantitative results within a short period of time.

If you cannot run an improvement process, your investment skills will never improve. It is the same as spinning a gacha whose contents are unknown. You are merely repeating the barren act of drawing a few gacha from the payout distribution above several times a year, rejoicing when the result is good and despairing when it is bad.

Solution

So is there a way to beat this gacha? Naturally, the answer is "YES".

Features of a gacha stand

There are good gacha stands and bad ones. A stand with many winning prizes inside is a good stand, and a stand with nothing but junk is a bad one. What do these stands look like in terms of their payout distributions? Here are three examples:

First, consider the leftmost stand. The center of its payout distribution is almost 0; that is, its expected value is 0. No matter how many times you draw from this stand, your assets will not grow. Even if you make a temporary profit, you just happened to outperform. In the long run, the average profit and loss will always approach zero.

Next, the middle stand. If you look closely at its payout distribution, you can see that it is shifted slightly to the right (the positive side). This is the state in which the "expected value is positive". In trading, this is described as "having an edge" or "having alpha (excess return)". By continuing to draw from this stand, you can grow your assets. However, your assets will swing widely along the way, and you may give up partway through.

Finally, the rightmost stand. Its payout distribution is shifted slightly to the right, just like the middle stand. Looking more closely, its variation (the spread around the center) is smaller than that of the middle stand. It is a stand with a high expected value and small variation, that is, a high "Sharpe ratio". If you find a gacha stand like this and keep drawing from it, your success is all but assured.
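As a rough illustration, the three stands can be simulated as normal distributions and compared by their Sharpe ratios. The means and standard deviations below are assumptions chosen only to mimic the description, and the risk-free rate is assumed to be zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pulls = 100_000

# Three hypothetical gacha stands: (mean return per pull, standard deviation)
stands = {
    "left": (0.000, 0.05),    # no edge
    "middle": (0.005, 0.05),  # positive edge, wide spread
    "right": (0.005, 0.02),   # same edge, narrow spread
}


def sharpe(returns: np.ndarray) -> float:
    """Per-pull Sharpe ratio: mean return divided by its standard deviation."""
    return returns.mean() / returns.std(ddof=1)


results = {}
for name, (mu, sigma) in stands.items():
    pulls = rng.normal(mu, sigma, size=n_pulls)
    results[name] = sharpe(pulls)
    print(f"{name}: mean={pulls.mean():+.4f}, Sharpe={results[name]:.3f}")
```

The middle and right stands have the same expected value, but the right one earns it with far less variation, which is what the higher Sharpe ratio expresses.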

Anyway, keep spinning the gacha many times

You cannot grasp what a gacha stand contains by drawing once or twice. If you really want to understand its tendencies, you have to keep drawing over and over. The important thing is to keep drawing from a single stand without wandering off. If you give up partway because this gacha does not seem to be paying out and switch to another, you will never grasp the tendency, even if you spend a lifetime on it.

In trading terms, you have to keep trading consistently with a single method. Ten or a hundred times is not enough; it has to be carried out more than 1,000 times. Do not hop from one investment method to another as the magazines do. The only way to fight the uncertainty of investing is to accumulate trials. The only weapon we have is the law of large numbers.
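A quick calculation shows why so many trials are needed. The sketch below assumes independent trades with an edge of 0.1% per trade and a standard deviation of 2%; both figures are illustrative assumptions, and the helper functions are hypothetical.

```python
import math


def standard_error(sigma: float, n_trades: int) -> float:
    """Standard error of the average profit per trade after n_trades independent trades."""
    return sigma / math.sqrt(n_trades)


def trades_needed(mu: float, sigma: float, z: float = 2.0) -> int:
    """Smallest number of trades for which the mean is z standard errors above zero."""
    return math.ceil((z * sigma / mu) ** 2)


mu, sigma = 0.001, 0.02  # assumed edge and per-trade standard deviation

for n in (10, 100, 1000, 10000):
    print(f"{n:>6} trades: standard error = {standard_error(sigma, n):.5f}")

needed = trades_needed(mu, sigma)
print(f"trades needed to separate the edge from noise: about {needed}")
```

Under these assumptions, the edge only stands out from the noise after roughly 1,600 trades, which is consistent with the point that 10 or 100 trades are far too few.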

Therefore, **a dominant trading strategy is a "manageable" strategy**, one whose trades you can repeat many times. The trades individual investors typically make, held for more than a month, are not suitable. You should trade at least daily, and the ultimate form of this is scalping. In addition, the shorter the holding period, the smaller the uncertainty in price movements and the easier it is to grasp the tendency (specifically, investment uncertainty is proportional to the square root of the investment period).
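The square-root relationship can be checked with a small simulation. The sketch assumes that daily returns are independent and normally distributed with a standard deviation of 1%, which is a simplification rather than a property of real markets.

```python
import numpy as np

rng = np.random.default_rng(42)

daily_sigma = 0.01   # assumed standard deviation of daily returns (1%)
n_paths = 50_000     # number of simulated holding periods

measured = {}
for days in (1, 4, 16, 64):
    # Total return over the holding period as a sum of independent daily returns
    total = rng.normal(0.0, daily_sigma, size=(n_paths, days)).sum(axis=1)
    measured[days] = total.std()
    theory = daily_sigma * np.sqrt(days)
    print(f"{days:>2} days: simulated={measured[days]:.4f}, sqrt rule={theory:.4f}")
```

Holding four times as long roughly doubles the uncertainty instead of quadrupling it, and a one-day trade carries only about one eighth of the uncertainty of a 64-day trade.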

If you keep pulling the gacha, you will run out of money

However, you need money to draw a gacha, and the more you draw, the more money leaves your wallet. Trading is the same: every transaction costs money. With stocks you pay commissions and margin interest, and with FX you pay the spread to the broker on every trade. If you trade many times, your funds will run out in no time.
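To get a feel for how quickly costs add up, here is a minimal sketch for a strategy with no edge at all. The starting capital and the 0.1% cost per trade are assumptions for illustration, and `remaining_capital` is a hypothetical helper.

```python
def remaining_capital(start: float, cost_rate: float, n_trades: int) -> float:
    """Capital left after n_trades zero-edge trades, each costing cost_rate of capital."""
    return start * (1.0 - cost_rate) ** n_trades


start = 1_000_000   # assumed starting capital in yen
cost_rate = 0.001   # assumed total cost of 0.1% per trade

for n in (10, 100, 1000):
    left = remaining_capital(start, cost_rate, n)
    print(f"{n:>5} trades: {left:,.0f} yen left")
```

With these numbers, about 63% of the capital is gone after 1,000 trades, so a strategy must have an expected value that exceeds its cost per trade.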

Frankly, estimating the contents of a gacha stand by firing live ammunition is a completely unprofitable exercise. Instead, there is a way to identify a good gacha stand in advance.

Estimate the contents of the gacha stand in advance from the data

In the investment world, we can collect data on what a gacha stand has paid out in the past. With this data you can verify, before drawing a single gacha, which stand is good, that is, which trading strategy you should bet on. This is what is called quantitative and empirical trading. Data analysis is used to find an expected value large enough to overcome transaction costs.

Research on quantitative and empirical trading has been conducted for many years, and recently machine learning is often used for it. Machine learning is very useful not only for confirming statistical significance in data but also for extracting hidden market characteristics that no one knows about. Programming libraries are plentiful, and APIs for downloading data are increasingly available. Even individual investors can analyze data thoroughly.

On to live operation

In live operation, you need to fix the conditions as far as possible based on the verification results and trade mechanically. Do not let human emotion intervene. And if you are mechanically repeating a large number of trades, there is no reason not to automate them. This style of trading is collectively referred to as "system trading".

In practice, a successful approach to investing is to run the improvement process while comparing the pre-verification (i.e. backtesting) with actual operating performance. For those who have never analyzed data, the following chapters show how to do it with Python.

Preparation for quantitative and empirical trading

Now, let's start by collecting data. We collect it through the Yahoo Finance API using a library called yfinance, so install that first. For more information on yfinance, please refer to its documentation.

pip install yfinance

(Addendum) About the yfinance bug

There is a bug in yfinance, and by default you cannot get financial statement data. You can get it by modifying yfinance's base.py file as follows.

# base.py, near line 353
# get fundamentals
# data = utils.get_json(url+'/financials', proxy)  # original line, commented out
url = "{}/{}/financials".format(self._scrape_url, self.ticker)   # added
data = utils.get_json(url, proxy)                                # added

Price historical data

First, let's download price data for stocks on the Japanese stock market.

By running the following code, you can get the historical data for Toyota Motor (7203) in an instant. The ticker symbol for a Japanese stock is its securities code followed by ".T", so you can obtain data for almost any stock without looking up its Yahoo Finance symbol.

import yfinance as yf

ticker = yf.Ticker("7203.T")
hist = ticker.history(period="max")
print(hist)

Execution screen

               Open     High      Low    Close    Volume  Dividends  Stock Splits
Date
1999-05-06  2259.74  2337.44  2233.84  2337.44   3115000        0.0             0
1999-05-07  2324.49  2330.96  2233.84  2253.27   3033000        0.0             0
1999-05-10  2253.27  2279.16  2233.84  2246.79   1261000        0.0             0
1999-05-11  2266.22  2279.17  2227.37  2227.37   1686000        0.0             0
1999-05-12  2227.37  2266.21  2227.37  2266.21   2596000        0.0             0
...             ...      ...      ...      ...       ...        ...           ...
2020-11-02  6866.00  7016.00  6850.00  6949.00   5721200        0.0             0
2020-11-04  7024.00  7054.00  6976.00  6976.00   6278100        0.0             0
2020-11-05  6955.00  7032.00  6923.00  6984.00   5643400        0.0             0
2020-11-06  7070.00  7152.00  7015.00  7019.00  11092900        0.0             0
2020-11-09  7159.00  7242.00  7119.00  7173.00   7838600        0.0             0

[5324 rows x 7 columns]

Profit and loss statement

Next, let's look at the financial statement data, starting with the income statement.

You can get the income statement for the last four fiscal years with the code below. The most important items are Total Revenue (sales), Operating Income, and Net Income.

financials = ticker.financials
print(financials)

Execution screen

                                         2020-03-31   2019-03-31   2018-03-31   2017-03-31
Research Development                           None         None         None         None
Effect Of Accounting Charges                   None         None         None         None
Income Before Tax                       2.82576e+12  2.64553e+12  3.09051e+12  2.55588e+12
Minority Interest                       6.77064e+11  7.18985e+11   6.9412e+11  6.68264e+11
Net Income                              2.07618e+12  1.88287e+12  2.49398e+12  1.83111e+12
Selling General Administrative          2.97317e+12   2.9867e+12   3.0905e+12  2.86848e+12
Gross Profit                            5.40763e+12   5.4439e+12  5.49036e+12  4.86286e+12
Ebit                                    2.43446e+12   2.4572e+12  2.39986e+12  1.99437e+12
Operating Income                        2.43446e+12   2.4572e+12  2.39986e+12  1.99437e+12
Other Operating Expenses                       None         None         None         None
Interest Expense                        -3.2217e+10  -2.8078e+10  -2.7586e+10  -2.9353e+10
Extraordinary Items                            None         None         None         None
Non Recurring                                  None         None         None         None
Other Items                                    None         None         None         None
Income Tax Expense                       6.8343e+11  6.59944e+11  5.04406e+11    6.289e+11
Total Revenue                             2.993e+13  3.02257e+13  2.93795e+13  2.75972e+13
Total Operating Expenses                2.74955e+13  2.77685e+13  2.69796e+13  2.56028e+13
Cost Of Revenue                         2.45224e+13  2.47818e+13  2.38892e+13  2.27343e+13
Total Other Income Expense Net          3.91297e+11   1.8833e+11   6.9065e+11  5.61513e+11
Discontinued Operations                        None         None         None         None
Net Income From Continuing Ops          2.14233e+12  1.98559e+12  2.58611e+12  1.92698e+12
Net Income Applicable To Common Shares   2.0589e+12  1.86808e+12  2.48169e+12  1.82131e+12
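As a small worked example, the three key items can be turned into profit margins. The numbers are copied from the Toyota output above for the fiscal year ending 2020-03-31; the data frame is rebuilt by hand here so the snippet runs without a network connection.

```python
import pandas as pd

# Values copied from the output above (fiscal year ending 2020-03-31)
financials = pd.DataFrame(
    {"2020-03-31": [2.993e13, 2.43446e12, 2.07618e12]},
    index=["Total Revenue", "Operating Income", "Net Income"],
)

latest = financials["2020-03-31"]
operating_margin = latest["Operating Income"] / latest["Total Revenue"]
net_margin = latest["Net Income"] / latest["Total Revenue"]

print(f"operating margin: {operating_margin:.2%}")
print(f"net margin: {net_margin:.2%}")
```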

Balance sheet

Next is the balance sheet.

You can get the balance sheet for the last four fiscal years with the code below. The most important items are Total Assets, Total Liab (total liabilities), and Total Stockholder Equity.

balance_sheet = ticker.balance_sheet
print(balance_sheet)

Execution screen

                                    2020-03-31    2019-03-31    2018-03-31    2017-03-31
Capital Surplus                   4.893340e+11  4.871620e+11  4.875020e+11  4.840130e+11
Total Liab                        3.194275e+13  3.186981e+13  3.087815e+13  3.056711e+13
Total Stockholder Equity          2.006062e+13  1.934815e+13  1.873598e+13  1.751481e+13
Minority Interest                 6.770640e+11  7.189850e+11  6.941200e+11  6.682640e+11
Other Current Liab                4.102642e+12  4.479344e+12  4.399669e+12  3.979935e+12
Total Assets                      5.268044e+13  5.193695e+13  5.030825e+13  4.875019e+13
Common Stock                      3.970500e+11  3.970500e+11  3.970500e+11  3.970500e+11
Other Current Assets              2.469880e+11  1.425310e+11  2.022920e+11  1.235700e+10
Retained Earnings                 2.342761e+13  2.198752e+13  1.947346e+13  1.760107e+13
Other Liab                        2.746823e+12  2.887743e+12  2.902003e+12  3.163780e+12
Treasury Stock                   -4.253379e+12 -3.523575e+12 -1.622034e+12 -9.673210e+11
Other Assets                      8.905140e+11  1.182809e+12  1.067759e+12  1.012639e+12
Cash                              2.774498e+12  2.790212e+12  2.390524e+12  2.257064e+12
Total Current Liabilities         1.790238e+13  1.822694e+13  1.779689e+13  1.731896e+13
Deferred Long Term Asset Charges  3.547850e+11  5.018720e+11  4.941200e+11  5.039850e+11
Short Long Term Debt              1.418710e+11  1.560380e+11  1.674550e+11  2.285990e+11
Other Stockholder Equity         -1.166273e+12 -9.166500e+11  4.356990e+11  6.409220e+11
Property Plant Equipment          1.087864e+13  1.068549e+13  1.026767e+13  1.019711e+13
Total Current Assets              1.864253e+13  1.887924e+13  1.815266e+13  1.783370e+13
Long Term Investments             1.184489e+13  1.090829e+13  1.133854e+13  1.069452e+13
Net Tangible Assets               2.006062e+13  1.934815e+13  1.873598e+13  1.751481e+13
Short Term Investments            1.477202e+12  2.234892e+12  2.447703e+12  2.522598e+12
Net Receivables                   2.659748e+12  2.940890e+12  2.708900e+12  2.552805e+12
Long Term Debt                    1.029678e+12  7.655860e+11  5.910860e+11  5.784750e+11
Inventory                         2.434918e+12  2.656396e+12  2.539789e+12  2.388617e+12
Accounts Payable                  2.434180e+12  2.645984e+12  2.586657e+12  2.566382e+12

Cash flow statement

The last of the financial statements is the cash flow statement.

You can get the cash flow statement for the last four fiscal years with the code below. The most important items are Total Cash From Operating Activities, Total Cash From Financing Activities, and Total Cashflows From Investing Activities.

cashflow = ticker.cashflow
print(cashflow)

Execution screen

                                             2020-03-31    2019-03-31    2018-03-31    2017-03-31
Investments                                2.334300e+11  6.166420e+11 -3.322730e+11  6.950000e+08
Change To Liabilities                     -7.641000e+10  9.488700e+10  4.664800e+10  1.459570e+11
Total Cashflows From Investing Activities -3.150861e+12 -2.697241e+12 -3.660092e+12 -2.969939e+12
Net Borrowings                             1.558199e+12  7.229710e+11  6.893390e+11  1.030929e+12
Total Cash From Financing Activities       3.971380e+11 -5.408390e+11 -4.491350e+11 -3.751650e+11
Change To Operating Activities            -2.703900e+11  4.084000e+11  4.857250e+11  7.724320e+11
Net Income                                 2.076183e+12  1.882873e+12  2.493983e+12  1.831109e+12
Change In Cash                             7.056750e+11  4.868760e+11  7.031300e+10  2.098980e+11
Repurchase Of Stock                       -4.761290e+11 -5.496370e+11 -4.478180e+11 -7.039860e+11
Effect Of Exchange Rate                   -1.312450e+11 -4.164100e+10 -4.358800e+10 -1.348600e+10
Total Cash From Operating Activities       3.590643e+12  3.766597e+12  4.223128e+12  3.568488e+12
Depreciation                               1.605383e+12  1.792375e+12  1.734033e+12  1.610950e+12
Dividends Paid                            -6.299870e+11 -6.448060e+11 -6.268920e+11 -6.381720e+11
Change To Inventory                       -1.140960e+11 -1.669020e+11 -1.711480e+11 -2.463260e+11
Change To Account Receivables              2.488950e+11 -2.468450e+11 -1.054350e+11 -2.647840e+11
Other Cashflows From Financing Activities -5.494500e+10 -6.936700e+10 -6.376400e+10 -6.393600e+10
Change To Netincome                        2.228170e+11  1.431380e+11 -4.994310e+11 -1.598180e+11
Capital Expenditures                      -3.595131e+12 -3.738887e+12 -3.598707e+12 -3.541437e+12

Stock summary

Finally, here is how to get a summary of a stock.

You can get basic information about the stock with the code below. The most important items include marketCap (market capitalization), sharesOutstanding, forwardPE (forecast PER), dividendYield (dividend yield), and profitMargins (net profit margin), among many others.

info = ticker.info
print(info)

Execution screen
(Omitted because the output is a dictionary.)

Acquisition of multiple stocks

To get multiple stocks at once, use the Tickers class and pass the symbols as a single space-separated string.

tickers = yf.Tickers("7203.T 9984.T 6861.T")
hists = []

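# Integer indexing matches the yfinance version used in this article; newer
# releases appear to expose tickers.tickers as a dict keyed by symbol instead.
for i in range(len(tickers.tickers)):
    hists.append(tickers.tickers[i].history())  # default period is one month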

print(hists[0])

Execution screen

              Open    High     Low   Close    Volume  Dividends  Stock Splits
Date
2020-10-09  7026.0  7029.0  6947.0  6967.0   3395900          0             0
2020-10-12  6932.0  6945.0  6900.0  6911.0   2638200          0             0
2020-10-13  6977.0  7030.0  6946.0  7030.0   3667700          0             0
2020-10-14  6962.0  6970.0  6919.0  6935.0   3065400          0             0
2020-10-15  6898.0  6933.0  6895.0  6915.0   2844800          0             0
2020-10-16  6940.0  6944.0  6825.0  6829.0   3770200          0             0
2020-10-19  6874.0  6948.0  6870.0  6945.0   3047000          0             0
2020-10-20  6926.0  6945.0  6889.0  6897.0   2342400          0             0
2020-10-21  6962.0  7052.0  6956.0  7009.0   4795000          0             0
2020-10-22  6967.0  6984.0  6941.0  6966.0   3207500          0             0
2020-10-23  7009.0  7010.0  6944.0  6973.0   3963300          0             0
2020-10-26  6970.0  7003.0  6955.0  6990.0   2675000          0             0
2020-10-27  6970.0  6993.0  6924.0  6961.0   3234300          0             0
2020-10-28  6888.0  6927.0  6845.0  6895.0   3760200          0             0
2020-10-29  6795.0  6924.0  6780.0  6893.0   4099900          0             0
2020-10-30  6848.0  6878.0  6803.0  6803.0   5207800          0             0
2020-11-02  6866.0  7016.0  6850.0  6949.0   5721200          0             0
2020-11-04  7024.0  7054.0  6976.0  6976.0   6278100          0             0
2020-11-05  6955.0  7032.0  6923.0  6984.0   5643400          0             0
2020-11-06  7070.0  7152.0  7015.0  7019.0  11092900          0             0
2020-11-09  7159.0  7242.0  7119.0  7173.0   7838600          0             0

Acquisition of data other than stock prices (exchange rate)

For any ticker that exists on Yahoo Finance, you can get data even if it is not a stock. Let's get currency exchange rates as an example.

import pandas as pd

fxs = ["JPY=X", "EURUSD=X", "GBPUSD=X"]
tickers = yf.Tickers(" ".join(fxs))

closes = []
for i in range(len(tickers.tickers)):
    closes.append(tickers.tickers[i].history(period="max").Close)

df = pd.DataFrame(closes).T
df.columns = fxs

print(df)

Execution result

              JPY=X  EURUSD=X  GBPUSD=X
Date
1996-10-30  114.180       NaN       NaN
1996-11-01  113.500       NaN       NaN
1996-11-04  113.880       NaN       NaN
1996-11-05  114.250       NaN       NaN
1996-11-06  113.950       NaN       NaN
...             ...       ...       ...
2020-11-03  104.725    1.1643    1.2924
2020-11-04  104.546    1.1762    1.3122
2020-11-05  104.438    1.1733    1.2967
2020-11-06  103.603    1.1818    1.3139
2020-11-09  104.871    1.1910    1.3193

[6243 rows x 3 columns]

World Major Stock Index

Here is how to obtain the world's major stock indices. Indices other than those listed here are also available, so I encourage you to search for them on Yahoo Finance yourself.

indices = ["^N225", "^DJI", "^GSPC", "^IXIC", "^GDAXI", "^FTSE", "^FCHI", "^HSI", "^SSEC", "^BVSP", "^KOSPI"]
#Omitted below

Execution of quantitative and empirical trading

Next, using a program as an example, I will explain how to verify the return from investing in stocks regarded as cheap and of high quality (PER of 10 or less, ROE of 10% or more), the case raised in the problem presentation of this article. My programming ability is below that of a student, so some of the code may be unsightly. If you have any suggestions about the coding, please leave a comment.

Reading the stock list

First, prepare a list of TSE stocks in CSV format. A list of TSE-listed stocks is available online, so pick out the stocks you want to verify. In this example, I targeted the TOPIX 500 constituents, the relatively large-cap stocks within TOPIX.

import datetime
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

data = pd.read_csv("topix500.csv")
print(data)

Execution screen

     code
0    1332
1    1333
2    1414
3    1605
4    1721
..    ...
494  9962
495  9983
496  9984
497  9987
498  9989

[499 rows x 1 columns]

Ticker settings

Set up the yfinance tickers. The Nikkei Stock Average is added to the stocks above as market data.

stocks = [str(s)+".T" for s in data.code]
stocks.append("^N225")
tickers = yf.Tickers(" ".join(stocks))

Creating a closing price data frame

Next, get the historical price series with yfinance and collect the closing prices in a data frame.

closes = []  # closing prices

for i in range(len(tickers.tickers)):
    closes.append(tickers.tickers[i].history(period="max").Close)

closes = pd.DataFrame(closes).T   # convert to a DataFrame (one column per ticker)
closes.columns = stocks           # set the column names
closes = closes.ffill()           # forward-fill missing data

print(closes)

Execution screen

            1332.T  1333.T  1414.T  1605.T  1721.T  1801.T  ...  9962.T   9983.T  9984.T  9987.T  9989.T     ^N225
Date                                                        ...
1965-01-05     NaN     NaN     NaN     NaN     NaN     NaN  ...     NaN      NaN     NaN     NaN     NaN   1257.72
1965-01-06     NaN     NaN     NaN     NaN     NaN     NaN  ...     NaN      NaN     NaN     NaN     NaN   1263.99
1965-01-07     NaN     NaN     NaN     NaN     NaN     NaN  ...     NaN      NaN     NaN     NaN     NaN   1274.27
1965-01-08     NaN     NaN     NaN     NaN     NaN     NaN  ...     NaN      NaN     NaN     NaN     NaN   1286.43
1965-01-12     NaN     NaN     NaN     NaN     NaN     NaN  ...     NaN      NaN     NaN     NaN     NaN   1288.54
...            ...     ...     ...     ...     ...     ...  ...     ...      ...     ...     ...     ...       ...
2020-11-04   417.0  2240.0  5200.0   526.0  2789.0  3325.0  ...  3135.0  74380.0  6535.0  3865.0  3945.0  23695.23
2020-11-05   417.0  2211.0  5220.0   506.0  2782.0  3335.0  ...  3200.0  74400.0  6870.0  3965.0  4020.0  24105.28
2020-11-06   421.0  2219.0  5270.0   507.0  2826.0  3385.0  ...  3245.0  75480.0  6722.0  3785.0  4155.0  24325.23
2020-11-09   423.0  2252.0  5360.0   500.0  3010.0  3440.0  ...  3360.0  78310.0  7083.0  3770.0  4185.0  24839.84
2020-11-10   437.0  2319.0  5410.0   541.0  3040.0  3535.0  ...  3455.0  77910.0  6860.0  3865.0  4145.0  25108.21

[13862 rows x 500 columns]
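Since the verification described earlier buys at the previous month's closing price and sells at the current month's close, the daily closes eventually need to be reduced to month-end values. One possible way is sketched below on synthetic data; the ticker `AAAA.T` and all prices are made up, and this is not necessarily how the author does it.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes standing in for the `closes` frame (business days only)
dates = pd.bdate_range("2020-08-03", "2020-10-30")
closes = pd.DataFrame(
    {
        "AAAA.T": np.linspace(1000, 1100, len(dates)),  # hypothetical stock
        "^N225": np.linspace(22000, 23000, len(dates)),
    },
    index=dates,
)

# Last available close of each calendar month
monthly = closes.groupby(closes.index.to_period("M")).last()

# Monthly return: buy at the previous month's close, sell at this month's close
monthly_returns = monthly.pct_change().dropna()

print(monthly)
print(monthly_returns)
```

Grouping by calendar month and taking the last row picks the final trading day of each month even when the calendar month-end falls on a holiday.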

Creating a net income data frame

Next, the financial statement data is collected in a data frame. The first item is net income, which is needed to calculate PER and ROE. The output looks as if it is mostly NaN because fiscal year ends differ between stocks, but rest assured that the data is present where it is needed.

earnings = []  # net income

# All-NaN placeholder with the same index as the first stock's net income
dummy = tickers.tickers[0].financials.T["Net Income"].copy()
dummy[:] = np.nan

for i in range(len(tickers.tickers)):
    try:
        earnings.append(tickers.tickers[i].financials.T["Net Income"])
    except Exception:
        earnings.append(dummy)       # insert the placeholder when the data is unavailable

earnings = pd.DataFrame(earnings).T  #DataFrame conversion
earnings.columns = stocks            #Column name setting

print(earnings)

Execution screen

            1332.T  1333.T        1414.T  1605.T  1721.T  1801.T  ...  9962.T        9983.T  9984.T  9987.T  9989.T  ^N225
                                                                  ...
2006-08-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2007-08-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2009-03-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2010-03-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2011-03-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
...            ...     ...           ...     ...     ...     ...  ...     ...           ...     ...     ...     ...    ...
2020-05-20     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2020-05-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2020-06-30     NaN     NaN  9.005000e+09     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2020-08-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN  9.035700e+10     NaN     NaN     NaN    NaN
2020-09-30     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN

[69 rows x 500 columns]
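The reason for all the NaN values is easy to reproduce. Building a data frame from series with different date indexes takes the union of the dates, so each stock has values only on its own fiscal year ends. The sketch below uses two made-up series; forward-filling at the end is one possible way to carry the latest reported figure forward, shown as an assumption and not as the author's method.

```python
import pandas as pd

# Two hypothetical stocks with different fiscal year ends (made-up values)
a = pd.Series(
    [1.0, 2.0], index=pd.to_datetime(["2019-03-31", "2020-03-31"]), name="A"
)
b = pd.Series(
    [3.0, 4.0], index=pd.to_datetime(["2019-08-31", "2020-08-31"]), name="B"
)

# Same construction as in the article: list of Series -> DataFrame -> transpose
df = pd.DataFrame([a, b]).T.sort_index()
print(df)

# Carry the most recent reported value forward to later dates
filled = df.ffill()
print(filled)
```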

Creating a capital data frame

Next, collect shareholders' equity, which is needed to calculate ROE.

equity   = [] # Shareholders' equity

dummy = tickers.tickers[0].balance_sheet.T["Total Stockholder Equity"]
dummy[:] = np.nan

for i in range(len(tickers.tickers)):
    try:
        equity.append(tickers.tickers[i].balance_sheet.T["Total Stockholder Equity"])
    except Exception:
        equity.append(dummy)         #Insert a dummy when an error occurs

equity = pd.DataFrame(equity).T      #DataFrame conversion
equity.columns = stocks              #Column name setting

print(equity)

Execution screen

            1332.T  1333.T        1414.T  1605.T  1721.T  1801.T  ...  9962.T        9983.T  9984.T  9987.T  9989.T  ^N225
                                                                  ...
2006-08-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2007-08-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2009-03-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2010-03-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2011-03-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
...            ...     ...           ...     ...     ...     ...  ...     ...           ...     ...     ...     ...    ...
2020-05-20     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2020-05-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2020-06-30     NaN     NaN  8.359900e+10     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN
2020-08-31     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN  9.565620e+11     NaN     NaN     NaN    NaN
2020-09-30     NaN     NaN           NaN     NaN     NaN     NaN  ...     NaN           NaN     NaN     NaN     NaN    NaN

[69 rows x 500 columns]

Creating a number of issued shares data frame

Calculating PER requires EPS (earnings per share), so first collect the number of shares outstanding for each stock.

shares   = [] #Number of issued shares

for i in range(len(tickers.tickers)):
    try:
        shares.append(tickers.tickers[i].info["sharesOutstanding"])
    except Exception:
        shares.append(np.nan)        #Enter the NAN value when an error occurs

shares = pd.Series(shares)           #Series
shares.index = stocks                #Index name setting

print(shares)

Execution screen

1332.T    3.111410e+08
1333.T    5.262460e+07
1414.T    5.382810e+07
1605.T    1.460200e+09
1721.T    1.260270e+08
              ...
9983.T    1.020820e+08
9984.T             NaN
9987.T    8.917480e+07
9989.T    1.169000e+08
^N225              NaN
Length: 500, dtype: float64

Creating EPS and ROE data frames

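Create EPS and ROE data frames from the net income, shareholders' equity, and number of shares outstanding. EPS is net income divided by shares outstanding, and ROE is net income divided by shareholders' equity.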

eps = earnings/shares.values      # EPS = net income / shares outstanding
roe = earnings/equity             # ROE = net income / shareholders' equity

eps = eps.ffill()                 # Forward-fill missing data
roe = roe.ffill()

eps = eps.drop(["^N225"], axis=1) # Drop the ^N225 column
roe = roe.drop(["^N225"], axis=1)

print(eps)
print(roe)

Execution screen

               1332.T     1333.T      1414.T      1605.T  ...       9983.T  9984.T      9987.T      9989.T
                                                          ...
2006-08-31        NaN        NaN         NaN         NaN  ...          NaN     NaN         NaN         NaN
2007-08-31        NaN        NaN         NaN         NaN  ...          NaN     NaN         NaN         NaN
2009-03-31        NaN        NaN         NaN         NaN  ...          NaN     NaN         NaN         NaN
2010-03-31        NaN        NaN         NaN         NaN  ...          NaN     NaN         NaN         NaN
2011-03-31        NaN        NaN         NaN         NaN  ...          NaN     NaN         NaN         NaN
...               ...        ...         ...         ...  ...          ...     ...         ...         ...
2020-05-20  47.464013  238.23459  150.107472  107.557873  ...  1592.621618     NaN  316.378618  202.668948
2020-05-31  47.464013  238.23459  150.107472  107.557873  ...  1592.621618     NaN  316.378618  202.668948
2020-06-30  47.464013  238.23459  167.291805  107.557873  ...  1592.621618     NaN  316.378618  202.668948
2020-08-31  47.464013  238.23459  167.291805  107.557873  ...   885.141357     NaN  316.378618  202.668948
2020-09-30  47.464013  238.23459  167.291805  107.557873  ...   885.141357     NaN  316.378618  202.668948

[69 rows x 499 columns]

              1332.T    1333.T    1414.T   1605.T   1721.T  ...    9962.T    9983.T  9984.T    9987.T    9989.T
                                                            ...
2006-08-31       NaN       NaN       NaN      NaN      NaN  ...       NaN       NaN     NaN       NaN       NaN
2007-08-31       NaN       NaN       NaN      NaN      NaN  ...       NaN       NaN     NaN       NaN       NaN
2009-03-31       NaN       NaN       NaN      NaN      NaN  ...       NaN       NaN     NaN       NaN       NaN
2010-03-31       NaN       NaN       NaN      NaN      NaN  ...       NaN       NaN     NaN       NaN       NaN
2011-03-31       NaN       NaN       NaN      NaN      NaN  ...       NaN       NaN     NaN       NaN       NaN
...              ...       ...       ...      ...      ...  ...       ...       ...     ...       ...       ...
2020-05-20  0.096428  0.094528  0.103502  0.05165  0.08434  ...  0.078191  0.173209     NaN  0.068505  0.126817
2020-05-31  0.096428  0.094528  0.103502  0.05165  0.08434  ...  0.078191  0.173209     NaN  0.068505  0.126817
2020-06-30  0.096428  0.094528  0.107717  0.05165  0.08434  ...  0.078191  0.173209     NaN  0.068505  0.126817
2020-08-31  0.096428  0.094528  0.107717  0.05165  0.08434  ...  0.078191  0.094460     NaN  0.068505  0.126817
2020-09-30  0.096428  0.094528  0.107717  0.05165  0.08434  ...  0.078191  0.094460     NaN  0.068505  0.126817

[69 rows x 499 columns]
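To make the arithmetic concrete, here is a small sketch on made-up numbers. The tickers and all variable names ending in `_demo` are hypothetical. Unlike the code above, which divides by `shares.values` and therefore relies on the column order, this sketch lets pandas match the share counts to the columns by label.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2019-03-31", "2019-08-31", "2020-03-31"])

# Made-up figures: each company has a value only on its own fiscal year end.
net_income_demo = pd.DataFrame(
    {"AAA.T": [100.0, np.nan, 120.0], "BBB.T": [np.nan, 50.0, np.nan]}, index=dates
)
equity_demo = pd.DataFrame(
    {"AAA.T": [1000.0, np.nan, 1500.0], "BBB.T": [np.nan, 400.0, np.nan]}, index=dates
)
shares_demo = pd.Series({"AAA.T": 10.0, "BBB.T": 25.0})

# EPS: the Series index is aligned with the column labels before dividing.
eps_demo = net_income_demo.div(shares_demo, axis="columns").ffill()

# ROE: element-wise division of two equally shaped tables.
roe_demo = (net_income_demo / equity_demo).ffill()

print(eps_demo)
print(roe_demo)
```

The forward fill carries each reported value down until the next report, which is why the same number repeats across rows in the execution screen above.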

Formatting the closing price data frame and creating the monthly return data frame

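From here on, we reshape the data step by step. First, reduce the closing prices to month-end data, then create a monthly return data frame (net of the market return). The returns are shifted by one row, so each row holds the return over the following month.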

closes["month"] = closes.index.month                                      #Creating a month column
closes["end_of_month"] = closes.month.diff().shift(-1)                    #Creating a month-end flag column
closes = closes[closes.end_of_month != 0]                                 #Extracted only at the end of the month

monthly_rt = closes.pct_change().shift(-1)                                #Creating monthly returns(With lag)
monthly_rt = monthly_rt.sub(monthly_rt["^N225"], axis=0)                  #Market return deduction

closes = closes[closes.index > datetime.datetime(2017, 4, 1)]             #After April 2017
monthly_rt = monthly_rt[monthly_rt.index > datetime.datetime(2017, 4, 1)]

closes = closes.drop(["^N225", "month", "end_of_month"], axis=1)          #Delete unnecessary columns
monthly_rt = monthly_rt.drop(["^N225", "month", "end_of_month"], axis=1)

print(closes)
print(monthly_rt)

Execution screen

            1332.T   1333.T   1414.T   1605.T   1721.T   1801.T  ...   9861.T   9962.T    9983.T   9984.T   9987.T   9989.T
Date                                                             ...
2017-04-28  511.60  3064.04  2390.82   994.50  1964.87  3873.35  ...  1758.93  2063.02  35232.05  4138.08  3573.51  3686.69
2017-05-31  550.67  3049.61  2498.64   947.96  2172.28  4310.81  ...  1730.92  2443.18  35949.09  4413.07  3529.87  4063.84
2017-06-30  625.93  2855.29  2687.68  1006.13  2141.72  4675.36  ...  1810.12  2507.68  36259.17  4459.15  3617.15  3950.69
2017-07-31  613.54  2895.69  2763.53   998.68  2094.50  4812.06  ...  1801.43  2673.82  32092.56  4391.01  3573.51  3875.26
2017-08-31  589.73  3068.85  2886.77   978.21  2188.95  5026.24  ...  1818.65  2756.89  30664.60  4373.37  3883.83  4294.85

2020-07-31  435.19  2021.00  4530.00   599.10  3058.94  3557.00  ...  1791.22  2489.70  55837.62  6595.00  3713.65  3580.46
2020-08-31  471.87  2398.00  5010.00   673.80  2922.77  3601.22  ...  2098.00  2777.20  63280.00  6598.00  3907.01  3912.72
2020-09-30  447.00  2412.00  5220.00   563.50  2921.00  3550.00  ...  1970.00  2935.00  65860.00  6469.00  4005.00  3965.00
2020-10-30  401.00  2182.00  5020.00   492.00  2646.00  3245.00  ...  1915.00  3090.00  72710.00  6793.00  3765.00  3875.00
2020-11-10  437.00  2319.00  5410.00   541.00  3040.00  3535.00  ...  1998.00  3455.00  77910.00  6860.00  3865.00  4145.00

[44 rows x 499 columns]

              1332.T    1333.T    1414.T    1605.T    1721.T  ...    9962.T    9983.T    9984.T    9987.T    9989.T
Date                                                          ...
2017-04-28  0.052727 -0.028350  0.021457 -0.070438  0.081918  ...  0.160633 -0.003289  0.042813 -0.035853  0.078659
2017-05-31  0.117186 -0.083203  0.056174  0.041880 -0.033552  ...  0.006917 -0.010858 -0.009042  0.005243 -0.047327
2017-06-30 -0.014391  0.019553  0.033625 -0.002001 -0.016644  ...  0.071656 -0.109508 -0.009877 -0.006661 -0.013689
2017-07-31 -0.024808  0.073799  0.058595 -0.006498  0.059094  ...  0.045067 -0.030496  0.009982  0.100838  0.122273
2017-08-31 -0.013368  0.001479  0.016405  0.109973  0.112144  ...  0.018384  0.018514 -0.015500 -0.030392 -0.007134

2020-07-31  0.018428  0.120684  0.040103  0.058830 -0.110373  ...  0.049619  0.067429 -0.065402 -0.013790  0.026941
2020-08-31 -0.054665  0.003878  0.039956 -0.165659 -0.002566  ...  0.054860  0.038811 -0.021512  0.023120  0.011401
2020-09-30 -0.093937 -0.086386 -0.029343 -0.117915 -0.085175  ...  0.061782  0.112979  0.059056 -0.050954 -0.013728
2020-10-30  0.013155 -0.025898 -0.022349  0.013459  0.066692  ...  0.021076 -0.026613 -0.090432 -0.056213 -0.031199
2020-11-10       NaN       NaN       NaN       NaN       NaN  ...       NaN       NaN       NaN       NaN       NaN

[44 rows x 499 columns]
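The month-end extraction and the lagged, market-adjusted return can be reproduced on a toy price table. This is only a sketch under assumptions: the prices are made up, `AAA.T` and `INDEX` are hypothetical column names, and the month-end rows are picked with `groupby(...).tail(1)` rather than the flag column used above.

```python
import pandas as pd

days = pd.to_datetime(
    ["2020-01-30", "2020-01-31", "2020-02-27", "2020-02-28", "2020-03-30", "2020-03-31"]
)
prices = pd.DataFrame(
    {
        "AAA.T": [100.0, 100.0, 105.0, 110.0, 120.0, 121.0],
        "INDEX": [200.0, 200.0, 205.0, 210.0, 209.0, 210.0],
    },
    index=days,
)

# Keep the last available trading day of each calendar month.
month_end = prices.groupby([prices.index.year, prices.index.month]).tail(1)

# Return over the NEXT month, stored on the date the position would be opened.
next_rt = month_end.shift(-1) / month_end - 1

# Subtract the index return row by row, then drop the index column.
excess_rt = next_rt.sub(next_rt["INDEX"], axis=0).drop(columns="INDEX")
print(excess_rt)
```

The last row is NaN because the following month is not yet known, just like the 2020-11-10 row in the execution screen above. That row survives the filter in the original code because the flag of the final row is NaN, which is not equal to 0.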

Creation of PER and ROE data frames (same dimension as monthly returns)

Next, create PER and ROE data frames with the same shape as the monthly return data frame. For each month-end date, the most recent EPS and ROE values dated before that date are used.

eps_df = pd.DataFrame(index=monthly_rt.index, columns=monthly_rt.columns) #DF creation of the same dimension as monthly return
roe_df = pd.DataFrame(index=monthly_rt.index, columns=monthly_rt.columns)

for i in range(len(eps_df)):                                              #Substitution to each line
    eps_df.iloc[i] = eps[eps.index < eps_df.index[i]].iloc[-1]

for i in range(len(roe_df)):
    roe_df.iloc[i] = roe[roe.index < roe_df.index[i]].iloc[-1]

per_df = closes/eps_df                                                    #Creating a PER data frame

print(per_df)
print(roe_df)

Execution screen

             1332.T   1333.T   1414.T   1605.T   1721.T 1801.T  ...   9861.T   9962.T   9983.T 9984.T   9987.T   9989.T
Date                                                            ...
2017-04-28  11.1972  10.4392      NaN   31.454  17.0954    NaN  ...  91.0625  31.8555      NaN    NaN  14.9553  18.4872
2017-05-31  12.0523    10.39      NaN   29.982     18.9    NaN  ...  89.6124  37.7256      NaN    NaN  14.7726  20.3785
2017-06-30  13.6995  9.72799      NaN  31.8218  18.6341    NaN  ...  93.7127  38.7215      NaN    NaN  15.1379  19.8111
2017-07-31  13.4284  9.86563  21.2599  31.5862  18.2232    NaN  ...  93.2628  41.2869      NaN    NaN  14.9553  19.4328
2017-08-31  12.9072  10.4556   22.208  30.9388   19.045    NaN  ...  94.1543  42.5696      NaN    NaN   16.254  21.5369

2020-07-31  9.16884  8.48323  27.0784  5.57002  14.8307    NaN  ...  162.317  42.8301  35.0602    NaN   11.738  17.6665
2020-08-31  9.94164  10.0657  29.9477  6.26453  14.1705    NaN  ...  190.117  47.7759  39.7332    NaN  12.3492   19.306
2020-09-30  9.41766  10.1245   31.203  5.23904  14.1619    NaN  ...  178.518  50.4906  74.4062    NaN  12.6589  19.5639
2020-10-30  8.44851  9.15904  30.0074  4.57428  12.8286    NaN  ...  173.534   53.157  82.1451    NaN  11.9003  19.1199
2020-11-10  9.20698   9.7341  32.3387  5.02985  14.7389    NaN  ...  181.056  59.4361  88.0198    NaN  12.2164  20.4521

[44 rows x 499 columns]

               1332.T     1333.T     1414.T     1605.T     1721.T  ...     9962.T     9983.T 9984.T     9987.T    9989.T
Date                                                               ...
2017-04-28   0.117515   0.153443        NaN  0.0156865   0.071603  ...    0.11847        NaN    NaN  0.0538158  0.170991
2017-05-31   0.117515   0.153443        NaN  0.0156865   0.071603  ...    0.11847        NaN    NaN  0.0538158  0.170991
2017-06-30   0.117515   0.153443        NaN  0.0156865   0.071603  ...    0.11847        NaN    NaN  0.0538158  0.170991
2017-07-31   0.117515   0.153443   0.101048  0.0156865   0.071603  ...    0.11847        NaN    NaN  0.0538158  0.170991
2017-08-31   0.117515   0.153443   0.101048  0.0156865   0.071603  ...    0.11847        NaN    NaN  0.0538158  0.170991

2020-07-31  0.0964277  0.0945276   0.107717    0.05165  0.0843397  ...  0.0781906   0.173209    NaN  0.0685051  0.126817
2020-08-31  0.0964277  0.0945276   0.107717    0.05165  0.0843397  ...  0.0781906   0.173209    NaN  0.0685051  0.126817
2020-09-30  0.0964277  0.0945276   0.107717    0.05165  0.0843397  ...  0.0781906  0.0944602    NaN  0.0685051  0.126817
2020-10-30  0.0964277  0.0945276   0.107717    0.05165  0.0843397  ...  0.0781906  0.0944602    NaN  0.0685051  0.126817
2020-11-10  0.0964277  0.0945276   0.107717    0.05165  0.0843397  ...  0.0781906  0.0944602    NaN  0.0685051  0.126817

[44 rows x 499 columns]
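The row-by-row substitution above picks, for every month-end date, the last fundamentals row dated strictly before it. The following sketch shows that rule on made-up EPS values. `latest_before` is a hypothetical helper, and returning NaN when no earlier report exists is an assumption added for the sketch (the original loop would raise an error in that case).

```python
import numpy as np
import pandas as pd

# Made-up annual EPS for one hypothetical stock.
eps_demo = pd.DataFrame(
    {"AAA.T": [10.0, 12.0]}, index=pd.to_datetime(["2019-03-31", "2020-03-31"])
)
month_ends = pd.to_datetime(["2020-02-28", "2020-03-31", "2020-04-30"])

def latest_before(table, when):
    """Return the last row dated strictly before `when`, or all-NaN if there is none."""
    earlier = table[table.index < when]
    if earlier.empty:
        return pd.Series(np.nan, index=table.columns)
    return earlier.iloc[-1]

# One column per month-end date, then transpose so that dates become the rows.
aligned = pd.DataFrame({d: latest_before(eps_demo, d) for d in month_ends}).T
print(aligned)
```

Because the comparison is strict, the report dated 2020-03-31 is not used on 2020-03-31 itself and only takes effect from the 2020-04-30 row.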

Data combination

Now let's combine these data frames into one.

stack_monthly_rt = monthly_rt.stack()                                  #Stack in one dimension
stack_per_df = per_df.stack()
stack_roe_df = roe_df.stack()

df = pd.concat([stack_monthly_rt, stack_per_df, stack_roe_df], axis=1) #Join
df.columns = ["rt", "per", "roe"]                                      #Column name setting

df.loc[df.rt > 1.0, "rt"] = np.nan                                     #Removal of outliers

print(df)

Execution screen

                         rt      per        roe
Date
2017-04-28 1332.T -0.047638  11.1972   0.117515
           1333.T -0.070101  10.4392   0.153443
           1414.T  0.026680      NaN        NaN
           1605.T -0.038959   31.454  0.0156865
           1721.T  0.051664  17.0954   0.071603
...                     ...      ...        ...
2020-11-10 9962.T  0.025375  59.4361  0.0781906
           9983.T -0.021231  88.0198  0.0944602
           9984.T -0.082885      NaN        NaN
           9987.T -0.066187  12.2164  0.0685051
           9989.T -0.023070  20.4521   0.126817

[21892 rows x 3 columns]
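As a rough illustration of the stacking step, the sketch below turns three small wide tables into one long table indexed by (date, stock). All values and tickers are made up, and `panel` is a hypothetical name. The outlier is removed with a single `.loc` assignment.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2020-01-31", "2020-02-28"])

# Made-up wide tables: rows are dates, columns are stocks.
rt_wide = pd.DataFrame({"AAA.T": [0.05, 1.50], "BBB.T": [-0.02, 0.01]}, index=dates)
per_wide = pd.DataFrame({"AAA.T": [8.0, 9.0], "BBB.T": [25.0, np.nan]}, index=dates)
roe_wide = pd.DataFrame({"AAA.T": [0.12, 0.12], "BBB.T": [0.04, 0.04]}, index=dates)

# stack() moves the column labels into a second index level;
# concat then lines the three Series up on the (date, stock) pairs.
panel = pd.concat(
    {"rt": rt_wide.stack(), "per": per_wide.stack(), "roe": roe_wide.stack()}, axis=1
)

# Treat monthly excess returns above 100% as outliers.
panel.loc[panel["rt"] > 1.0, "rt"] = np.nan
print(panel)
```

Each row of the long table is one stock in one month, which makes the filtering and grouping in the next step straightforward.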

Extraction and plotting of target stocks

Now, let's extract the stocks that are considered cheap and of high quality (PER < 10, ROE > 0.1), and look at the histogram of their returns and the cumulative return from trading them.

Analyzing the data this way shows that picking individual stocks with the kind of screening criteria introduced in magazines cannot be relied on. Note that this verification uses returns net of the market return, so you may still earn some absolute return when the market is booming.

value_df = df[(df.per < 10) & (df.roe > 0.1)]       #Extract cheap and high quality brands

plt.hist(value_df["rt"])                            #Histogram drawing
plt.show()

balance = value_df.groupby(level=0).mean().cumsum() #Create cumulative returns

plt.clf()
plt.plot(balance["rt"])                             #Drawing the balance curve
plt.show()

Execution result
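The screening and cumulative-return steps can also be checked by hand on a tiny made-up panel. This is a sketch under assumptions: the tickers and numbers are invented, and the plotting is left out so that only the calculation remains.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-31", "2020-02-28"]), ["AAA.T", "BBB.T", "CCC.T"]],
    names=["Date", "Ticker"],
)
panel = pd.DataFrame(
    {
        "rt": [0.04, 0.02, -0.10, -0.01, 0.03, 0.20],
        "per": [8.0, 9.0, 30.0, 8.5, 9.5, 28.0],
        "roe": [0.12, 0.15, 0.20, 0.12, 0.15, 0.20],
    },
    index=index,
)

# Same screen as in the article: PER below 10 and ROE above 0.1.
picked = panel[(panel["per"] < 10) & (panel["roe"] > 0.1)]

# Equal-weight average excess return per month, then a running sum.
monthly_mean = picked["rt"].groupby(level="Date").mean()
cumulative = monthly_mean.cumsum()
print(cumulative)
```

Note that `cumsum` adds simple monthly returns, so the curve is an additive balance rather than a compounded one.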

Finally

The verification presented in this article is only the beginning of quantitative and empirical trading. Although this article is titled "Successful Investment", few people will actually read it and go on to succeed at investing, because the vast majority of readers take no action. Taking action here does not mean reading this article and then trading on half-digested knowledge. It means using the inspiration from this article to devise and carry out your own quantitative and empirical verification.

I have kept this article as straightforward as possible, but simply reading it will never give you the details and useful insights that emerge along the way. The only way to make this knowledge your own is to work through it hands-on.

I have written down my knowledge about trading in blogs and notes. If you have read this article and are serious about quantitative and empirical trading, the following references are sure to guide you.

Again, most trading outcomes are determined by luck. I have done well so far only because I have been lucky enough that my performance has not collapsed. May you also have good luck.

Reference article

- Statistical Modeling of Trading
- History and Prospects of Trading Bias Countermeasure Technology
- Recommendation for AI investment
- Recommendation of Systre
- Investment index search procedure
- Machine learning stock price forecast Iroha no "i"
- Machine learning stock price forecast Iroha no "ro"

[PYTHON] Successful Investment: Trading Science
