Let me ask right away: do you invest? If so, what kind of investing do you do?
There are many things to invest in and many ways to invest. Information about investing is equally plentiful; it might be more accurate to say we are flooded with it. Books alone range from light introductions that lure readers into investing with sweet words to specialist finance texts. Blogs and social media are also important sources, and investment channels on YouTube seem to be gaining popularity these days.
Yet despite this variety of sources, only a handful of investors succeed. The figure is a little old, but according to a 2015 Nomura Securities survey of individual investors, only 9.3% were in profit overall. Why does this happen? Here are some typical approaches that beginners take.
- Chase a trendy theme stock as it soars and end up buying at the top
- Imitate Buffett and try to pick undervalued growth stocks from the Kaisha Shikiho (Japan Company Handbook)
- Screen stocks by PER or ROE based on secondhand knowledge
- Hold a stock for its dividends
- Buy a stock recommended by a magazine analyst
- Trade on corporate earnings announcements and economic indicators
- Buy a fashionable ESG stock
- Use technical analysis to predict whether the chart will rise or fall
Some of you may have swallowed such methods whole and now sigh at your unrealized losses every lunch break. Again, why do we end up like this? It is because **beginners do not know which trading styles are inherently superior** in the first place.
Above all, I wrote this article to give beginners a completely new way of looking at investing. Trading is science. Scientific trading is, in other words, quantitative and empirical trading.
This article uses statistical thinking to show which trading styles are superior. I have chosen terms that are as easy as possible for beginners to follow and left out difficult formulas and technical digressions wherever I could.
It also covers specific techniques for putting scientific trading into practice. Programming is essential for quantitative and empirical trading, so I have written programs in Python that perform the minimum verification required. Many people flinch at the word programming, but this is a good opportunity to start learning. Working with a clear purpose is the fastest way to improve.
Before going further, let me introduce my own investment record. It may seem crass to open by talking about money I have made, but advice from someone who has produced no results is unconvincing. One caveat: the most important thing in investing is managing the process, not the results. As we will see later, investment performance depends heavily on luck, and judging by results alone often leads to false conclusions.
I originally invested on the side while working as an engineer, and in 2014 I began managing my assets in earnest. My starting capital was 50 million yen. In 2016 I developed the trading system that is now my mainstay. This flagship system has earned about 140 million yen from the start of operation to the present. It targets the large-cap Japanese stocks of the TOPIX 500, and its average annual return over the last four years has been around 40%.
Now let me explain why the beginner approaches listed at the beginning cannot succeed and where the problem lies. Success here means obtaining the desired return continuously and stably, without setbacks, over an investment period of at least five years.
First, let's consider whether the methods listed at the beginning make a profit. As an example, take investing in stocks that are considered cheap and high quality. To select individual stocks you will most likely run a screen with your broker's tools. Let's look at the performance of holding stocks with a PER of 10 or less and an ROE of 10% or more for one month.
The figure is a histogram of the returns earned when the screened stocks are bought at the start of the month (to be exact, at the previous month's closing price) and sold at the month-end closing price, aggregated over April 2017 to October 2020. The market return is subtracted from each stock's return. Without this adjustment the results would be swayed by moves in the Nikkei average, and you could not tell whether a profit came from your stock-picking skill or from luck in riding a rising market.
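The market adjustment described above can be sketched in a few lines. This is only an illustration: the prices are made-up numbers, and the column names stock and market are assumptions for this sketch, not data from the analysis.

```python
import pandas as pd

# Hypothetical month-end closing prices (illustrative numbers only).
prices = pd.DataFrame(
    {"stock": [100.0, 110.0, 104.5], "market": [20000.0, 21000.0, 21000.0]},
    index=pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-31"]),
)

# Simple monthly return: this month's close / previous month's close - 1.
returns = prices.pct_change().dropna()

# Excess return = stock return minus the market return for the same month.
excess = returns["stock"] - returns["market"]
print(excess.round(4))
```

In the first month the stock gains 10% while the market gains 5%, so only the remaining 5 points count as the stock's own contribution.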
So, is this investment method profitable?
The answer is that it may or may not be. The stock you picked may sit at the far right of the distribution shown above (a stock that made a profit), or at the far left (a stock that lost money). Every stock in that distribution is "cheap and high quality" by your criteria. So what determines your final profit or loss is not the fact that a stock is "cheap and high quality" but the luck of which stock you happened to pick from the many candidates at that moment: you happened to like the company name, it happened to appear at the top of the screen, you read a magazine and invested on a whim, and so on.
The outcome of your investment depends on luck; in fact, it is mostly determined by luck. Luck here means a probability distribution like the one in the figure above. As the example shows, if you look only at the result of an investment you can never judge whether the investment itself was sound. You cannot tell whether the outcome of a trade was the product of your own skill or a coincidence.
Being unable to judge results correctly means you do not know whether the method can be relied on. In other words, you cannot run the kind of improvement cycle used in business. Approaches like those at the beginning lack any notion of running such a cycle. To run it, you need a mechanism that feeds back quantitative results within a short time.
If you cannot run an improvement cycle, your investment skill will never improve. It is like pulling a gacha (a capsule-toy lottery) whose contents are unknown. You merely repeat the barren act of drawing a few times a year from the payout distribution above, rejoicing when the result is good and losing heart when it is bad.
So is there a way to beat this gacha? Naturally, the answer is yes.
Gacha stands come in good and bad varieties. A stand with many winning prizes inside is a good stand, and a stand filled with junk is a bad one. What do these stands look like in terms of their payout distributions? Here are three examples:
First, consider the leftmost stand. The center of its payout distribution is almost 0; that is, its expected value is 0. No matter how many times you pull this gacha, your assets will not grow. Even if you make a temporary profit, you just happened to be lucky. In the long run, your average profit and loss will always approach zero.
Next, the middle stand. If you look closely, its payout distribution is shifted slightly to the right (the plus side). This is a state in which the expected value is positive. In trading this is described as "having an edge" or "having alpha (excess return)". By continuing to pull this gacha you can grow your assets. However, your assets will swing sharply along the way, and you may give up partway through.
Finally, the rightmost stand. Like the middle stand, its payout distribution is shifted slightly to the right. Look more closely and you will see that its variation (the spread around the center) is smaller than that of the middle stand. This stand has a high expected value and a small variation, that is, a high "Sharpe ratio". If you find such a stand and keep pulling it, your success is all but assured.
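To make the three stands concrete, here is a minimal simulation sketch. It assumes each pull is an independent draw from a normal distribution, and every mean and standard deviation below is a hypothetical number chosen for illustration. The ratio printed is a simplified, per-pull Sharpe-like figure (mean divided by standard deviation, with no risk-free rate and no annualization).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 100_000                     # number of simulated pulls per stand

# (mean, standard deviation) of a single pull; all numbers are hypothetical.
stands = {"left": (0.00, 0.05), "middle": (0.01, 0.05), "right": (0.01, 0.02)}

stats = {}
for name, (mu, sigma) in stands.items():
    pulls = rng.normal(mu, sigma, n)
    mean, std = pulls.mean(), pulls.std()
    stats[name] = (mean, std, mean / std)  # last value: Sharpe-like ratio
    print(f"{name:>6}: mean={mean:+.4f} std={std:.4f} ratio={mean / std:+.3f}")
```

The middle and right stands share the same expected value in this sketch, so the higher ratio of the right stand comes entirely from its smaller variation.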
You cannot grasp what a gacha stand contains by pulling once or twice. If you really want to know its tendency, you have to keep pulling over and over. The important point is to keep pulling a single gacha without looking elsewhere. If you switch to another gacha partway through because the current one does not seem to pay, you will not grasp the tendency in a lifetime.
In trading terms, you have to keep trading consistently with a single method. Ten or a hundred trades are not enough; it takes more than a thousand. Do not hop from one investment method to the next as magazines do. The only way to fight the uncertainty of investing is to accumulate trials. The only weapon we have is the law of large numbers.
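The effect of the number of trials can be checked with a small simulation. This is a sketch under stated assumptions: trades are independent normal draws, and the per-trade edge (0.1%) and volatility (2%) are hypothetical numbers. It measures how much the average result per trade varies between repeated experiments of n trades each.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.001, 0.02  # hypothetical per-trade edge and volatility

spread = {}
for n in (10, 100, 1000):
    # Repeat the n-trade experiment 2000 times and measure how much the
    # average return per trade differs from one experiment to the next.
    avg = rng.normal(mu, sigma, size=(2000, n)).mean(axis=1)
    spread[n] = avg.std()
    print(f"n={n:>4}: spread of average = {spread[n]:.5f} "
          f"(theory {sigma / np.sqrt(n):.5f})")
```

With these assumed numbers, the spread of the average is far larger than the edge at 10 or 100 trades, and it only shrinks below the edge at around 1000 trades, which is why a small sample cannot separate skill from luck.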
Therefore, **a dominant trading strategy is a “manageable” strategy**. For individual investors, trades that span more than a month are unsuitable. You should trade at least daily, and the best choice is scalping. Moreover, the shorter the holding period, the smaller the uncertainty in price movements and the easier it is to grasp the tendency (specifically, investment uncertainty is proportional to the square root of the investment period).
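The square-root relationship can be illustrated with a simple random-walk sketch. It assumes daily returns are independent with constant volatility and approximates the return over a holding period by the sum of daily returns; the 1% daily volatility is a hypothetical number.

```python
import numpy as np

rng = np.random.default_rng(2)
daily_sigma = 0.01  # hypothetical daily volatility

# 20,000 simulated paths of 64 independent daily returns each.
daily = rng.normal(0.0, daily_sigma, size=(20_000, 64))

scale = {}
for days in (1, 4, 16, 64):
    total = daily[:, :days].sum(axis=1)      # return over the holding period
    scale[days] = total.std() / daily_sigma  # uncertainty relative to one day
    print(f"{days:>2} days: {scale[days]:.2f}x (sqrt = {np.sqrt(days):.0f}x)")
```

Under these assumptions, holding for 64 days carries about 8 times the uncertainty of holding for one day, not 64 times.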
Of course, pulling a gacha costs money. The more you pull, the more money leaves your wallet. Trading is the same: every transaction has a cost. With stocks you pay commissions and margin interest, and with FX you pay the spread to the broker on every trade. If you trade many times, your funds will run out in no time.
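The arithmetic behind this point is simple enough to write down. All numbers below are hypothetical, and the totals are plain sums without compounding; the sketch only shows how costs eat into a small edge when trades are repeated.

```python
# All numbers are hypothetical and only illustrate the arithmetic.
gross_edge = 0.0010      # expected return per trade before costs (0.10%)
cost_per_trade = 0.0004  # commission, spread, and interest per trade (0.04%)
n_trades = 1000

# What is left of the edge after paying the cost of each trade.
net_edge = gross_edge - cost_per_trade

print(f"net edge per trade: {net_edge:.4%}")
print(f"expected total over {n_trades} trades: {net_edge * n_trades:.1%}")
print(f"paid in costs over {n_trades} trades: {cost_per_trade * n_trades:.1%}")
```

A strategy is only worth repeating if the net edge stays positive; if the cost per trade exceeded the gross edge here, more trades would simply mean larger expected losses.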
Frankly, estimating what a gacha stand contains by firing live rounds, that is, with real money, does not pay at all. Instead, there is a way to identify a good gacha stand in advance.
In the investment world, we can collect data on what the gacha stand has paid out in the past. With this data you can verify which gacha stand is good, that is, which trading strategy to bet on, before you pull at all. This is what quantitative and empirical trading means. Data analysis is used to find an expected value large enough to overcome transaction costs.
Quantitative and empirical trading has been studied for many years, and recently machine learning is often used for it. Machine learning is useful not only for confirming the statistical significance of data but also for extracting hidden market characteristics that no one has noticed. Programming libraries are plentiful, and APIs for downloading data are increasingly available, so even individual investors can analyze data in depth.
In live operation, you need to fix the conditions as far as possible based on the verification results and trade mechanically. Keep human emotion out of it. And if you are mechanically repeating a large number of trades, there is no reason not to automate them. This style of trading is collectively called "system trading".
In practice, a successful investment approach runs the improvement cycle by comparing the prior verification (i.e., the backtest) with actual live performance. For those who have never analyzed data, the following chapters show how to do it with Python.
Now, let's start by collecting data. We collect it through the Yahoo Finance API using a library called yfinance, so install that first. For more information on yfinance, see the linked page.
pip install yfinance
(Addendum) About a yfinance bug: there is a bug in yfinance that prevents financial statement data from being retrieved by default. You can retrieve it by modifying yfinance's base.py file as follows.
# base.py, near line 353
# get fundamentals
# data = utils.get_json(url+'/financials', proxy)  # original line: comment it out
url = "{}/{}/financials".format(self._scrape_url, self.ticker)  # added
data = utils.get_json(url, proxy)  # added
First, let's download price data for stocks listed on the Japanese stock market.
Running the following code fetches the historical data for Toyota Motor (7203) in an instant. The ticker symbol for a Japanese stock is its securities code followed by ".T", so you can obtain data for almost any stock without looking up its Yahoo Finance symbol.
import yfinance as yf
ticker = yf.Ticker("7203.T")
hist = ticker.history(period="max")
print(hist)
Execution screen
Open High Low Close Volume Dividends Stock Splits
Date
1999-05-06 2259.74 2337.44 2233.84 2337.44 3115000 0.0 0
1999-05-07 2324.49 2330.96 2233.84 2253.27 3033000 0.0 0
1999-05-10 2253.27 2279.16 2233.84 2246.79 1261000 0.0 0
1999-05-11 2266.22 2279.17 2227.37 2227.37 1686000 0.0 0
1999-05-12 2227.37 2266.21 2227.37 2266.21 2596000 0.0 0
... ... ... ... ... ... ... ...
2020-11-02 6866.00 7016.00 6850.00 6949.00 5721200 0.0 0
2020-11-04 7024.00 7054.00 6976.00 6976.00 6278100 0.0 0
2020-11-05 6955.00 7032.00 6923.00 6984.00 5643400 0.0 0
2020-11-06 7070.00 7152.00 7015.00 7019.00 11092900 0.0 0
2020-11-09 7159.00 7242.00 7119.00 7173.00 7838600 0.0 0
[5324 rows x 7 columns]
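Once the price history is in a data frame, a typical first step is to turn closing prices into returns. The sketch below rebuilds the last five closing prices from the output above by hand so that it runs without a network connection; with live data the same call would presumably be applied to hist["Close"].

```python
import pandas as pd

# The last five closing prices shown in the output above.
close = pd.Series(
    [6949.0, 6976.0, 6984.0, 7019.0, 7173.0],
    index=pd.to_datetime(
        ["2020-11-02", "2020-11-04", "2020-11-05", "2020-11-06", "2020-11-09"]
    ),
    name="Close",
)

# Close-to-close simple return; the first row has no previous close.
daily_return = close.pct_change().dropna()
print(daily_return.round(4))
```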
Next, let's look at financial statement data, starting with the income statement.
The code below returns the income statement for the last four fiscal years. The most important items are Total Revenue (sales), Operating Income, and Net Income.
financials = ticker.financials
print(financials)
Execution screen
2020-03-31 2019-03-31 2018-03-31 2017-03-31
Research Development None None None None
Effect Of Accounting Charges None None None None
Income Before Tax 2.82576e+12 2.64553e+12 3.09051e+12 2.55588e+12
Minority Interest 6.77064e+11 7.18985e+11 6.9412e+11 6.68264e+11
Net Income 2.07618e+12 1.88287e+12 2.49398e+12 1.83111e+12
Selling General Administrative 2.97317e+12 2.9867e+12 3.0905e+12 2.86848e+12
Gross Profit 5.40763e+12 5.4439e+12 5.49036e+12 4.86286e+12
Ebit 2.43446e+12 2.4572e+12 2.39986e+12 1.99437e+12
Operating Income 2.43446e+12 2.4572e+12 2.39986e+12 1.99437e+12
Other Operating Expenses None None None None
Interest Expense -3.2217e+10 -2.8078e+10 -2.7586e+10 -2.9353e+10
Extraordinary Items None None None None
Non Recurring None None None None
Other Items None None None None
Income Tax Expense 6.8343e+11 6.59944e+11 5.04406e+11 6.289e+11
Total Revenue 2.993e+13 3.02257e+13 2.93795e+13 2.75972e+13
Total Operating Expenses 2.74955e+13 2.77685e+13 2.69796e+13 2.56028e+13
Cost Of Revenue 2.45224e+13 2.47818e+13 2.38892e+13 2.27343e+13
Total Other Income Expense Net 3.91297e+11 1.8833e+11 6.9065e+11 5.61513e+11
Discontinued Operations None None None None
Net Income From Continuing Ops 2.14233e+12 1.98559e+12 2.58611e+12 1.92698e+12
Net Income Applicable To Common Shares 2.0589e+12 1.86808e+12 2.48169e+12 1.82131e+12
Next is the balance sheet.
The code below returns the balance sheet for the last four fiscal years. The most important items are Total Assets, Total Liab (total liabilities), and Total Stockholder Equity.
balance_sheet = ticker.balance_sheet
print(balance_sheet)
Execution screen
2020-03-31 2019-03-31 2018-03-31 2017-03-31
Capital Surplus 4.893340e+11 4.871620e+11 4.875020e+11 4.840130e+11
Total Liab 3.194275e+13 3.186981e+13 3.087815e+13 3.056711e+13
Total Stockholder Equity 2.006062e+13 1.934815e+13 1.873598e+13 1.751481e+13
Minority Interest 6.770640e+11 7.189850e+11 6.941200e+11 6.682640e+11
Other Current Liab 4.102642e+12 4.479344e+12 4.399669e+12 3.979935e+12
Total Assets 5.268044e+13 5.193695e+13 5.030825e+13 4.875019e+13
Common Stock 3.970500e+11 3.970500e+11 3.970500e+11 3.970500e+11
Other Current Assets 2.469880e+11 1.425310e+11 2.022920e+11 1.235700e+10
Retained Earnings 2.342761e+13 2.198752e+13 1.947346e+13 1.760107e+13
Other Liab 2.746823e+12 2.887743e+12 2.902003e+12 3.163780e+12
Treasury Stock -4.253379e+12 -3.523575e+12 -1.622034e+12 -9.673210e+11
Other Assets 8.905140e+11 1.182809e+12 1.067759e+12 1.012639e+12
Cash 2.774498e+12 2.790212e+12 2.390524e+12 2.257064e+12
Total Current Liabilities 1.790238e+13 1.822694e+13 1.779689e+13 1.731896e+13
Deferred Long Term Asset Charges 3.547850e+11 5.018720e+11 4.941200e+11 5.039850e+11
Short Long Term Debt 1.418710e+11 1.560380e+11 1.674550e+11 2.285990e+11
Other Stockholder Equity -1.166273e+12 -9.166500e+11 4.356990e+11 6.409220e+11
Property Plant Equipment 1.087864e+13 1.068549e+13 1.026767e+13 1.019711e+13
Total Current Assets 1.864253e+13 1.887924e+13 1.815266e+13 1.783370e+13
Long Term Investments 1.184489e+13 1.090829e+13 1.133854e+13 1.069452e+13
Net Tangible Assets 2.006062e+13 1.934815e+13 1.873598e+13 1.751481e+13
Short Term Investments 1.477202e+12 2.234892e+12 2.447703e+12 2.522598e+12
Net Receivables 2.659748e+12 2.940890e+12 2.708900e+12 2.552805e+12
Long Term Debt 1.029678e+12 7.655860e+11 5.910860e+11 5.784750e+11
Inventory 2.434918e+12 2.656396e+12 2.539789e+12 2.388617e+12
Accounts Payable 2.434180e+12 2.645984e+12 2.586657e+12 2.566382e+12
The last of the financial statements is the cash flow statement.
The code below returns the cash flow statement for the last four fiscal years. The most important items are Total Cash From Operating Activities, Total Cash From Financing Activities, and Total Cashflows From Investing Activities.
cashflow = ticker.cashflow
print(cashflow)
Execution screen
2020-03-31 2019-03-31 2018-03-31 2017-03-31
Investments 2.334300e+11 6.166420e+11 -3.322730e+11 6.950000e+08
Change To Liabilities -7.641000e+10 9.488700e+10 4.664800e+10 1.459570e+11
Total Cashflows From Investing Activities -3.150861e+12 -2.697241e+12 -3.660092e+12 -2.969939e+12
Net Borrowings 1.558199e+12 7.229710e+11 6.893390e+11 1.030929e+12
Total Cash From Financing Activities 3.971380e+11 -5.408390e+11 -4.491350e+11 -3.751650e+11
Change To Operating Activities -2.703900e+11 4.084000e+11 4.857250e+11 7.724320e+11
Net Income 2.076183e+12 1.882873e+12 2.493983e+12 1.831109e+12
Change In Cash 7.056750e+11 4.868760e+11 7.031300e+10 2.098980e+11
Repurchase Of Stock -4.761290e+11 -5.496370e+11 -4.478180e+11 -7.039860e+11
Effect Of Exchange Rate -1.312450e+11 -4.164100e+10 -4.358800e+10 -1.348600e+10
Total Cash From Operating Activities 3.590643e+12 3.766597e+12 4.223128e+12 3.568488e+12
Depreciation 1.605383e+12 1.792375e+12 1.734033e+12 1.610950e+12
Dividends Paid -6.299870e+11 -6.448060e+11 -6.268920e+11 -6.381720e+11
Change To Inventory -1.140960e+11 -1.669020e+11 -1.711480e+11 -2.463260e+11
Change To Account Receivables 2.488950e+11 -2.468450e+11 -1.054350e+11 -2.647840e+11
Other Cashflows From Financing Activities -5.494500e+10 -6.936700e+10 -6.376400e+10 -6.393600e+10
Change To Netincome 2.228170e+11 1.431380e+11 -4.994310e+11 -1.598180e+11
Capital Expenditures -3.595131e+12 -3.738887e+12 -3.598707e+12 -3.541437e+12
Finally, here is how to get summary information for a stock.
The code below returns basic information about the stock. Important items include marketCap (market capitalization), sharesOutstanding (number of shares outstanding), forwardPE (forecast PER), dividendYield (dividend yield), and profitMargins (net profit margin), among many others.
info = ticker.info
print(info)
Execution screen
(Omitted because the result is a dictionary.)
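To fetch several stocks at once, use the Tickers class and pass the symbols as a single space-separated string.
tickers = yf.Tickers("7203.T 9984.T 6861.T")
hists = []
# Integer indexing matches the yfinance version used here; newer releases may
# expose tickers.tickers as a dict keyed by symbol, so index by symbol there.
for i in range(len(tickers.tickers)):
    hists.append(tickers.tickers[i].history())
print(hists[0])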
Execution screen
Open High Low Close Volume Dividends Stock Splits
Date
2020-10-09 7026.0 7029.0 6947.0 6967.0 3395900 0 0
2020-10-12 6932.0 6945.0 6900.0 6911.0 2638200 0 0
2020-10-13 6977.0 7030.0 6946.0 7030.0 3667700 0 0
2020-10-14 6962.0 6970.0 6919.0 6935.0 3065400 0 0
2020-10-15 6898.0 6933.0 6895.0 6915.0 2844800 0 0
2020-10-16 6940.0 6944.0 6825.0 6829.0 3770200 0 0
2020-10-19 6874.0 6948.0 6870.0 6945.0 3047000 0 0
2020-10-20 6926.0 6945.0 6889.0 6897.0 2342400 0 0
2020-10-21 6962.0 7052.0 6956.0 7009.0 4795000 0 0
2020-10-22 6967.0 6984.0 6941.0 6966.0 3207500 0 0
2020-10-23 7009.0 7010.0 6944.0 6973.0 3963300 0 0
2020-10-26 6970.0 7003.0 6955.0 6990.0 2675000 0 0
2020-10-27 6970.0 6993.0 6924.0 6961.0 3234300 0 0
2020-10-28 6888.0 6927.0 6845.0 6895.0 3760200 0 0
2020-10-29 6795.0 6924.0 6780.0 6893.0 4099900 0 0
2020-10-30 6848.0 6878.0 6803.0 6803.0 5207800 0 0
2020-11-02 6866.0 7016.0 6850.0 6949.0 5721200 0 0
2020-11-04 7024.0 7054.0 6976.0 6976.0 6278100 0 0
2020-11-05 6955.0 7032.0 6923.0 6984.0 5643400 0 0
2020-11-06 7070.0 7152.0 7015.0 7019.0 11092900 0 0
2020-11-09 7159.0 7242.0 7119.0 7173.0 7838600 0 0
Any ticker that exists on Yahoo Finance can be fetched, even if it is not a stock. As an example, let's get foreign exchange data.
import pandas as pd
fxs = ["JPY=X", "EURUSD=X", "GBPUSD=X"]
tickers = yf.Tickers(" ".join(fxs))
closes = []
for i in range(len(tickers.tickers)):
    closes.append(tickers.tickers[i].history(period="max").Close)
df = pd.DataFrame(closes).T
df.columns = fxs
print(df)
Execution result
JPY=X EURUSD=X GBPUSD=X
Date
1996-10-30 114.180 NaN NaN
1996-11-01 113.500 NaN NaN
1996-11-04 113.880 NaN NaN
1996-11-05 114.250 NaN NaN
1996-11-06 113.950 NaN NaN
... ... ... ...
2020-11-03 104.725 1.1643 1.2924
2020-11-04 104.546 1.1762 1.3122
2020-11-05 104.438 1.1733 1.2967
2020-11-06 103.603 1.1818 1.3139
2020-11-09 104.871 1.1910 1.3193
[6243 rows x 3 columns]
The following shows how to specify the world's major stock indexes. Many indexes beyond those listed here are available, so try searching Yahoo Finance for them yourself.
indices = ["^N225", "^DJI", "^GSPC", "^IXIC", "^GDAXI", "^FTSE", "^FCHI", "^HSI", "^SSEC", "^BVSP", "^KOSPI"]
#Omitted below
Next, using a program as an example, I will explain how to verify the returns from investing in stocks considered cheap and high quality (PER of 10 or less, ROE of 10% or more), the case raised at the start of this article. My programming skills are no better than a student's, so some of the code may be unsightly. If you have any suggestions about the coding, please leave a comment.
First, prepare a list of TSE stocks in CSV format. A list of stocks listed on the TSE is available from the linked page, so pick out the stocks you want to verify. This example targets the TOPIX 500 constituents, the relatively large stocks within TOPIX.
import datetime
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
data = pd.read_csv("topix500.csv")
print(data)
Execution screen
code
0 1332
1 1333
2 1414
3 1605
4 1721
.. ...
494 9962
495 9983
496 9984
497 9987
498 9989
[499 rows x 1 columns]
Set up the yfinance tickers. The Nikkei Stock Average is appended to the stocks above to serve as the market data.
stocks = [str(s)+".T" for s in data.code]
stocks.append("^N225")
tickers = yf.Tickers(" ".join(stocks))
Next, fetch the historical price series with yfinance and gather the closing prices into a data frame.
closes = [] #closing price
for i in range(len(tickers.tickers)):
    closes.append(tickers.tickers[i].history(period="max").Close)
closes = pd.DataFrame(closes).T #DataFrame conversion
closes.columns = stocks #Column name setting
closes = closes.ffill() #Completion of missing data
print(closes)
Execution screen
1332.T 1333.T 1414.T 1605.T 1721.T 1801.T ... 9962.T 9983.T 9984.T 9987.T 9989.T ^N225
Date ...
1965-01-05 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 1257.72
1965-01-06 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 1263.99
1965-01-07 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 1274.27
1965-01-08 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 1286.43
1965-01-12 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 1288.54
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-11-04 417.0 2240.0 5200.0 526.0 2789.0 3325.0 ... 3135.0 74380.0 6535.0 3865.0 3945.0 23695.23
2020-11-05 417.0 2211.0 5220.0 506.0 2782.0 3335.0 ... 3200.0 74400.0 6870.0 3965.0 4020.0 24105.28
2020-11-06 421.0 2219.0 5270.0 507.0 2826.0 3385.0 ... 3245.0 75480.0 6722.0 3785.0 4155.0 24325.23
2020-11-09 423.0 2252.0 5360.0 500.0 3010.0 3440.0 ... 3360.0 78310.0 7083.0 3770.0 4185.0 24839.84
2020-11-10 437.0 2319.0 5410.0 541.0 3040.0 3535.0 ... 3455.0 77910.0 6860.0 3865.0 4145.0 25108.21
[13862 rows x 500 columns]
Next, we gather the financial statement data into data frames. The first is net income, which is needed to calculate PER and ROE. There appear to be many NaN values because fiscal year-ends differ from stock to stock, but rest assured that the data is present where it is needed.
earnings = [] #Net income
dummy = tickers.tickers[0].financials.T["Net Income"].copy() #Copy to use as an all-NaN placeholder
dummy[:] = np.nan
for i in range(len(tickers.tickers)):
    try:
        earnings.append(tickers.tickers[i].financials.T["Net Income"])
    except Exception:
        earnings.append(dummy) #Insert a dummy when an error occurs
earnings = pd.DataFrame(earnings).T #DataFrame conversion
earnings.columns = stocks #Column name setting
print(earnings)
Execution screen
1332.T 1333.T 1414.T 1605.T 1721.T 1801.T ... 9962.T 9983.T 9984.T 9987.T 9989.T ^N225
...
2006-08-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2007-08-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2009-03-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2010-03-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2011-03-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-20 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2020-05-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2020-06-30 NaN NaN 9.005000e+09 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2020-08-31 NaN NaN NaN NaN NaN NaN ... NaN 9.035700e+10 NaN NaN NaN NaN
2020-09-30 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
[69 rows x 500 columns]
Next is shareholders' equity, which is needed to calculate ROE.
equity = [] #Shareholders' equity
dummy = tickers.tickers[0].balance_sheet.T["Total Stockholder Equity"].copy() #Copy to use as an all-NaN placeholder
dummy[:] = np.nan
for i in range(len(tickers.tickers)):
    try:
        equity.append(tickers.tickers[i].balance_sheet.T["Total Stockholder Equity"])
    except Exception:
        equity.append(dummy) #Insert a dummy when an error occurs
equity = pd.DataFrame(equity).T #DataFrame conversion
equity.columns = stocks #Column name setting
print(equity)
Execution screen
1332.T 1333.T 1414.T 1605.T 1721.T 1801.T ... 9962.T 9983.T 9984.T 9987.T 9989.T ^N225
...
2006-08-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2007-08-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2009-03-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2010-03-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2011-03-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-20 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2020-05-31 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2020-06-30 NaN NaN 8.359900e+10 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2020-08-31 NaN NaN NaN NaN NaN NaN ... NaN 9.565620e+11 NaN NaN NaN NaN
2020-09-30 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
[69 rows x 500 columns]
EPS (earnings per share) is required to calculate PER. To calculate EPS, create a Series holding the number of shares outstanding for each stock.
shares = [] #Number of issued shares
for i in range(len(tickers.tickers)):
    try:
        shares.append(tickers.tickers[i].info["sharesOutstanding"])
    except Exception:
        shares.append(np.nan) #Enter NaN when an error occurs
shares = pd.Series(shares) #Series
shares.index = stocks #Index name setting
print(shares)
Execution screen
1332.T 3.111410e+08
1333.T 5.262460e+07
1414.T 5.382810e+07
1605.T 1.460200e+09
1721.T 1.260270e+08
...
9983.T 1.020820e+08
9984.T NaN
9987.T 8.917480e+07
9989.T 1.169000e+08
^N225 NaN
Length: 500, dtype: float64
Create the EPS and ROE data frames from net income, shareholders' equity, and the number of shares outstanding.
eps = earnings/shares.values # EPS
roe = earnings/equity # ROE
eps = eps.ffill() #Completion of missing data
roe = roe.ffill()
eps = eps.drop(["^N225"], axis=1) # ^Delete the N225 column
roe = roe.drop(["^N225"], axis=1)
print(eps)
print(roe)
Execution screen
1332.T 1333.T 1414.T 1605.T ... 9983.T 9984.T 9987.T 9989.T
...
2006-08-31 NaN NaN NaN NaN ... NaN NaN NaN NaN
2007-08-31 NaN NaN NaN NaN ... NaN NaN NaN NaN
2009-03-31 NaN NaN NaN NaN ... NaN NaN NaN NaN
2010-03-31 NaN NaN NaN NaN ... NaN NaN NaN NaN
2011-03-31 NaN NaN NaN NaN ... NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ...
2020-05-20 47.464013 238.23459 150.107472 107.557873 ... 1592.621618 NaN 316.378618 202.668948
2020-05-31 47.464013 238.23459 150.107472 107.557873 ... 1592.621618 NaN 316.378618 202.668948
2020-06-30 47.464013 238.23459 167.291805 107.557873 ... 1592.621618 NaN 316.378618 202.668948
2020-08-31 47.464013 238.23459 167.291805 107.557873 ... 885.141357 NaN 316.378618 202.668948
2020-09-30 47.464013 238.23459 167.291805 107.557873 ... 885.141357 NaN 316.378618 202.668948
[69 rows x 499 columns]
1332.T 1333.T 1414.T 1605.T 1721.T ... 9962.T 9983.T 9984.T 9987.T 9989.T
...
2006-08-31 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
2007-08-31 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
2009-03-31 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
2010-03-31 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
2011-03-31 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ...
2020-05-20 0.096428 0.094528 0.103502 0.05165 0.08434 ... 0.078191 0.173209 NaN 0.068505 0.126817
2020-05-31 0.096428 0.094528 0.103502 0.05165 0.08434 ... 0.078191 0.173209 NaN 0.068505 0.126817
2020-06-30 0.096428 0.094528 0.107717 0.05165 0.08434 ... 0.078191 0.173209 NaN 0.068505 0.126817
2020-08-31 0.096428 0.094528 0.107717 0.05165 0.08434 ... 0.078191 0.094460 NaN 0.068505 0.126817
2020-09-30 0.096428 0.094528 0.107717 0.05165 0.08434 ... 0.078191 0.094460 NaN 0.068505 0.126817
[69 rows x 499 columns]
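With EPS and ROE available, the screening condition from the start of the article (PER of 10 or less, ROE of 10% or more) can be expressed as a boolean filter. The following is a minimal sketch on a tiny hand-made table: the tickers AAA, BBB, and CCC and all of their values are hypothetical, and the column names are assumptions for this illustration rather than the data frames built above.

```python
import pandas as pd

# Hypothetical latest values for three made-up tickers.
latest = pd.DataFrame(
    {
        "close": [1000.0, 3000.0, 500.0],
        "eps": [125.0, 150.0, 40.0],
        "roe": [0.12, 0.15, 0.06],
    },
    index=["AAA", "BBB", "CCC"],
)

# PER = share price / earnings per share.
latest["per"] = latest["close"] / latest["eps"]

# Screen: PER of 10 or less and ROE of 10% or more.
passed = latest[(latest["per"] <= 10) & (latest["roe"] >= 0.10)]
print(passed)
```

One design point to keep in mind: a stock with negative EPS has a negative PER and would pass a plain "10 or less" test, so a real screen would likely also require the PER to be positive.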
From here we reshape the data in one go. First convert the prices to month-end data, then create a data frame of monthly returns (net of the market return).
closes["month"] = closes.index.month #Creating a month column
closes["end_of_month"] = closes.month.diff().shift(-1) #Creating a month-end flag column
closes = closes[closes.end_of_month != 0] #Extracted only at the end of the month
monthly_rt = closes.pct_change().shift(-1) #Creating monthly returns(With lag)
monthly_rt = monthly_rt.sub(monthly_rt["^N225"], axis=0) #Market return deduction
closes = closes[closes.index > datetime.datetime(2017, 4, 1)] #After April 2017
monthly_rt = monthly_rt[monthly_rt.index > datetime.datetime(2017, 4, 1)]
closes = closes.drop(["^N225", "month", "end_of_month"], axis=1) #Delete unnecessary columns
monthly_rt = monthly_rt.drop(["^N225", "month", "end_of_month"], axis=1)
print(closes)
print(monthly_rt)
Execution screen
1332.T 1333.T 1414.T 1605.T 1721.T 1801.T ... 9861.T 9962.T 9983.T 9984.T 9987.T 9989.T
Date ...
2017-04-28 511.60 3064.04 2390.82 994.50 1964.87 3873.35 ... 1758.93 2063.02 35232.05 4138.08 3573.51 3686.69
2017-05-31 550.67 3049.61 2498.64 947.96 2172.28 4310.81 ... 1730.92 2443.18 35949.09 4413.07 3529.87 4063.84
2017-06-30 625.93 2855.29 2687.68 1006.13 2141.72 4675.36 ... 1810.12 2507.68 36259.17 4459.15 3617.15 3950.69
2017-07-31 613.54 2895.69 2763.53 998.68 2094.50 4812.06 ... 1801.43 2673.82 32092.56 4391.01 3573.51 3875.26
2017-08-31 589.73 3068.85 2886.77 978.21 2188.95 5026.24 ... 1818.65 2756.89 30664.60 4373.37 3883.83 4294.85
2020-07-31 435.19 2021.00 4530.00 599.10 3058.94 3557.00 ... 1791.22 2489.70 55837.62 6595.00 3713.65 3580.46
2020-08-31 471.87 2398.00 5010.00 673.80 2922.77 3601.22 ... 2098.00 2777.20 63280.00 6598.00 3907.01 3912.72
2020-09-30 447.00 2412.00 5220.00 563.50 2921.00 3550.00 ... 1970.00 2935.00 65860.00 6469.00 4005.00 3965.00
2020-10-30 401.00 2182.00 5020.00 492.00 2646.00 3245.00 ... 1915.00 3090.00 72710.00 6793.00 3765.00 3875.00
2020-11-10 437.00 2319.00 5410.00 541.00 3040.00 3535.00 ... 1998.00 3455.00 77910.00 6860.00 3865.00 4145.00
[44 rows x 499 columns]
1332.T 1333.T 1414.T 1605.T 1721.T ... 9962.T 9983.T 9984.T 9987.T 9989.T
Date ...
2017-04-28 0.052727 -0.028350 0.021457 -0.070438 0.081918 ... 0.160633 -0.003289 0.042813 -0.035853 0.078659
2017-05-31 0.117186 -0.083203 0.056174 0.041880 -0.033552 ... 0.006917 -0.010858 -0.009042 0.005243 -0.047327
2017-06-30 -0.014391 0.019553 0.033625 -0.002001 -0.016644 ... 0.071656 -0.109508 -0.009877 -0.006661 -0.013689
2017-07-31 -0.024808 0.073799 0.058595 -0.006498 0.059094 ... 0.045067 -0.030496 0.009982 0.100838 0.122273
2017-08-31 -0.013368 0.001479 0.016405 0.109973 0.112144 ... 0.018384 0.018514 -0.015500 -0.030392 -0.007134
2020-07-31 0.018428 0.120684 0.040103 0.058830 -0.110373 ... 0.049619 0.067429 -0.065402 -0.013790 0.026941
2020-08-31 -0.054665 0.003878 0.039956 -0.165659 -0.002566 ... 0.054860 0.038811 -0.021512 0.023120 0.011401
2020-09-30 -0.093937 -0.086386 -0.029343 -0.117915 -0.085175 ... 0.061782 0.112979 0.059056 -0.050954 -0.013728
2020-10-30 0.013155 -0.025898 -0.022349 0.013459 0.066692 ... 0.021076 -0.026613 -0.090432 -0.056213 -0.031199
2020-11-10 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
[44 rows x 499 columns]
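The month-end extraction and the market-adjusted forward return are easier to check on a toy example. The sketch below is only an illustration under stated assumptions: the prices are made up, the column `MKT` is a hypothetical stand-in for `^N225`, and the last trading day of each month is picked with `groupby(...).tail(1)` instead of the `diff`/`shift` flag used above, which should select the same rows.

```python
import numpy as np
import pandas as pd

# Made-up daily closes on business days; "MKT" stands in for the index column.
days = pd.bdate_range("2020-01-01", "2020-04-30")
toy = pd.DataFrame(
    {
        "AAA": np.linspace(100.0, 130.0, len(days)),
        "MKT": np.linspace(200.0, 220.0, len(days)),
    },
    index=days,
)

# Keep the last available trading day of each calendar month.
month_end = toy.groupby(toy.index.to_period("M")).tail(1)

# Return earned over the FOLLOWING month, stored on the current month-end row.
fwd_rt = month_end.pct_change().shift(-1)

# Subtract the market's return from every column, then drop the market itself.
excess_rt = fwd_rt.sub(fwd_rt["MKT"], axis=0).drop(columns="MKT")
print(excess_rt)
```

Because of the `shift(-1)`, the last row has no following month and is NaN, which matches the all-NaN final row in the output above.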
Next, create PER and ROE data frames with the same shape (dates x tickers) as the monthly return data frame.
eps_df = pd.DataFrame(index=monthly_rt.index, columns=monthly_rt.columns) #DF creation of the same dimension as monthly return
roe_df = pd.DataFrame(index=monthly_rt.index, columns=monthly_rt.columns)
for i in range(len(eps_df)): #Fill each row with the latest EPS reported before that date
    eps_df.iloc[i] = eps[eps.index < eps_df.index[i]].iloc[-1]
for i in range(len(roe_df)): #Same for ROE
    roe_df.iloc[i] = roe[roe.index < roe_df.index[i]].iloc[-1]
per_df = closes/eps_df #Creating a PER data frame
print(per_df)
print(roe_df)
Execution screen
1332.T 1333.T 1414.T 1605.T 1721.T 1801.T ... 9861.T 9962.T 9983.T 9984.T 9987.T 9989.T
Date ...
2017-04-28 11.1972 10.4392 NaN 31.454 17.0954 NaN ... 91.0625 31.8555 NaN NaN 14.9553 18.4872
2017-05-31 12.0523 10.39 NaN 29.982 18.9 NaN ... 89.6124 37.7256 NaN NaN 14.7726 20.3785
2017-06-30 13.6995 9.72799 NaN 31.8218 18.6341 NaN ... 93.7127 38.7215 NaN NaN 15.1379 19.8111
2017-07-31 13.4284 9.86563 21.2599 31.5862 18.2232 NaN ... 93.2628 41.2869 NaN NaN 14.9553 19.4328
2017-08-31 12.9072 10.4556 22.208 30.9388 19.045 NaN ... 94.1543 42.5696 NaN NaN 16.254 21.5369
2020-07-31 9.16884 8.48323 27.0784 5.57002 14.8307 NaN ... 162.317 42.8301 35.0602 NaN 11.738 17.6665
2020-08-31 9.94164 10.0657 29.9477 6.26453 14.1705 NaN ... 190.117 47.7759 39.7332 NaN 12.3492 19.306
2020-09-30 9.41766 10.1245 31.203 5.23904 14.1619 NaN ... 178.518 50.4906 74.4062 NaN 12.6589 19.5639
2020-10-30 8.44851 9.15904 30.0074 4.57428 12.8286 NaN ... 173.534 53.157 82.1451 NaN 11.9003 19.1199
2020-11-10 9.20698 9.7341 32.3387 5.02985 14.7389 NaN ... 181.056 59.4361 88.0198 NaN 12.2164 20.4521
[44 rows x 499 columns]
1332.T 1333.T 1414.T 1605.T 1721.T ... 9962.T 9983.T 9984.T 9987.T 9989.T
Date ...
2017-04-28 0.117515 0.153443 NaN 0.0156865 0.071603 ... 0.11847 NaN NaN 0.0538158 0.170991
2017-05-31 0.117515 0.153443 NaN 0.0156865 0.071603 ... 0.11847 NaN NaN 0.0538158 0.170991
2017-06-30 0.117515 0.153443 NaN 0.0156865 0.071603 ... 0.11847 NaN NaN 0.0538158 0.170991
2017-07-31 0.117515 0.153443 0.101048 0.0156865 0.071603 ... 0.11847 NaN NaN 0.0538158 0.170991
2017-08-31 0.117515 0.153443 0.101048 0.0156865 0.071603 ... 0.11847 NaN NaN 0.0538158 0.170991
2020-07-31 0.0964277 0.0945276 0.107717 0.05165 0.0843397 ... 0.0781906 0.173209 NaN 0.0685051 0.126817
2020-08-31 0.0964277 0.0945276 0.107717 0.05165 0.0843397 ... 0.0781906 0.173209 NaN 0.0685051 0.126817
2020-09-30 0.0964277 0.0945276 0.107717 0.05165 0.0843397 ... 0.0781906 0.0944602 NaN 0.0685051 0.126817
2020-10-30 0.0964277 0.0945276 0.107717 0.05165 0.0843397 ... 0.0781906 0.0944602 NaN 0.0685051 0.126817
2020-11-10 0.0964277 0.0945276 0.107717 0.05165 0.0843397 ... 0.0781906 0.0944602 NaN 0.0685051 0.126817
[44 rows x 499 columns]
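The row-by-row loop above amounts to an "as-of" lookup: for each month-end date, take the most recent fundamentals dated strictly before it. As a minimal sketch, assuming the fundamentals index is sorted, the same alignment can be done without a Python loop. The helper name `asof_strict` and the toy values are hypothetical, and unlike the loop it returns NaN instead of raising when no earlier report exists.

```python
import numpy as np
import pandas as pd

def asof_strict(fund: pd.DataFrame, dates: pd.DatetimeIndex) -> pd.DataFrame:
    """For each date, take the latest row of `fund` dated strictly before it."""
    # Position of the last fundamentals row with index < date (-1 if there is none).
    pos = np.searchsorted(fund.index.values, dates.values, side="left") - 1
    out = fund.iloc[np.clip(pos, 0, None)].copy()
    out.index = dates
    out.loc[pos < 0, :] = np.nan  # no report available yet for these dates
    return out

# Made-up annual fundamentals for two hypothetical tickers.
fund = pd.DataFrame(
    {"AAA": [10.0, 12.0], "BBB": [5.0, 6.0]},
    index=pd.to_datetime(["2017-03-31", "2018-03-31"]),
)
dates = pd.to_datetime(["2017-03-31", "2017-04-28", "2018-04-30"])

aligned = asof_strict(fund, dates)
print(aligned)
```

A side effect of this approach is that the result keeps a numeric dtype, whereas filling an initially empty data frame row by row tends to leave object-typed columns, which is likely why the PER and ROE output above is printed with uneven precision.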
Finally, let's combine these data frames into one.
stack_monthly_rt = monthly_rt.stack() #Stack in one dimension
stack_per_df = per_df.stack()
stack_roe_df = roe_df.stack()
df = pd.concat([stack_monthly_rt, stack_per_df, stack_roe_df], axis=1) #Join
df.columns = ["rt", "per", "roe"] #Column name setting
df.loc[df.rt > 1.0, "rt"] = np.nan #Removal of outliers (.loc avoids chained assignment)
print(df)
Execution screen
rt per roe
Date
2017-04-28 1332.T -0.047638 11.1972 0.117515
1333.T -0.070101 10.4392 0.153443
1414.T 0.026680 NaN NaN
1605.T -0.038959 31.454 0.0156865
1721.T 0.051664 17.0954 0.071603
... ... ... ...
2020-11-10 9962.T 0.025375 59.4361 0.0781906
9983.T -0.021231 88.0198 0.0944602
9984.T -0.082885 NaN NaN
9987.T -0.066187 12.2164 0.0685051
9989.T -0.023070 20.4521 0.126817
[21892 rows x 3 columns]
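To see what the stacking and joining step does, here is a small sketch with made-up numbers and two hypothetical tickers. It shows that `stack()` turns each wide table into a Series keyed by (date, ticker), that `pd.concat(..., axis=1)` lines those keys up, and that returns above 100% are blanked out as outliers.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2020-01-31", "2020-02-28"])

# Wide tables: one row per date, one column per (hypothetical) ticker.
rt = pd.DataFrame({"AAA": [0.02, 1.50], "BBB": [-0.01, 0.03]}, index=dates)
per = pd.DataFrame({"AAA": [8.0, 9.0], "BBB": [np.nan, 15.0]}, index=dates)

# Long format: one row per (date, ticker) pair, one column per variable.
long_df = pd.concat([rt.stack(), per.stack()], axis=1)
long_df.columns = ["rt", "per"]

# Treat monthly excess returns above 100% as outliers.
long_df.loc[long_df["rt"] > 1.0, "rt"] = np.nan
print(long_df)
```

Each row of the result is one stock-month observation, which is the unit the screening step works on.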
Now, let's extract the stocks that are considered cheap and of high quality (PER < 10, ROE > 0.1), and look at the histogram of their returns and the cumulative return from trading them.
Analyzing the data this way shows that picking individual stocks with the kind of criteria introduced in magazines is unreliable. Note that the returns in this verification have the market return subtracted, so such a portfolio may still have earned some absolute return while the market was booming.
value_df = df[(df.per < 10) & (df.roe > 0.1)] #Extract cheap and high quality brands
plt.hist(value_df["rt"].dropna()) #Histogram drawing (NaN returns excluded)
plt.show()
balance = value_df.groupby(level=0)["rt"].mean().cumsum() #Equal-weighted mean return per date, accumulated
plt.clf()
plt.plot(balance) #Drawing the balance curve
plt.show()
Execution result
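Alongside the plots, a few summary numbers make the result easier to judge. The sketch below is one rough way to do that, not part of the original verification: the function name `summarize_excess_returns` and the sample values are made up, and the t-statistic assumes independent observations, which stock-month returns are not, so it is only a first check.

```python
import numpy as np
import pandas as pd

def summarize_excess_returns(rt: pd.Series) -> dict:
    """Rough summary of a series of market-adjusted returns."""
    rt = rt.dropna()
    n = len(rt)
    mean = rt.mean()
    std = rt.std(ddof=1)
    return {
        "n": n,                                # number of observations
        "mean": mean,                          # average excess return
        "win_rate": float((rt > 0).mean()),    # share of positive excess returns
        "t_stat": mean / (std / np.sqrt(n)),   # mean divided by its standard error
    }

# Made-up excess returns; with the article's data one could pass value_df["rt"].
sample = pd.Series([0.02, -0.01, 0.03, -0.04, 0.00, np.nan])
stats = summarize_excess_returns(sample)
print(stats)
```

A mean close to zero with a small t-statistic would be consistent with the conclusion that the screen adds little beyond the market return.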
The verification presented in this article is only the first step in quantitative and empirical trading. Although this article is labeled "Successful Investment", few people will actually read it and go on to succeed in investing, because the vast majority of readers never take action. Action here does not mean reading this article and trading on half-digested knowledge. It means taking the ideas in this article as a starting point and working out your own ways to verify trades quantitatively and empirically.
I have tried to keep this article as straightforward as possible, but simply reading it will never give you the details and useful insights that emerge along the way. The only way to make them your own is to work through the analysis with your own hands.
I have written down what I know about trading on blogs and notes. If you have read this article and are serious about quantitative and empirical trading, the following references should help guide you.
Again, most trading outcomes are determined by luck. The reason I have done well so far is that I have been lucky enough that my performance has not dropped. May you have good luck as well.
- Statistical Modeling of Trading
- History and Prospects of Trading Bias Countermeasure Technology
- Recommendation for AI investment
- Recommendation of Systre
- Investment index search procedure
- Machine learning stock price forecast Iroha no "i"
- Machine learning stock price forecast Iroha no "ro"