[PYTHON] Is the probability of precipitation correct?


Probability can usually be calculated by counting cases and dividing. The probability of precipitation is different: the same day never comes again, and the forecast probability changes from day to day. So how reliable is it? Isn't it just a number picked at random? Even when the probability of precipitation is 90%, there are days when it does not rain. Since 90% is not 100%, that by itself is not strange. However, "it's not 100%, so no rain is not a mistake" is an argument that explains nothing. On a sunny day that had a 90% forecast, was the chance of rain really 90%?

If, for example, the probability of precipitation were 30% every day, its validity could be checked.

If the probability of precipitation were the same every day, it could be verified by recording whether each day was rainy or not and counting the rainy days. The check itself is a binomial test or a chi-square test.
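For the constant-probability case, a minimal sketch of both tests might look like this. It asks whether rain on `k` out of `n` days is consistent with a forecast of `p` every day. It uses only the standard library (Python 3.8+ for `math.comb`); the function names are my own, and the two-sided exact p-value uses the common convention of summing the probabilities of all outcomes no more likely than the observed one.

```python
import math

def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial test: sum P(X = i) over every outcome i
    that is no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # small relative tolerance so outcomes tied with k are not lost to rounding
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))

def chi_square_p(k, n, p):
    """Pearson chi-square test on the counts (rainy, not rainy) against
    the expected counts (n*p, n*(1-p)); 1 degree of freedom.
    Assumes 0 < p < 1, otherwise an expected count is zero."""
    expected_rain = n * p
    expected_dry = n * (1 - p)
    chi2 = ((k - expected_rain) ** 2 / expected_rain
            + ((n - k) - expected_dry) ** 2 / expected_dry)
    # survival function of the chi-square distribution with 1 d.o.f.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

With made-up numbers, `binomial_two_sided_p(45, 100, 0.3)` comes out well below 0.01, so rain on 45 of 100 days would be hard to reconcile with a constant 30% forecast, while 30 of 100 days gives a p-value of essentially 1.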

Even when the forecast varies, the same idea should still work if the forecast takes only a finite number of distinct values and that number is much smaller than the number of samples: each value can then be checked separately.

Furthermore, I assume that no forecast value is so rare that it has too few samples, or, if one is, that there are still enough samples even after taking it into account. In any case, if the set of forecast values is finite and there are enough samples, we can build a table of "on days when this probability of precipitation was forecast, what percentage of days actually had rain." The probability of precipitation announced by the Japan Meteorological Agency is in fact given in 10% steps from 0% to 100%, so this method should work. (Even if that were not the case, one could force the data into 10% bins and do the same thing.)
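The per-forecast-value table described above could be built roughly like this. A minimal sketch, assuming the data has already been reduced to `(forecast percentage, rained or not)` pairs; the function name and input layout are my own, not from the original post. Values that are not multiples of 10 are forced into the nearest 10% bin, rounding halves up.

```python
import math
from collections import defaultdict

def calibration_table(records):
    """records: iterable of (forecast_percent, rained) pairs, rained a bool.
    Returns {bin_percent: (days_forecast, days_with_rain, observed_fraction)}."""
    days = defaultdict(int)
    rainy = defaultdict(int)
    for forecast, rained in records:
        # force any percentage into a 10% bin, rounding halves up
        bin_percent = int(math.floor(forecast / 10.0 + 0.5)) * 10
        days[bin_percent] += 1
        rainy[bin_percent] += int(rained)
    return {b: (days[b], rainy[b], rainy[b] / days[b]) for b in sorted(days)}
```

Each entry can then be tested on its own, for example with a binomial test against `bin_percent / 100`.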

Definition of precipitation probability

Quoted from the Japan Meteorological Agency's forecast terminology page:

a) The average over the forecast area of the probability (%) that 1 mm or more of precipitation falls within a given period, expressed as 0, 10, 20, ..., 100% (values in between are rounded). b) A probability of precipitation of 30% means that when a 30% forecast is issued 100 times, 1 mm or more of precipitation occurs on about 30 of those occasions; it does not predict the amount of precipitation. The probability of precipitation is the probability that 1 mm or more of rain or snow (snow converted to its melted-water equivalent) falls within a given period in the forecast area, and it is announced in 10% steps from 0% to 100%.

Note also the following:

The target period is 6 hours for the short-range forecast and 24 hours for the weekly forecast.

So the short-range probability of precipitation in today's weather column and the daily probability of precipitation in the weekly forecast column have different definitions: the former is the probability of 1 mm or more falling in 6 hours, the latter the probability of 1 mm or more falling in 24 hours. I did not know that.

Japan Meteorological Agency historical data

You can browse past weather data at http://www.data.jma.go.jp/obd/stats/etrn/ and download it at http://www.data.jma.go.jp/gmd/risk/obsdl/index.php. I'm grateful that this is made public for free.

I was able to get the daily precipitation amounts. Unfortunately, the forecast probabilities of precipitation do not seem to be published.
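Given daily precipitation amounts, each day can be labelled rainy or not using the 1 mm threshold from the definition above. A minimal sketch, assuming a hypothetical simplified CSV with columns `date,precip_mm`; the actual file from the JMA download page may have a different layout and would need to be reduced to this form first.

```python
import csv
import io

RAIN_THRESHOLD_MM = 1.0  # JMA definition: 1 mm or more counts as precipitation

def rainy_days(csv_text):
    """Parse a hypothetical 'date,precip_mm' CSV; return {date_string: rained}."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        result[row["date"]] = float(row["precip_mm"]) >= RAIN_THRESHOLD_MM
    return result
```

Because the daily probability in the weekly forecast covers 24 hours, the daily total is the amount to compare against the threshold.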

Historical data on the probability of precipitation

This web page (apparently run by an individual) publishes past weather forecasts for Tokyo: http://homepage3.nifty.com/i_sawaki/WeatherForecast/ Amazing...

The data is complete. Let's try it.

For now, I collected the data for January 1 through December 31, 2014 from the above site. For the probability of precipitation, I used the forecast made two days before, which is the most recent of the daily (24-hour) probabilities.
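Pairing each observed day with the forecast issued two days earlier could be sketched as follows. The input layout (a dict keyed by `(issue_date, target_date)`) is a hypothetical one of my own, not the format of the site above; days without a matching forecast are simply skipped.

```python
from datetime import date, timedelta

LEAD_DAYS = 2  # use the forecast issued two days before the target day

def pair_forecasts(forecasts, observed):
    """forecasts: {(issue_date, target_date): percent}   (hypothetical layout)
    observed:  {target_date: rained}
    Returns [(percent, rained), ...] for days that have both, in date order."""
    pairs = []
    for target, rained in sorted(observed.items()):
        key = (target - timedelta(days=LEAD_DAYS), target)
        if key in forecasts:
            pairs.append((forecasts[key], rained))
    return pairs
```

The resulting pairs are exactly the `(forecast, rained)` records that the per-value table needs.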

How can we quantify how well the forecasts hit?

The validity of each individual forecast value can be checked with a chi-square test, but I wanted a measure of the overall picture. So this time I tried something called the "coefficient of determination". According to Wikipedia there are several definitions, but I used the common one: $R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \overline{y})^2}$. The coefficient of determination shows how well a regression curve fits the data; the larger the value, the better the fit, with a maximum of 1. Under this definition it can also be negative.

Unlike an ordinary regression, here I set ($ x_i $, $ y_i $) = (forecast probability of precipitation, number of rainy days when that forecast was issued / number of days that forecast was issued), and used the forecast itself as the predicted value, $ f_i = x_i $.
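The formula with $ f_i = x_i $ is short enough to write out directly. This is a minimal sketch of my own, not the author's posted code; it treats each forecast value as one point regardless of how many days it was issued, which is how the formula above is written, and it assumes the $ y_i $ are not all equal (otherwise the denominator is zero).

```python
def determination_coefficient(points):
    """points: [(x_i, y_i), ...], x_i = forecast probability, y_i = observed
    rain fraction, both as fractions in [0, 1]. The prediction is f_i = x_i."""
    ys = [y for _, y in points]
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - x) ** 2 for x, y in points)  # sum of (y_i - f_i)^2
    ss_tot = sum((y - y_mean) ** 2 for y in ys)    # sum of (y_i - ybar)^2
    return 1 - ss_res / ss_tot
```

Perfectly calibrated forecasts give exactly 1, and forecasts that are worse than always predicting the mean give a negative value, e.g. `determination_coefficient([(0.0, 1.0), (1.0, 0.0)])` is -3.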

The Python code I wrote (which should work with both Python 2 and 3) is posted here.

How accurate was the probability of precipitation in 2014?

Shown here is the distribution of the forecast probabilities of precipitation in 2014.
[Figure: distribution of forecast probabilities of precipitation, 2014]
In the next figure, the horizontal axis is the forecast probability of precipitation, and the vertical axis is the number of rainy days when that forecast was issued / the number of days it was issued.
[Figure: observed rain frequency vs. forecast probability of precipitation, 2014]
The coefficient of determination in this case was 0.8714.

Searching for "coefficient of determination" and similar terms, I found no strict standard, but a value of 0.5 or more seems to be regarded as a reasonably good fit. I think 0.8714 is a pretty good value.

It turns out that the probability of precipitation is not arbitrary.
