Probability can usually be calculated by counting and dividing. The probability of precipitation is a different matter, though: the same day never comes twice, and the forecast probability changes from day to day. How reliable is it? Aren't the forecasters just picking plausible-looking numbers? Even when the probability of precipitation is 90%, there are days when it does not rain. It is not 100%, so that in itself is not strange. But "it wasn't 100%, so a dry day doesn't make the forecast wrong" tells us nothing. On a sunny day that had a 90% forecast, was the chance of rain really 90%?
If the probability of precipitation were the same every day, it could be verified by recording whether each day was sunny or rainy and counting the rainy days. A binomial test or a chi-square test then tells us whether the observed count is consistent with the forecast.
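As a rough illustration of the binomial test mentioned here, the following is a minimal sketch using only the Python 3 standard library. The function names and the numbers (100 days forecast at 30%, 38 of them rainy) are hypothetical and are not taken from the data in this post.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k rainy days out of n if each day rains with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_test_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed one."""
    observed = binomial_pmf(k, n, p)
    # Small relative tolerance so that ties are not lost to floating-point rounding.
    threshold = observed * (1 + 1e-7)
    return sum(
        pmf
        for pmf in (binomial_pmf(i, n, p) for i in range(n + 1))
        if pmf <= threshold
    )

# Hypothetical numbers: 100 days all forecast at 30%, 38 of which were rainy.
p_value = binomial_test_two_sided(38, 100, 0.3)
print(round(p_value, 3))
```

A small p-value would suggest that the observed number of rainy days is hard to reconcile with the stated probability; a large one only means the data do not contradict it.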
Beyond that, I would like to assume that no precipitation probability value is issued only rarely, or that if one is, there are still enough samples for it. In any case, if the set of possible forecast values is finite and there are enough samples for each value, we can build data of the form "on days when this probability was forecast, it actually rained this percentage of the time." And the probability of precipitation announced by the Japan Meteorological Agency is in fact issued in 10% increments from 0% to 100%, so this method looks workable. (Even if that were not the case, we could force the data into 10% bins and do the same thing.)
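The grouping described above could look roughly like this in Python 3. This is only a sketch: the function name, the record layout `(forecast_percent, rained)`, and the sample records are made up for illustration and are not the data used in this post.

```python
from collections import defaultdict

def observed_frequency_by_forecast(records):
    """Map each forecast bin (in %) to (rainy days, total days, observed frequency).

    `records` is an iterable of (forecast_percent, rained) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [rainy days, total days]
    for forecast_percent, rained in records:
        # Snap to the nearest 10% step. Values already on the 10% grid are unchanged;
        # exact halves such as 25 follow Python's round-half-to-even rule.
        bin_percent = int(round(forecast_percent / 10.0)) * 10
        counts[bin_percent][1] += 1
        if rained:
            counts[bin_percent][0] += 1
    return {
        b: (rainy, total, rainy / total)
        for b, (rainy, total) in sorted(counts.items())
    }

# Made-up records for illustration only, not real forecast data.
sample = [(30, False), (30, True), (30, False), (90, True), (90, True), (12, False)]
table = observed_frequency_by_forecast(sample)
for bin_percent, (rainy, total, freq) in table.items():
    print(bin_percent, rainy, total, round(freq, 2))
```

Each row then answers the question "when this probability was forecast, how often did it actually rain?"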
The following is quoted from the Japan Meteorological Agency's forecast terminology page.
a) The average probability (%) that 1 mm or more of precipitation falls in the forecast area within a given period, expressed as 0, 10, 20, ..., 100% (values in between are rounded). b) A precipitation probability of 30% means that when a 30% forecast is announced 100 times, about 30 of those cases have precipitation of 1 mm or more; it does not predict the amount of precipitation. Precipitation probability is the probability that 1 mm or more of rain or snow (snow is converted to the amount of water when melted) falls within a given period in the forecast area, and it is announced in 10% increments from 0% to 100%.
The following point also deserves attention.
The target time is 6 hours for the short-term forecast and 24 hours for the weekly forecast.
In other words, the short-term precipitation probability shown in today's weather column and the daily precipitation probability shown in the weekly forecast column are defined differently. The former is the probability of 1 mm or more falling within 6 hours; the latter is the probability of 1 mm or more falling within 24 hours. I did not know that.
You can browse past weather data at http://www.data.jma.go.jp/obd/stats/etrn/ and download it at http://www.data.jma.go.jp/gmd/risk/obsdl/index.php. I'm grateful that this is made public for free.
From there I was able to get the daily precipitation amounts. Unfortunately, past precipitation probability forecasts do not seem to be published.
However, this web page (run by an individual?) publishes past weather forecasts for Tokyo: http://homepage3.nifty.com/i_sawaki/WeatherForecast/ Amazing...
For now, I collected data from January 1, 2014 to December 31, 2014 from the site above. For the probability of precipitation, I used the value forecast two days in advance, which is the most recent forecast that gives the probability for a whole day.
The validity of each individual precipitation probability can be checked with a chi-square test, but I wanted to know how things look overall. So this time I tried using what is called the coefficient of determination.
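According to Wikipedia, there are several definitions, but the most common one is $ R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2} $, where $ y_i $ are the observed values, $ f_i $ are the estimated values, and $ \bar{y} $ is the mean of the observed values.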
Unlike an ordinary regression curve, this time I set ($ x_i $, $ y_i $) = (forecast precipitation probability, number of rainy days with that forecast / number of days on which that forecast was issued) and took the estimated value to be $ f_i = x_i $.
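To make the calculation concrete, here is a minimal Python 3 sketch. It assumes the common definition of the coefficient of determination, $ R^2 = 1 - \sum_i (y_i - f_i)^2 / \sum_i (y_i - \bar{y})^2 $, with every forecast value weighted equally; it is not the code used for this post, and the observed frequencies below are invented for illustration.

```python
def coefficient_of_determination(xs, ys):
    """R^2 = 1 - SS_res / SS_tot, with the estimate fixed to f_i = x_i.

    Assumes the ys are not all identical (otherwise SS_tot is zero).
    """
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - x) ** 2 for x, y in zip(xs, ys))  # residuals against f_i = x_i
    ss_tot = sum((y - mean_y) ** 2 for y in ys)         # spread of the observations
    return 1.0 - ss_res / ss_tot

# Forecast probabilities 0.0, 0.1, ..., 1.0 and hypothetical observed rain frequencies.
forecast = [i / 10 for i in range(11)]
observed = [0.02, 0.08, 0.25, 0.28, 0.45, 0.50, 0.55, 0.75, 0.80, 0.85, 1.00]

r2 = coefficient_of_determination(forecast, observed)
print(round(r2, 4))
```

If the observed frequencies matched the forecasts exactly, the result would be 1; the further the points fall from the line $ y = x $, the smaller the value becomes, and it can even go negative.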
The code, written in Python (it should work in both 2 and 3), is posted around here.
Shown here is the distribution of precipitation probabilities in 2014. The horizontal axis is the forecast precipitation probability, and the vertical axis is the number of rainy days with that forecast divided by the number of days on which that forecast was issued. The coefficient of determination was 0.8714.
Searching for "coefficient of determination" suggests that there is no formal threshold, but a value of 0.5 or more seems to be regarded as a sufficiently good fit. I think 0.8714 is a pretty good value.
So it turns out that the probability of precipitation is not just made-up numbers.