[PYTHON] Verification of the spread of hoaxes in the "State of Emergency Declaration on April 1"

Introduction

On the evening of March 30, 2020, a rumor spread on the Internet that "the government will declare a state of emergency on April 1" and that "a lockdown (city blockade) will start the following day."

FIJ (FactCheck Initiative Japan), which promotes fact-checking in Japan, lists this matter as one piece of unreliable information.

"Government to declare a state of emergency on April 1 and lock down on April 2": spreading on LINE and elsewhere

At a press conference that evening, Chief Cabinet Secretary Suga clearly denied the rumor [^1].

March 30, 17:58: [Chief Cabinet Secretary “Urban blockade” information “No such fact” | NHK News](https://www3.nhk.or.jp/news/html/20200330/k10012357971000.html)

Later that night, it was also reported that Prime Minister Abe had denied the rumor.

March 30, 20:11: State of emergency: "A declaration the day after tomorrow is a hoax," says Prime Minister Abe | NHK News

I was curious about when this rumor spread so widely that the Prime Minister and the Chief Cabinet Secretary could not overlook it. As data, I used news sites and the [Twitter search API](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets), which makes the diffusion phenomenon easy to examine and is easy for individuals to access.

As we'll see later, the source of this rumor is presumed to be LINE, so this verification, which uses only Twitter and other publicly available data, can capture only part of what happened. Please also note that this verification probably includes my personal opinions and speculation.

Data set

I mainly used Twitter's search API as the data for verifying when this rumor spread and what might have caused it. In this case, if the rumor were true (assuming that a state of emergency is declared and the city is locked down [^2]), panic buying at nearby supermarkets would be expected, so people would probably want to pass the rumor on only to relatives and acquaintances, and LINE rather than Twitter would be the main channel. However, LINE data is not available, so it is excluded from the dataset candidates.

In the NHK news at 17:58 on March 30, 2020, Chief Cabinet Secretary Suga denied a "city blockade," and in the news later that night, Prime Minister Abe denied the rumor about a "state of emergency." I therefore collected tweets mentioning these words from around March 30.

Specifically, for the period from 2020/03/27 00:00 to 2020/03/31 23:59, I queried the Twitter API with the search condition "Lockdown OR City blockade OR Emergency declaration" and obtained about 1.2 million tweets. The result is shown below [^3]. The horizontal axis represents time and the vertical axis represents the number of tweets. The figure also shows the estimates of a local linear trend model fitted to the same data, together with its 95% prediction/confidence intervals.
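As a rough illustration of how the collected tweets could be turned into the counts on the vertical axis, here is a minimal sketch. The original does not say how the aggregation was done, so the one-hour bin width, the use of JST, and the `hourly_counts` helper are assumptions made for illustration; the input is assumed to be the `created_at` strings returned by the v1.1 search API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))
# Timestamp layout of `created_at` in the v1.1 API,
# e.g. "Mon Mar 30 08:58:00 +0000 2020".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def hourly_counts(created_at_values, start, end):
    """Count tweets per hour (JST) between start and end, keeping empty hours as 0."""
    counts = Counter()
    for raw in created_at_values:
        posted = datetime.strptime(raw, CREATED_AT_FORMAT).astimezone(JST)
        if start <= posted <= end:
            counts[posted.replace(minute=0, second=0, microsecond=0)] += 1

    series = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour <= end:
        series.append((hour, counts[hour]))
        hour += timedelta(hours=1)
    return series


# Hypothetical sample; real input would be the `created_at` field of each search result.
sample = [
    "Mon Mar 30 08:58:00 +0000 2020",  # 17:58 JST on March 30
    "Mon Mar 30 08:10:00 +0000 2020",  # 17:10 JST on March 30
    "Wed Apr 01 03:00:00 +0000 2020",  # outside the collection window
]
window_start = datetime(2020, 3, 27, 0, 0, tzinfo=JST)
window_end = datetime(2020, 3, 31, 23, 59, tzinfo=JST)
series = hourly_counts(sample, window_start, window_end)
```

Keeping the empty hours as explicit zeros matters here, because a state space model expects an evenly spaced series.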

See the Pen "About the transition of the spread of the 'state of emergency on April 1': changes in the number of tweets" by Tajima Junpei (@p-baleine) on CodePen.

In the graph of the trend component, with the periodic component removed from the estimation result, you can see upward trends starting in the evening on March 28 and March 30 that do not appear on the other days.
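For readers unfamiliar with the model, a local linear trend model with a periodic component is commonly written as below. This is the standard textbook form, not necessarily the exact specification used in this verification; the period $s$ and the Gaussian noise terms are assumptions.

```latex
\begin{aligned}
y_t      &= \mu_t + \gamma_t + \varepsilon_t,            & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2) \\
\mu_t    &= \mu_{t-1} + \nu_{t-1} + \xi_t,               & \xi_t         &\sim N(0, \sigma_\xi^2) \\
\nu_t    &= \nu_{t-1} + \zeta_t,                         & \zeta_t       &\sim N(0, \sigma_\zeta^2) \\
\gamma_t &= -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t,   & \omega_t      &\sim N(0, \sigma_\omega^2)
\end{aligned}
```

Here $y_t$ is the number of tweets in time bin $t$, $\mu_t$ is the level, $\nu_t$ is the slope, and $\gamma_t$ is the periodic component. The trend graph corresponds to $\mu_t$, that is, the series after $\gamma_t$ and the observation noise have been separated out.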

Source of rumors

The upward trend described above is presumed to be caused mainly by a chain mail, which FIJ and various news sites have warned about.

[A chain mail saying "lockdown starts on April 1" is circulating mainly on LINE. Be careful, it is a hoax (Shuji Shinohara) - Individual - Yahoo! News](https://news.yahoo.co.jp/byline/shinoharashuji/20200330-00170566/?fbclid=IwAR2cnS6qOVv7UFc3x4TcUHVF_dTTiuLKiF_P_jJCu_KKH7lU54nSVC)

As this article shows, the chain mail is presumed to have originated on LINE. Since LINE data is not accessible, the origin of the chain mail cannot be checked. There is also no way to verify that this chain mail caused the spread (on Twitter) of words such as "state of emergency." From here on, I proceed on the assumption that the chain mail was the cause of the diffusion phenomenon.

On that premise, I investigated how the words "state of emergency" and "lockdown" spread up to the point where the Prime Minister went out of his way to deny the rumor in the media on the 30th, rather than on the 29th or the 31st.

Hypothesis about the cause of diffusion

The chain mail that caused this diffusion phenomenon can be considered a type of fake news, since it is described as a "hoax."

In verifying fake news, I referred to "[A Field Guide to "Fake News" and Other Information Disorders](https://fakenews.publicdatalab.org/)" [^4], created by Public Data Lab, which works on solving social issues using data. In addition to the traditional method of counting likes and retweets, the guide suggests focusing on the situation in which the news spread and the actors involved in the spread. It differs from existing methods in that it focuses on the context behind the spread: who spread the fake news and for what purpose, and who was involved in it and why.

I don't know whether this rumor, the "state of emergency on April 1," is so-called fake news driven by the ideology or interests of a particular individual or group. Still, focusing on the context behind the spread should allow more varied considerations, so I follow this approach here.

In general, and not only in this case, the spread at the current time is considered a cause of the spread at later times. The trend in the figure above illustrates this well. However, there is no uptrend on the 29th or the 31st, and although an uptrend can be confirmed on both the 28th and the 30th, it is more pronounced on the 30th, so additional causes seem likely.

This time, I assumed that the cause was tweets with a large number of retweets around that time. This is based on the idea that tweets gaining more retweets have a greater positive or negative effect on the spread after they are posted. I chose this assumption first because the data is easy to obtain, and second because it simplifies the problem setup. There are countless possible causes and combinations, such as news sites, Facebook, and the spread of the virus itself, but not all of them can be verified. Generally speaking, a model should use the simplest elements that most accurately express the hypothesis, but this time the setting was determined by constraints unrelated to the essence of the problem, such as whether the elements (data) could be obtained at all and the resources of my computer.

Verification results

We will examine whether tweets with a large number of retweets around the evening of March 30 caused the spread of tweets containing words such as "state of emergency" and "city blockade."

This time, I considered what I believe is the simplest model. To the local linear trend model that uses the number of tweets described above as the observed values, I added exogenous binary variables as regression terms, each indicating whether a tweet that gained a certain number of retweets had already been posted at that time. (These regression coefficients are estimated by maximum likelihood because of computing resource constraints.) The retweet threshold was chosen heuristically and set to 1,000 or more. The results are shown below.
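The binary exogenous variables described above could be built roughly as follows. This is a minimal sketch under stated assumptions: the `step_indicators` helper, the dictionary layout of each tweet, and the rule that a tweet counts as "already posted" from the first time bin that starts at or after its posting time are all hypothetical choices for illustration, not the author's code.

```python
from datetime import datetime


def step_indicators(time_bins, tweets, min_retweets=1000):
    """Build one 0/1 column per influential tweet.

    Row t, column i is 1 if tweet i (with at least min_retweets retweets)
    had already been posted by the start of time bin t, and 0 otherwise.
    """
    influential = [t for t in tweets if t["retweet_count"] >= min_retweets]
    return [
        [1 if tweet["created_at"] <= bin_start else 0 for tweet in influential]
        for bin_start in time_bins
    ]


# Hypothetical data: four hourly bins and three tweets, one below the threshold.
time_bins = [datetime(2020, 3, 30, hour) for hour in range(16, 20)]
tweets = [
    {"created_at": datetime(2020, 3, 30, 16, 30), "retweet_count": 2500},
    {"created_at": datetime(2020, 3, 30, 18, 0), "retweet_count": 1000},
    {"created_at": datetime(2020, 3, 30, 17, 0), "retweet_count": 12},
]
exog = step_indicators(time_bins, tweets)
for bin_start, row in zip(time_bins, exog):
    print(bin_start.strftime("%H:%M"), row)
```

A matrix like `exog` (rows are time bins, columns are tweets) is the shape that the `exog` argument of statsmodels' `UnobservedComponents` accepts. With that class's default `mle_regression=True`, the coefficients are estimated by maximum likelihood instead of being placed in the state vector, which matches the description above, although the text does not confirm which class was used.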

See the Pen WNvBVXM by Tajima Junpei (@p-baleine) on CodePen.

The circles on the graph represent each tweet with 1,000 or more retweets, and the Y-axis on the right corresponds to the regression coefficient of the estimated model. The size of the circle corresponds to the number of retweets of each tweet. In addition, the titles of past news articles selected arbitrarily from NHK NEWS WEB are also shown at the top of the graph.

Below are the AIC values [^5] of the local level model, the local linear trend model that incorporates periodicity (above), and the model that also incorporates tweets with 1,000 or more retweets as exogenous variables. From the perspective of model selection, the model that incorporates tweets with 1,000 or more retweets as exogenous variables was the best.

| Model | AIC |
| --- | --- |
| Local level model | 9334.294 |
| Local linear trend model (periodicity) | 7640.617 |
| Local linear trend model (periodicity + exogenous variables) | 7491.854 |
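To make the comparison concrete, here is a small sketch of how AIC is defined and how the three reported values compare. The `aic` helper is written for illustration only; the numbers are the ones reported in the table above, and the log-likelihoods and parameter counts behind them are not given in this article.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# AIC values reported in the table above.
aic_by_model = {
    "local level": 9334.294,
    "local linear trend (periodicity)": 7640.617,
    "local linear trend (periodicity + exogenous variables)": 7491.854,
}

# The preferred model is the one with the smallest AIC.
best_model = min(aic_by_model, key=aic_by_model.get)

# Gap of each model from the best one; larger gaps mean weaker support.
aic_gap = {name: value - aic_by_model[best_model] for name, value in aic_by_model.items()}

for name, gap in sorted(aic_gap.items(), key=lambda item: item[1]):
    print(f"{name}: AIC gap = {gap:.3f}")
```

Because AIC penalizes each extra parameter by 2, the model with exogenous variables has to improve the likelihood enough to pay for one coefficient per tweet, so a lower value here is not simply a result of adding more terms.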

You can see that there are more tweets with positive regression coefficients on the 28th and 30th, when the uptrend was seen, than at other times. However, this result does not explain why there was an uptrend on the 28th and 30th, or why it was more pronounced on the 30th than on the 28th. Comparing the contents of tweets with positive and negative regression coefficients might give some hints, but I have not gone that far in this verification.

We also believe that this model is simple and therefore has the following problems:

- The regression coefficients of the exogenous variables are estimated by maximum likelihood. It is unlikely that a tweet with 1,000 or more retweets exerts the same effect at all times after it is posted, so it would be more natural to treat each coefficient as time-varying.
- Tweets with 1,000 or more retweets are treated as exogenous variables. In reality, these tweets were presumably posted under the influence of the number of tweets, which is the response variable, so ignoring that influence entirely may be too crude.
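The first point can be written out as follows. The first line is the model as described in this section, with one fixed coefficient $\beta_i$ per tweet with 1,000 or more retweets posted at time $\tau_i$. The second line is one possible time-varying alternative; the random walk form is an assumption chosen for illustration, not something estimated here.

```latex
\begin{aligned}
y_t &= \mu_t + \gamma_t + \sum_{i=1}^{K} \beta_i \, x_{i,t} + \varepsilon_t,
& x_{i,t} &= \begin{cases} 1 & t \ge \tau_i \\ 0 & t < \tau_i \end{cases} \\
\beta_{i,t} &= \beta_{i,t-1} + \eta_{i,t},
& \eta_{i,t} &\sim N(0, \sigma_\eta^2)
\end{aligned}
```

Here $y_t$ is the number of tweets, $\mu_t$ the trend, and $\gamma_t$ the periodic component. In the time-varying version, $\beta_i$ in the first line is replaced by $\beta_{i,t}$, which moves the coefficients into the state vector and increases the computational cost.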

Also, although I used only Twitter data this time, a model that uses other sources could be considered (for example, a model that incorporates each article on NHK NEWS WEB as an exogenous variable).

In conclusion

Using tweets containing related words, I looked into why the rumor that "the government will declare a state of emergency on April 1" spread on March 30, 2020.

What I really wanted to do, and what I would like to see, is a way to grasp how rumors and fake news on the Internet affect people's psychology, in a form that I find convincing. From a macro perspective, "trends" do exactly that, but I am interested in visualization that also covers the relationships, interests, and motives of the people involved in spreading fake news, as described in "A Field Guide to "Fake News" and Other Information Disorders."

This is probably not limited to fake news, but of course no one is happy to receive information that could harm their family, friends, or themselves, and that is why I think the spread of such fake news can cloud calm judgment. This is only what I observed myself, but the supermarket in my neighborhood showed no sign of panic buying even after the beginning of March. Only on the evening of this day (3/30) were the shelves of daily necessities and rice emptier than usual, and I felt an air of impatient stockpiling in the store.

Also, if you look at the text of the chain mail calmly now, you can see that it is a textbook chain mail that no one would normally be fooled by. At the end of the message there is an explicit instruction to forward it to acquaintances, which makes it a typical example of chain mail ([Wikipedia: Chain mail](https://ja.wikipedia.org/wiki/%E3%83%81%E3%82%A7%E3%83%BC%E3%83%B3%E3%83%A1%E3%83%BC%E3%83%AB#%E3%82%A4%E3%83%B3%E3%82%BF%E3%83%BC%E3%83%8D%E3%83%83%E3%83%88%E4%B8%8A%E3%81%AE%E9%A1%9E%E5%9E%8B)) [^6]. (If you received this message, please check the text; there are quite a few other odd points.) Even so, if rumors are spreading online and, for example, the shelves at the neighborhood supermarket are emptier than usual, some people may be fooled (I was half fooled).

As you may have guessed if you have read this far, I am a complete outsider to both time series analysis and fake news. If something is wrong, I would be really glad if you pointed it out. Still, as an amateur, I am interested in how people behave as a group when they are indirectly connected via the Internet in a situation they perceive as dangerous, so I took on this topic partly as a study exercise. When I actually tried it, both data collection and inference needed more computing resources than I expected, so I could verify only a few models. I would like to run resource-hungry tasks such as inference in the cloud, because otherwise I cannot do anything else on my machine (surfing the net, playing with Emacs, and occasionally working), but the money for that is ... :sweat:

Also, if another theme catches my interest, I would like to try it when I have the energy and the resources (the :moneybag: kind, since time I do have).

References

In conducting this verification, I referred to the following:

- Measurement time series analysis of economic and financial data
- Basics of time series analysis and state space model: Theory and implementation learned by R and Stan

[^1]: However, as stated on the FIJ site mentioned above, these articles are not guaranteed to be 100% correct, so there is no guarantee that what this article calls a "hoax" is 100% a hoax.

[^2]: As an aside, I feel that people living in Tokyo, myself included, tend to think of the state of emergency and the city lockdown as one and the same.

[^3]: Model estimation was done with statsmodels. The graphs were created with Observable and embedded using CodePen.

[^4]: The downloadable PDF seems to have been produced entirely with Illustrator and InDesign, so I could not copy and paste from it in my environment, which made note-taking difficult. The source is published on GitHub, so you may want to use it when taking notes: https://github.com/PublicDataLab/fake-news-field-guide
