[PYTHON] Wikipedia goes from the era of writing to the era of creation ~ Automatic generation from Twitter

Twi2Wiki is a web application developed by Kosuke Fujita, Toshio Tanaka, and Kojiro Yamamoto in 2020. It generates Wikipedia-style article text from Twitter profile information. It works on both smartphones and PCs. Support for dial-pulse black telephones ended in 2020.

Twi2Wiki was developed to let individuals easily create Wikipedia-like profiles. Generating an article page requires text information (see the system description for details); for ease of use, the developers chose to take it from Twitter profile text, which was already in wide use at the time.

When Kana Nishino's "Torisetsu" became popular in 2015, there was an active movement to share one's own detailed profile when building a relationship with another person (Note 1). However, writing a detailed profile turned out to be too costly, and demand for simplification grew (the digital transformation of Torisetsu). Against this background, a volunteer development team (Fujita, Tanaka, Yamamoto) developed Twi2Wiki in 2020 as an app with which anyone can easily create a Wikipedia-style profile.

The back end is roughly divided into three functions: Twitter integration, occupation judgment, and biography generation. The generated text is displayed as a Wikipedia-like web page. The application layer uses Flask, a Python web application framework, and runs on a free Heroku server.
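The three back-end functions and the Flask front end could be wired together roughly as follows. This is only a minimal sketch, not the actual Twi2Wiki code: the route `/wiki/<screen_name>`, the helper names `fetch_profile`, `judge_occupation`, and `generate_biography`, and the page template are all hypothetical, and the helpers return fixed placeholder values instead of calling Twitter or a model.

```python
from flask import Flask, render_template_string

app = Flask(__name__)


# Hypothetical stand-ins for the three back-end functions named in the text.
def fetch_profile(screen_name):
    """Twitter integration: would return account name, profile text, tweets."""
    return {
        "name": screen_name,
        "description": "Placeholder profile text.",
        "tweets": [],
    }


def judge_occupation(profile):
    """Occupation judgment: would pick the most likely occupation."""
    return "engineer"


def generate_biography(profile):
    """Biography generation: would run the trained seq2seq model."""
    return f"{profile['name']} has a placeholder biography."


# Hypothetical Wikipedia-like page layout.
PAGE = """<h1>{{ name }}</h1>
<p>Occupation: {{ occupation }}</p>
<h2>Biography</h2>
<p>{{ biography }}</p>"""


@app.route("/wiki/<screen_name>")
def wiki_page(screen_name):
    # Run the three stages in order and render the result as one page.
    profile = fetch_profile(screen_name)
    return render_template_string(
        PAGE,
        name=profile["name"],
        occupation=judge_occupation(profile),
        biography=generate_biography(profile),
    )
```

Saved as `app.py`, a sketch like this can be started locally with `flask run`; each stage can then be replaced by a real implementation without changing the route.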

By the time the developers noticed, it was already mid-December and they could not secure enough development resources, so there is no function for sharing to Twitter. Screenshot sharing is recommended as an alternative.

The occupation is judged from the Twitter account name, profile text, and latest tweets: the most likely occupation is selected from the top 100 occupations that elementary school students want to have. As the judgment method, 1,000 words with similar distributed representations are prepared for each occupation, and the occupation whose words appear most often in the Twitter text is assigned.

The biography is generated using seq2seq (sequence to sequence), a type of neural network. seq2seq is a natural language processing technique consisting of two models, an Encoder and a Decoder, based on LSTM (Long Short-Term Memory).
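The occupation judgment described above amounts to counting keyword hits per occupation and taking the maximum. The following is a minimal sketch under several assumptions: the word lists are tiny hand-written examples (the text describes about 1,000 words per occupation, derived from distributed representations), the input is English split by a simple regular expression (Japanese text would need morphological analysis), and the name `judge_occupation` is hypothetical.

```python
import re

# Hypothetical, tiny keyword sets; the real lists are described as holding
# about 1,000 words per occupation for 100 occupations.
OCCUPATION_WORDS = {
    "engineer": {"python", "code", "server", "deploy"},
    "baker": {"bread", "oven", "flour", "dough"},
    "soccer player": {"goal", "match", "league", "pitch"},
}


def judge_occupation(text, occupation_words=OCCUPATION_WORDS):
    """Return the occupation whose keywords appear most often in text.

    Returns None when no keyword of any occupation appears.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    scores = {
        occupation: sum(1 for token in tokens if token in words)
        for occupation, words in occupation_words.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Account name, profile text, and tweets would be concatenated into one string.
print(judge_occupation("I deploy Python code to a server every day"))
```

Ties go to whichever occupation comes first in the dictionary, which is an arbitrary choice made for this sketch.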

For training, 130,000 Wikipedia person pages are used. The introductory part of each article page is used as input data, and the biography section is used as output data. The conversion from a Twitter profile to a Wikipedia-style biography uses the same algorithm as translation from Japanese to English (Note 2).
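Because the setup mirrors machine translation, the training data can be treated as (source, target) sentence pairs. The sketch below shows one common way to turn such pairs into token ID sequences before feeding them to an encoder and decoder. The example sentences are invented placeholders, whitespace tokenization is assumed for simplicity, and the special tokens and function names are hypothetical rather than taken from the project.

```python
SOS, EOS, UNK = "<sos>", "<eos>", "<unk>"


def build_vocab(sentences):
    """Map each whitespace-separated token to an integer ID."""
    vocab = {SOS: 0, EOS: 1, UNK: 2}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab, add_sos=False):
    """Convert a sentence to IDs, ending with EOS (and starting with SOS)."""
    ids = [vocab.get(token, vocab[UNK]) for token in sentence.split()]
    if add_sos:
        ids = [vocab[SOS]] + ids
    return ids + [vocab[EOS]]


# Invented placeholder pairs: (article introduction, biography sentence).
pairs = [
    ("taro is a singer from tokyo .", "taro made his debut in 2010 ."),
    ("hanako is a baker from osaka .", "hanako opened a bakery in 2015 ."),
]

src_vocab = build_vocab(src for src, _ in pairs)
tgt_vocab = build_vocab(tgt for _, tgt in pairs)

# Encoder input has only EOS; decoder target is wrapped in SOS ... EOS.
encoded = [
    (encode(src, src_vocab), encode(tgt, tgt_vocab, add_sos=True))
    for src, tgt in pairs
]
```

At generation time, a Twitter profile would be encoded with the source vocabulary in the same way, which is why a profile written like a Wikipedia introduction is closer to what the model saw during training.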

If you write your Twitter profile according to Wikipedia's Manual of Style, a biography tends to be generated more easily (Note 3). The training data also includes Donald Trump, Crystal Noda, Minami Hamabe, and others.

In creating it, the developer Kojiro Yamamoto referred to two articles: "Sentence generation using Seq2Seq" and "I implemented Attention Seq2Seq with PyTorch".

** World share **

In December 2020, about one month after the start of service, the number of users exceeded 0, and in January of the following year it exceeded 3. If the number of users keeps increasing at this rate, it will reach the world population (Note 4) in about 200 million years. This growth is attributed to the simplicity of linking with Twitter at the start of use and to a flow that leads users to post the result on Twitter, which is said to match Twitter users' habit of tweeting about new things and things likely to become news.
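The "about 200 million years" figure can be checked with simple arithmetic. The sketch below assumes linear growth of roughly 3 users per month, which is one reading of "exceeded 0 in December, exceeded 3 in January" and not a rate stated by the developers; the population figure comes from Note 4.

```python
WORLD_POPULATION = 7.7e9  # Note 4: 7.7 billion as of 2020
USERS_PER_MONTH = 3       # assumption: about 3 new users per month

months_needed = WORLD_POPULATION / USERS_PER_MONTH
years_needed = months_needed / 12

print(f"about {years_needed / 1e6:.0f} million years")
```

Under these assumptions the result lands a little above 200 million years, consistent with the estimate in the text.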

** Delayed spread to young people **

There is a view that the spread to young people has been delayed. According to an Internet survey conducted independently by the development team, the usage rate among 0- to 3-year-olds was 0%, far below that of the main target group, 20- to 120-year-olds. This is attributed to the delay in supporting babbling such as "da" and "au" used by young people.

Some users have pointed out that the generated biography differs significantly from their actual background. The development team acknowledged this and attributed it to the background of the user's previous life (Note 5). This phenomenon has reportedly been observed in many users in connection with "Your Name." (director: Makoto Shinkai).

  1. ^ There are various theories about the relationship with "Torisetsu".
  2. ^ A seq2seq model with Attention is used, with GRU in place of LSTM.
  3. ^ Twitter profile text is considered an introduction to Wikipedia.
  4. ^ 7.7 billion as of 2020.
  5. ^ There is also a theory that it is a parallel world.

- Flask Document
- Deploy Flask app on Heroku
- [Python] Create a Twitter-linked app with Flask + Tweepy
