[PYTHON] [Failure] I wanted to generate sentences using Flair's TextRegressor

Caution! This is just a failure story, so if you are looking for a way to generate text, I recommend turning back now!

I tried to generate sentences using Flair, a super convenient NLP library, but I couldn't get it to work in the first place. Judging from the name TextRegressor, I thought it might somehow regress and generate sentences...? Flair is insanely convenient, but there aren't many articles about it in Japanese, so I'm writing this up even though it's not a big deal.

Deliverables

https://github.com/ochiba0227/flair_text_regressor

Creating training data

TextRegressor doesn't appear in the Flair tutorial either, so I read the TextRegressor test code and the [TextRegressor model implementation](https://github.com/flairNLP/flair/blob/master/flair/models/text_regression_model.py) and prepared sentences labeled as follows. It seems that only float values can be used as labels, so for the time being I gave every sentence the same number. With different labels you might be able to learn to generate articles per label, but this time I trained on everything lumped together.

Labeled text
These titles are borrowed from articles on [TechCrunch](https://jp.techcrunch.com/).
label_topic text
1 Solving the problem of an error message being displayed every time Windows starts.
1 For a key case, the German-made "Wunderkey" is multifunctional and recommended for outdoor activities!
1 Just put it in the washing machine. The power of natural "iodine" prevents mold for 3 months, and the laundry is also sterilized and deodorized!
1 Three things to prepare, easy to cook. Make-ahead for a week + single-use recipe roundup
1 What are the tips for using Slack during remote work? Practical techniques from Slack Japan representative Sasaki
1 [Today's sale information] At Amazon Time Sale, a multi-functional smartwatch in the 1,000 yen range and a folding desk light that doubles as a mobile battery in the 2,000 yen range are bargains
1 Think efficiency rather than tedious effort. The "not alone" self-study method of current University of Tokyo students
1 A combination bag perfect for business trips and short outings [Today's life hack tool]
1 Various illustrations and layouts to choose from, and English lettering is also possible. Name stickers you'll want to stick on everything
1 The new coronavirus epidemic will not end anytime soon. There is no choice but to accept the reality and live with it
1 Beat the lack of exercise! Recommended home fitness videos / games / apps
1 Noticed during remote work: the "only my PC has slow Wi-Fi" problem. What are the causes and countermeasures?
1 Recommended goods for telework! I tried "beblau", which lets you carry a notebook PC and peripherals together
1 Easy and delicious anchovy butter that goes with any dish
1 [10% off for readers only] Sounds an alarm and notifies your smartphone if something moves. "TracMo Leaf" can be tracked to prevent theft and things being left behind
1 What is it like to work alongside your family? What are the challenges? | Everyone's remote work
1 Precautions when making disinfectant at home | It is dangerous to add the materials in the wrong order
1 [Today's sale information] At Amazon Time Sale, mouthwash with 6 effects such as whitening and sterilization in the 900 yen range, and a breathable gel cushion for back pain, are bargains
1 Staying balanced and calm even in difficult situations. Two "untranslatable" words
1 For those who don't want their chest pocket to sag, a case that lets you take out several pens immediately [Today's life hack tool]
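Each labeled row above is just "label, then text". Below is a minimal sketch of turning such lines into float-labeled rows and writing them out. It assumes the TSV layout implied by the `label_topic text` header: one tab-separated header row, then one row per sentence. The helper names `parse_row` and `to_tsv` are hypothetical, and whether Flair's corpus loader expects exactly this layout is an assumption; check the repository above for the format actually used.

```python
from typing import List, Tuple

# Two sample rows in the same "label text" format as the list above.
# As in the article, every row gets the same float label.
RAW_ROWS = [
    "1 Easy and delicious anchovy butter that goes with any dish",
    "1 Recommended home fitness videos / games / apps",
]


def parse_row(line: str) -> Tuple[float, str]:
    """Split a 'label text' line into a float label and the text."""
    label, text = line.split(" ", 1)
    return float(label), text  # raises ValueError if the label is not numeric


def to_tsv(rows: List[Tuple[float, str]]) -> str:
    """Serialize rows as tab-separated lines under a label_topic/text header."""
    lines = ["label_topic\ttext"]
    for label, text in rows:
        if "\t" in text or "\n" in text:
            raise ValueError("text must not contain tabs or newlines")
        lines.append(f"{label}\t{text}")
    return "\n".join(lines) + "\n"


rows = [parse_row(line) for line in RAW_ROWS]
print(to_tsv(rows), end="")
```

Parsing the label with `float()` up front surfaces a non-numeric label immediately, which matters here since only float labels seem to be accepted.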

Training

Split the data created above into train.tsv, dev.tsv, and test.tsv, place the files in resources/data, and run my_text_regressor.py.
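Here is a minimal sketch of that split step. It assumes the rows are already tab-separated "label<TAB>text" strings and uses an arbitrary 80/10/10 split. `split_rows` and `write_splits` are hypothetical helpers, not part of Flair or the repository above. A fixed seed keeps the split reproducible between runs.

```python
import random
from pathlib import Path
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], dev_ratio: float = 0.1, test_ratio: float = 0.1,
               seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows reproducibly and split them into train/dev/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_ratio))
    n_test = max(1, int(len(shuffled) * test_ratio))
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


def write_splits(rows: Sequence[str], out_dir: str = "resources/data",
                 header: str = "label_topic\ttext") -> None:
    """Write train.tsv, dev.tsv and test.tsv, each starting with the same header row."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, part in zip(("train.tsv", "dev.tsv", "test.tsv"), split_rows(rows)):
        (out / name).write_text("\n".join([header, *part]) + "\n", encoding="utf-8")


# Example with 20 placeholder rows, about the size of the dataset above.
sample = [f"1\tsample sentence {i}" for i in range(20)]
train, dev, test = split_rows(sample)
print(len(train), len(dev), len(test))
```

Calling `write_splits(rows)` would then create the three files under resources/data. With a dataset this small, dev and test end up with only a couple of sentences each.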

Sentence generation (failed)

This time, I tried to generate a sentence that starts with "When Windows starts". The results are as follows.

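```python
from pprint import pprint

from flair.data import Sentence

# `regressor` (the trained TextRegressor) and `japanese_tokenizer` are assumed
# to be defined earlier, e.g. in my_text_regressor.py from the repository above.

# create example sentence
sentence = Sentence('When Windows starts', use_tokenizer=japanese_tokenizer)
print(sentence.to_tokenized_string())
# predict the regression value and print the result
regressor.predict(sentence)
pprint(sentence.to_dict())
```

```
{'entities': [],
 'labels': [{'confidence': 1.0, 'value': '0.8864221572875977'}],
 'text': 'When Windows starts'}
```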

Oh, it just predicted a single number for the whole sentence...? That is what a regressor does, after all. What a mess.
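For comparison, a text regressor maps a whole sentence to one number. Roughly, it embeds the tokens, pools them into a single document vector, and applies a linear layer with one output. The sketch below is a toy illustration of that idea in plain Python. Its word vectors, weights, and bias are all made up (hypothetical). It is not Flair's TextRegressor code, which learns its embeddings and output layer from the training data. Whatever the input, the result is a float like the `value` above, never new text.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors; a real model uses learned embeddings.
EMBEDDINGS: Dict[str, List[float]] = {
    "when": [0.1, 0.0, 0.2],
    "windows": [0.5, 0.3, 0.1],
    "starts": [0.2, 0.4, 0.0],
}
# Hypothetical weights and bias of a linear layer with a single output.
WEIGHTS = [0.6, 0.8, 0.5]
BIAS = 0.1


def embed_document(tokens: List[str]) -> List[float]:
    """Mean-pool the vectors of known tokens into one document vector."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * len(WEIGHTS)
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def regress(text: str) -> float:
    """Map a whole sentence to a single float, as a regression head does."""
    doc = embed_document(text.lower().split())
    return sum(w * x for w, x in zip(WEIGHTS, doc)) + BIAS


score = regress("When Windows starts")
print(type(score).__name__, round(score, 4))
```

Nothing in this pipeline produces tokens, which is why a regressor alone cannot continue a sentence.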

Impressions

Since there is no tutorial, it was a bit troublesome to read through the source code... I also got an error message during training, so it's probably still a feature under development. To be honest, I spent about an hour on this and it felt sad... It's not good to dive in on momentum alone. It's frustrating that it didn't work, so next time I'd like to quietly use GPT-2 and try generating sentences.
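Sentence generation needs an autoregressive loop: predict the next token from what came before, append it, and repeat. GPT-2 does this with a large neural network. The toy sketch below swaps in simple bigram counts over three made-up sentences, a much weaker technique used purely to show the loop. Greedy decoding picks the most frequent successor at each step. `train_bigrams` and `generate` are hypothetical helpers for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigrams(sentences: List[str]) -> Dict[str, Counter]:
    """Count which word follows which, with <s> and </s> as boundary markers."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: Dict[str, Counter], prompt: str, max_tokens: int = 10) -> str:
    """Greedily extend the prompt one token at a time until </s> or the limit."""
    tokens = ["<s>"] + prompt.lower().split()
    for _ in range(max_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # no known successor for the last token
        nxt = followers.most_common(1)[0][0]  # ties go to the first-seen word
        if nxt == "</s>":
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])


corpus = [
    "when windows starts an error message is displayed",
    "when windows starts the screen goes dark",
    "an error message is annoying",
]
model = train_bigrams(corpus)
print(generate(model, "When Windows starts"))
```

Unlike the regressor, the output here is a sequence of tokens. Swapping the bigram counts for a neural language model is essentially what GPT-2-based generation does.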

It seems that some people have already trained on a Japanese Wikipedia corpus: https://qiita.com/tanreinama/items/3b73fdeff09dfe74ef52
