[Python] I played with natural language processing ~ transformers ~

I heard that Python lets you try out a variety of natural language processing tools, so I played around with some of them. I don't understand the underlying algorithms at all, but it's amazing that you can do interesting things in just a few lines.

Execution environment

Google Colab notebook

What I tried

First, install transformers and create the pipelines you need.

pip install transformers

lang.py


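import torch  # PyTorch backend; not referenced directly below
from transformers import pipeline

# With no model specified, each pipeline loads a default pretrained model for its task.
sentiment_analysis = pipeline("sentiment-analysis")
question_answering = pipeline("question-answering")
fill_mask = pipeline("fill-mask")
feature_extraction = pipeline("feature-extraction")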

This time I played with the four pipelines above. Let's look at each one below.

sentiment-analysis

It outputs how positive or negative the input sentence is.

lang.py


sentiment_analysis("Because of the pandemic, I decided to refrain from going out.")
# => [{'label': 'NEGATIVE', 'score': 0.9692758917808533}]

The sentence is predicted to be negative with a high score (about 0.97).
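The result is a list with one dictionary per input sentence, so the label and score can be read out directly. Below is a minimal sketch that works on the output copied from the run above, so it needs no model download. The helper `signed_score` is hypothetical (it is not part of transformers); it only illustrates one way to fold the label and score into a single number between -1 and 1.

```python
# Output copied from the sentiment_analysis run above.
result = [{'label': 'NEGATIVE', 'score': 0.9692758917808533}]


def signed_score(prediction):
    """Hypothetical helper: negative labels get a negative sign."""
    sign = -1.0 if prediction['label'] == 'NEGATIVE' else 1.0
    return sign * prediction['score']


top = result[0]  # one entry per input sentence
print(top['label'], round(top['score'], 3))  # prints: NEGATIVE 0.969
print(round(signed_score(top), 3))           # prints: -0.969
```

A signed value like this can be convenient when comparing many sentences, but the raw label and score carry the same information.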

question-answering

If you give it a question and a context passage that contains the answer, it returns the answer.

lang.py


question_answering({
    'question': 'What is the cause of the pandemic?',
    'context' : 'The coronavirus triggered an outbreak, and society was thrown into chaos.'
})
# => {'answer': 'coronavirus', 'end': 15, 'score': 0.6689822122523879, 'start': 4}

It found the correct answer. (However, when I tried various other inputs, it sometimes raised an error. It seems that it only works when the context is a sentence the algorithm can easily interpret.)
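The `start` and `end` values in the result look like character offsets into the context string, which fits the idea that the answer is picked out of the context rather than written from scratch. A quick check using only the values shown above (a sketch, no model needed):

```python
# Context and result copied from the question_answering run above.
context = 'The coronavirus triggered an outbreak, and society was thrown into chaos.'
result = {'answer': 'coronavirus', 'end': 15, 'score': 0.6689822122523879, 'start': 4}

# Slice the context with the reported offsets.
span = context[result['start']:result['end']]
print(span)                      # prints: coronavirus
print(span == result['answer'])  # prints: True
```

This also suggests why the context has to contain the answer word: there would be nothing to slice out otherwise.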

fill-mask

If you give it a sentence with <mask> in one place, it returns words that seem to fit in the blank.

lang.py


fill_mask("I have to be in bed all day today because I get <mask>.")
'''
 => [{'score': 0.2714517414569855,
  'sequence': '<s> I have to be in bed all day today because I get tired.</s>',
  'token': 7428},
 {'score': 0.19346608221530914,
  'sequence': '<s> I have to be in bed all day today because I get sick.</s>',
  'token': 4736},
 {'score': 0.07417058944702148,
  'sequence': '<s> I have to be in bed all day today because I get headaches.</s>',
  'token': 20816},
 {'score': 0.05399525910615921,
  'sequence': '<s> I have to be in bed all day today because I get insomnia.</s>',
  'token': 37197},
 {'score': 0.05070624500513077,
  'sequence': '<s> I have to be in bed all day today because I get sleepy.</s>',
  'token': 33782}]
'''

All of the candidates look plausible. (Sorry that the example sentences are all a bit depressing.)
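Each candidate comes back as a full sentence wrapped in `<s>` and `</s>` markers, so a little post-processing is needed to get just the filled-in word. The sketch below works on the results copied from the run above; `filled_word` is a hypothetical helper written for this example, not a transformers function.

```python
# Results copied from the fill_mask run above.
template = "I have to be in bed all day today because I get <mask>."
results = [
    {'score': 0.2714517414569855,
     'sequence': '<s> I have to be in bed all day today because I get tired.</s>',
     'token': 7428},
    {'score': 0.19346608221530914,
     'sequence': '<s> I have to be in bed all day today because I get sick.</s>',
     'token': 4736},
    {'score': 0.07417058944702148,
     'sequence': '<s> I have to be in bed all day today because I get headaches.</s>',
     'token': 20816},
    {'score': 0.05399525910615921,
     'sequence': '<s> I have to be in bed all day today because I get insomnia.</s>',
     'token': 37197},
    {'score': 0.05070624500513077,
     'sequence': '<s> I have to be in bed all day today because I get sleepy.</s>',
     'token': 33782},
]


def filled_word(sequence, template, mask="<mask>"):
    """Hypothetical helper: recover the word that replaced the mask."""
    text = sequence.replace("<s>", "").replace("</s>", "").strip()
    prefix, suffix = template.split(mask)
    return text[len(prefix):len(text) - len(suffix)]


words = [filled_word(r['sequence'], template) for r in results]
best = max(results, key=lambda r: r['score'])
best_word = filled_word(best['sequence'], template)
total = sum(r['score'] for r in results)

print(words)      # prints: ['tired', 'sick', 'headaches', 'insomnia', 'sleepy']
print(best_word)  # prints: tired
print(round(total, 2))  # prints: 0.64
```

The five scores add up to roughly 0.64, so the remaining probability is spread over words that did not make the top five.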

feature-extraction

It returns vectors that represent the features of the input sentence. Unlike the previous three, the return value consists only of numbers, but I thought this would make it easy to feed sentences into my own model. (I'd like to try that someday.)

lang.py


array = feature_extraction("I catch a cold.")

import numpy as np
np.array(array).shape
# => (1, 7, 768)

array[0][0][:10]
'''
 => [0.3683673143386841,
 0.008590285666286945,
 0.04184938594698906,
 -0.08078824728727341,
 -0.20844608545303345,
 -0.03908906877040863,
 0.19680079817771912,
 -0.12569604814052582,
 0.010193285532295704,
 -1.1207540035247803]
'''

It returned a nested list with the shape and values shown above. That is a lot of data (7 × 768 = 5,376 numbers) just to represent the single sentence "I catch a cold."

Here is one more example.

lang.py


array = feature_extraction("I catch a cold and I am sleepy.")

import numpy as np
np.array(array).shape
# => (1, 11, 768)

array[0][0][:10]
'''
 => [0.3068505525588989,
 0.026863660663366318,
 0.17733855545520782,
 0.03574731573462486,
 -0.12478257715702057,
 -0.22214828431606293,
 0.2502932548522949,
 -0.17025449872016907,
 -0.09574677795171738,
 -0.9091089963912964]
'''

The second dimension has changed from 7 to 11, so it appears to follow the length of the sentence (presumably the number of tokens). The last dimension, 768, doesn't seem to change.
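Because the second dimension varies with the sentence, the output cannot be fed into a fixed-size model as is. One common option is to average over that dimension to get a single 768-value vector per sentence. The sketch below assumes this mean-pooling approach (it is not something the pipeline does for you) and uses random arrays with the shapes reported above as stand-ins for real output. `sentence_vector` and `cosine_similarity` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins with the shapes seen above; real pipeline output would go here.
features_a = rng.normal(size=(1, 7, 768))
features_b = rng.normal(size=(1, 11, 768))


def sentence_vector(features):
    """Average the per-position vectors into one fixed-length vector."""
    return np.asarray(features)[0].mean(axis=0)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


vec_a = sentence_vector(features_a)
vec_b = sentence_vector(features_b)

print(vec_a.shape, vec_b.shape)  # prints: (768,) (768,)
print(cosine_similarity(vec_a, vec_b))
```

With real features, both sentences end up as vectors of the same length regardless of how long they are, so they can be compared or passed to a downstream model.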
