What is summpy

summpy is an automatic text summarization API published by Recruit Technologies. It summarizes the input text into the specified number of lines (sentences).

It is published on GitHub: https://github.com/recruit-tech/summpy

In this article, I feed various texts into this API and check what results it returns.

Verification environment

EC2 (Amazon Linux release 2), Python 2.7

Installation

Install pip, summpy, and mecab-python3.

For mecab-python3, pin the version to 0.996.5; otherwise you get the error "no such file or directory: /usr/local/etc/mecabrc".

$ sudo easy_install pip
$ sudo pip install summpy
$ sudo pip install mecab-python3==0.996.5

Also pin networkx to version 1.11. Otherwise, you get the error "add_edge() takes exactly 3 arguments (4 given)" at runtime.

$ sudo pip install multiqc==1.2
$ sudo pip install networkx==1.11

Server execution

The server listens on port 8080. The command is wrapped in nohup (with a trailing &) so that it keeps running in the background.

nohup python -m summpy.server -h 127.0.0.1 -p 8080 &

Source code

`summpy_test.py`


#!/usr/bin/env python2
# coding:utf-8
import requests

limit = 3  # number of lines (sentences) the summary should contain
text = 'Enter the text you want to summarize here.'

p = {'sent_limit':limit, 'text':text}

r = requests.get('http://localhost:8080/summarize', params=p)

print(r.text)

Execution result

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "Enter the text you want to summarize here."
  ]
}

Since the input is a single sentence, the output is also a single sentence. From here, I will try various different texts.
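The response above is JSON with a `summary` list (plus a `debug_info` object), so instead of printing `r.text` as-is you can parse it and work with the sentences directly. A minimal sketch; `parse_summary` is a hypothetical helper name, and with requests you could equally call `r.json()`.

```python
import json


def parse_summary(response_text):
    """Return the list of summary sentences from a summpy JSON response body."""
    data = json.loads(response_text)
    # "summary" holds the selected sentences; "debug_info" is ignored here.
    return data.get("summary", [])


sample = '{"debug_info": {}, "summary": ["Enter the text you want to summarize here."]}'
for line in parse_summary(sample):
    print(line)
```

In `summpy_test.py`, you would pass `r.text` instead of `sample`.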

I tried with various sentences

From here, I summarize various texts, one theme per section. The input texts are taken from the following article.

It is the longest article I have written so far.

How do you interpret "Adler Psychology" from an engineer's perspective? https://qiita.com/keki/items/0542d9d121cf89d6154e

First, summarize the text as-is

First, let's summarize the following passage. I removed the line breaks before passing it to the summarization API.

Original

At this age, people become more interested in feelings, feelings, and ways of thinking.

Meanwhile, I came across "Adler Psychology" in this title a few years ago.

Engineers sometimes concentrate on the specialized work of programming, and it is often said that they are not good at communicating with people and that they are not good at joining the circle of teams.

Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this.

I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.

This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.

By the way, it is quite a long sentence.
I hope you will see it with the intention of reading a small book.

Source code

`summpy_test.py`


#!/usr/bin/env python2
# coding:utf-8
import requests

limit = 3  # number of lines (sentences) the summary should contain
# The passage with line breaks removed; adjacent string literals are joined into one string.
text = ('At this age, people become more interested in feelings, feelings, and ways of thinking. '
        'Meanwhile, I came across "Adler Psychology" in this title a few years ago. '
        'Engineers sometimes concentrate on the specialized work of programming, and it is often said '
        'that they are not good at communicating with people and that they are not good at joining the circle of teams. '
        'Also, I\'m worried about relationships and suffering from depression....I think that there are many cases of this. '
        'I personally feel that "Adler Psychology" is the way of thinking and thought itself '
        'to solve such problems of human relationships. '
        'This time, I would like to write an article about what it would be like to interpret such "Adler psychology" '
        'from the perspective of an engineer, as I am an engineer in the field. '
        'By the way, it is quite a long sentence. '
        'I hope you will see it with the intention of reading a small book.')

p = {'sent_limit':limit, 'text':text}

r = requests.get('http://localhost:8080/summarize', params=p)

print(r.text)

Execution result

$ python summpy_test.py
{
  "debug_info": {},
  "summary": [
    "At this age, people become more interested in feelings, feelings, and ways of thinking.",
    "I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field."
  ]
}

Hmm. The connection between the first and second lines is hard to follow, but the text was condensed into three lines. It also appears that the original sentences are not rewritten at all; they are simply extracted and selected.

Try removing punctuation

To see what role punctuation plays in summpy, I deliberately removed all of the sentence-ending punctuation (the Japanese full stop, kuten).

The source code is the same as before except for the value of the variable text, so I omit it.

Original

At this age, I'm interested in people's feelings and emotional thinking.

A few years ago, I met "Adler Psychology," which is also in this title.

Engineers are not good at communicating with people because they concentrate on the specialized work of programming, and I think it is sometimes said that they are not good at joining the circle of teams.

Also, I'm worried about relationships and suffering from depression....I think that there are many cases of

I personally feel that "Adler Psychology" is the idea itself for solving such problems of human relationships.

This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, who is actually an engineer in the field.

By the way, it will be quite a long sentence
I hope you will see it with the intention of reading a little book.

Execution result

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "At this age, I'm interested in people's feelings and emotional thinking. I met a few years ago in "Adler Psychology," which is also in this title. Engineers specialize in programming. I'm not good at communicating with people because I concentrate on my work. I think it's sometimes said that I'm not good at joining the circle of teams. Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this, and I feel that "Adler Psychology" is the idea itself to solve such problems of human relationships. I would like to write an article about what it would be like if I interpret it from an engineer's point of view. By the way, it will be quite a long sentence. I hope you will read a little book."
  ]
}

The whole text came back as a single line. Apparently, summpy treats punctuation (the kuten) as the sentence break, so without it the entire input counts as one sentence.

Adjust the number of summary lines

What happens if we increase the number of lines instead? I set the limit to 100 lines.

For this test, I restored the punctuation, so the input is the same text as in the first test.

Execution result

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "At this age, people become more interested in feelings, feelings, and ways of thinking.",
    "Meanwhile, I came across "Adler Psychology" in this title a few years ago.",
    "Engineers sometimes concentrate on the specialized work of programming, and it is often said that they are not good at communicating with people and that they are not good at joining the circle of teams.",
    "Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this.",
    "I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
    "By the way, it is quite a long sentence.",
    "I hope you will see it with the intention of reading a small book."
  ]
}

The original text comes back unchanged. Commas are not used as sentence breaks; sentences are split at the full stop (kuten, 。). Sentences also appear to be split at periods (.), question marks (?), and exclamation marks (!).
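The splitting behavior observed so far can be sketched as a small function. This is not summpy's own code: the delimiter set (`。．？！`) is an assumption inferred from the experiments above, and `split_sentences` is a hypothetical name. It cuts after a delimiter unless the next character is also a delimiter (so runs such as `？！` stay together), commas never end a sentence, and text without any delimiter comes back as one sentence, just like the punctuation-free test.

```python
# -*- coding: utf-8 -*-

# Assumed delimiter set, inferred from the experiments (not taken from summpy).
DELIMITERS = set(u'。．？！')


def split_sentences(text, delimiters=DELIMITERS):
    """Yield sentences, cutting after a delimiter unless the next char is also one."""
    buff = []
    for i, c in enumerate(text):
        buff.append(c)
        next_char = text[i + 1] if i + 1 < len(text) else None
        if c in delimiters and next_char not in delimiters:
            yield u''.join(buff)
            buff = []
    if buff:
        # Trailing text without a delimiter is still returned as a sentence.
        yield u''.join(buff)


for sentence in split_sentences(u'今日は晴れ、明日は雨。本当？はい！'):
    print(sentence)
```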

Now, let's gradually reduce the number of lines.

Execution result (7 lines)

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "At this age, people become more interested in feelings, feelings, and ways of thinking.",
    "Meanwhile, I came across "Adler Psychology" in this title a few years ago.",
    "Engineers sometimes concentrate on the specialized work of programming, and it is often said that they are not good at communicating with people and that they are not good at joining the circle of teams.",
    "Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this.",
    "I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
    "By the way, it is quite a long sentence."
  ]
}

Execution result (6 lines)

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "At this age, people become more interested in feelings, feelings, and ways of thinking.",
    "Meanwhile, I came across "Adler Psychology" in this title a few years ago.",
    "Also, I'm worried about relationships and suffering from depression....I think that there are many cases of this.",
    "I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
    "By the way, it is quite a long sentence."
  ]
}

Execution result (5 lines)

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "At this age, people become more interested in feelings, feelings, and ways of thinking.",
    "Meanwhile, I came across "Adler Psychology" in this title a few years ago.",
    "I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
    "By the way, it is quite a long sentence."
  ]
}

Execution result (4 lines)

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "At this age, people become more interested in feelings, feelings, and ways of thinking.",
    "I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field.",
    "By the way, it is quite a long sentence."
  ]
}

Execution result (3 lines)

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "At this age, people become more interested in feelings, feelings, and ways of thinking.",
    "I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field."
  ]
}

Execution result (2 lines)

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "I personally feel that "Adler Psychology" is the way of thinking and thought itself to solve such problems of human relationships.",
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field."
  ]
}

Execution result (1 line)

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "This time, I would like to write an article about what it would be like to interpret such "Adler psychology" from the perspective of an engineer, as I am an engineer in the field."
  ]
}

As the limit shrinks, the sentences judged least important are dropped one by one. I don't know the exact criteria, but overall the behavior appears to be:

Split the text into sentences at punctuation marks
Output the N most important sentences (N = the specified number of lines), in their original order

That general pattern seems certain.
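The second step can be sketched as follows, assuming each sentence already has an importance score (how summpy computes that score is not covered here). `select_top_sentences` is a hypothetical name. It keeps the `sent_limit` highest-scoring sentences and then restores document order, which matches the outputs above; a limit larger than the number of sentences simply returns everything, as in the 100-line test.

```python
def select_top_sentences(sentences, scores, sent_limit):
    """Return the sent_limit highest-scoring sentences, kept in their original order."""
    # Rank indices by score, highest first; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    # Re-sort the chosen indices so the summary reads in the original order.
    keep = sorted(ranked[:sent_limit])
    return [sentences[i] for i in keep]


sentences = ['First.', 'Second.', 'Third.', 'Fourth.']
scores = [0.1, 0.4, 0.2, 0.3]  # hypothetical importance scores
print(select_top_sentences(sentences, scores, 2))
```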

Looking at the results, I personally find the three-line summary the easiest to read and the most to the point. However, as the text gets longer, three lines may not be enough to tell what the story is about, so the number of lines probably needs to be chosen according to the length of the text.
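One way to tie the number of lines to the length of the text is a simple ratio with a floor. This is only a sketch: `choose_sent_limit` is a hypothetical helper, and the ratio 0.3 and minimum 3 are arbitrary assumptions (loosely based on three lines working well for the eight-sentence passage above), not values recommended by summpy.

```python
import math


def choose_sent_limit(num_sentences, ratio=0.3, minimum=3):
    """Return a sent_limit that grows with the input length but never drops below `minimum`."""
    if num_sentences <= minimum:
        # Short texts: keep every sentence.
        return num_sentences
    return max(minimum, int(math.ceil(num_sentences * ratio)))


print(choose_sent_limit(8))
print(choose_sent_limit(25))
```

The result would then be passed as `sent_limit` in the request parameters.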

Summarize disorganized sentences

To see what happens when the sentences are unconnected, let's summarize the table of contents of the article above.

We saw above that sentences cannot be separated without kuten, so I converted every line break into a kuten (。) before running it.
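That preprocessing step can be sketched like this: each non-empty line becomes one kuten-terminated sentence, and the result is what goes into the `text` parameter. `lines_to_sentences` is a hypothetical name, and dropping blank lines is an added assumption so that empty "sentences" are not created.

```python
# -*- coding: utf-8 -*-


def lines_to_sentences(raw_text, kuten=u'。'):
    """Turn one-item-per-line text (e.g. a table of contents) into kuten-terminated sentences."""
    items = [line.strip() for line in raw_text.splitlines()]
    # Skip blank lines, then end every remaining item with a kuten.
    return u''.join(item + kuten for item in items if item)


toc = u'参考書籍\n前提\n\n1.人は変われる'
print(lines_to_sentences(toc))
```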

Original

Reference book
Premise
1.People can change
1-1.There is no trauma
1-2.Don't be afraid to get hurt
1-3.Harm that occurs when the feeling of inferiority becomes too strong
1-4.Accept self
2.Separation of issues
2-1.You don't have to meet the expectations of others
2-2.Don't step into the challenges of others
2-3.Separation of issues
3.How to interact with others
3-1.Do not compete with others
3-2.Admit non-defeat = not lose
4.About raising people
4-1 Don't scold, don't praise
4-2 Thank you, not evaluate
5.Community sense
Finally

Execution result (3 lines)

$ python ./summpy_test.py
{
  "debug_info": {},
  "summary": [
    "2-1.You don't have to meet the expectations of others.",
    "2-2.Don't step into the challenges of others.",
    "3.How to relate to others."
  ]
}

Since the input has little context to begin with, it is natural that the summary is disorganized too. Interestingly, the API picked not only a top-level heading (3) but also subheadings (2-1 and 2-2).

Let's take a look inside the API

I was curious about the summarization logic, so I took a quick look at the API's source code on GitHub.

The part that does the summarization (the core logic) seems to be here: https://github.com/recruit-tech/summpy/blob/master/summpy/lexrank.py

It uses DictVectorizer and pairwise_distances, so after splitting the text into sentences it apparently extracts features from each sentence, computes a distance matrix between those feature vectors, and then scores the sentences based on the result...
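The file name suggests the LexRank algorithm: sentences become nodes in a similarity graph and are ranked by a PageRank-style centrality, which may be why networkx is a dependency. The sketch below is a generic, dependency-free LexRank in plain Python, not summpy's implementation. It assumes whitespace tokenization and raw word counts in place of MeCab and DictVectorizer, cosine similarity in place of pairwise_distances, and a hand-written power iteration in place of a graph library; `threshold=0.1` and `damping=0.85` are assumed defaults.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexrank_scores(sentences, threshold=0.1, damping=0.85, max_iter=100, tol=1e-8):
    """Score sentences by PageRank centrality on a thresholded cosine-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Whitespace tokenization stands in for MeCab (assumption: English-like input).
    vectors = [Counter(s.lower().split()) for s in sentences]
    # Undirected edge i-j when the two sentences are similar enough.
    adjacency = [[i != j and cosine_similarity(vectors[i], vectors[j]) > threshold
                  for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adjacency]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Sentences with no neighbours spread their score evenly over all sentences.
        dangling = sum(scores[j] for j in range(n) if degree[j] == 0)
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score divided by its number of edges.
            incoming = sum(scores[j] / degree[j] for j in range(n) if adjacency[j][i])
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = max(abs(x - y) for x, y in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


docs = ["the cat sat on the mat", "the cat ate the fish",
        "dogs bark loudly", "the cat and the dog"]
print(lexrank_scores(docs))
```

The highest-scoring sentences would then be returned as the summary; here the unrelated sentence "dogs bark loudly" receives the lowest score.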

I didn't read the source code very closely, so if you are familiar with it, I would appreciate a comment.

Summary

- summpy does not rewrite the text. It only splits the original text into sentences at punctuation marks and extracts the most important ones.
- As long as the number of lines is balanced against the length of the text, the result is a reasonable summary (which is more or less stating the obvious...).
- Disorganized text does not summarize well. For example, if the text contains bulleted information such as a table of contents, it seems better to remove it first.

At the end

Thank you for reading to the end.

If you have requests such as "What happens if you summarize this text?" or "Please summarize this text!", I would be grateful if you left a comment.

I would like to try as many as possible.

[PYTHON] I tried to summarize various sentences using the automatic summarization API "summpy"
