This is a record of problem 50, "Sentence break", from "Chapter 6: Processing English texts" of Language processing 100 knocks 2015 (.tohoku.ac.jp/nlp100/#ch6). Compared with the difficult problem 49, this one is very easy and feels like a short break. We split the text into sentences using a regular expression.
| Link | Remarks | 
|---|---|
| 050.Sentence break.ipynb | GitHub link to the answer program |
| 100 amateur language processing knocks:50 | Source from which many parts of the code were copied |
| Type | Version | Contents |
|---|---|---|
| OS | Ubuntu18.04.01 LTS | Running on a virtual machine |
| pyenv | 1.2.16 | I use pyenv because I sometimes use multiple Python environments |
| Python | 3.8.1 | I'm using Python 3.8.1 on pyenv. Packages are managed using venv |
This chapter gives an overview of various basic natural language processing techniques through English text processing with Stanford Core NLP.
Stanford Core NLP, Stemming, Part-of-speech tagging, Named entity recognition, Coreference resolution, Dependency parsing, Phrase structure parsing, S-expressions
Perform the following processing on the English text (nlp.txt).
Treat the pattern (`.` or `;` or `:` or `?` or `!`) → whitespace character → uppercase letter as a sentence delimiter, and output the input document with one sentence per line.
import re
with open('./nlp.txt') as file_in, \
     open('./050.result.txt', 'w') as file_out:
    for line in file_in:
        if line != '\n':
            line = re.sub(r'''
                         (?<=[.;:?!])  # Positive lookbehind: . or ; or : or ? or ! (no | needed inside [])
                         \s            # Whitespace character (the part replaced by a line break)
                         (?=[A-Z])     # Positive lookahead: an uppercase letter
                       ''', '\n', line, flags=re.VERBOSE)
            print(line.rstrip(), file=file_out)
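To try the rule without nlp.txt, the same substitution can be applied to an in-memory string. This is only a minimal sketch: `split_sentences`, `SENTENCE_BREAK`, and the sample text are hypothetical names and data introduced here for illustration, not part of the original answer. It also shows a limitation of this simple rule: an abbreviation such as "Dr." followed by a capitalized word is treated as the end of a sentence.

```python
import re

# Rule from the problem statement: one whitespace character that follows
# . ; : ? ! and precedes an uppercase letter is a sentence boundary.
# SENTENCE_BREAK and split_sentences are hypothetical names for this sketch.
SENTENCE_BREAK = re.compile(r'(?<=[.;:?!])\s(?=[A-Z])')


def split_sentences(text):
    """Return the sentences of text as a list, one sentence per element."""
    return SENTENCE_BREAK.sub('\n', text).split('\n')


# Made-up sample text (not taken from nlp.txt).
sample = 'NLP is fun. Is it easy? Yes! See e.g. the docs. Dr. Smith agrees.'
for sentence in split_sentences(sample):
    print(sentence)
```

Here "e.g. the" is not split because the next letter is lowercase, while "Dr. Smith" is split into two lines even though it is not a real sentence boundary.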
This time we're using positive lookahead and lookbehind assertions in the regular expression. They are not included in the match (here, the part that is replaced), but they act as search conditions. For more information, please refer to ["Basics and Tips for Python Regular Expressions Learned from Zero"](https://qiita.com/FukuharaYohei/items/459f27f0d7bbba551af7#%E5%85%88%E8%AA%AD%E3%81%BF%E5%BE%8C%E8%AA%AD%E3%81%BF%E3%82%A2%E3%82%B5%E3%83%BC%E3%82%B7%E3%83%A7%E3%83%B3).
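The difference the assertions make can be seen by comparing against a pattern without them. The following is a small sketch on a made-up string (`text`, `consumed`, and `kept` are names chosen here for illustration): without lookaround, the punctuation mark and the uppercase letter are part of the match and get replaced too.

```python
import re

# Made-up sample string for illustration.
text = 'First sentence. Second sentence.'

# Without lookaround, the period and the uppercase letter belong to the
# match, so the replacement removes them along with the whitespace.
consumed = re.sub(r'[.;:?!]\s[A-Z]', '\n', text)

# With lookbehind and lookahead, only the whitespace itself is matched
# and replaced; the surrounding characters are merely checked.
kept = re.sub(r'(?<=[.;:?!])\s(?=[A-Z])', '\n', text)

print(repr(consumed))  # 'First sentence\necond sentence.'
print(repr(kept))      # 'First sentence.\nSecond sentence.'
```

The same effect could be achieved with capture groups and backreferences in the replacement string, but lookaround keeps the replacement a plain `'\n'`.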
When the program is executed, the following results (only the first 10 lines) are output.
050.result.txt (first 10 lines only)
Natural language processing
From Wikipedia, the free encyclopedia
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.
As such, NLP is related to the area of humani-computer interaction.
Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.
History
The history of NLP generally starts in the 1950s, although work can be found from earlier periods.
In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence.
The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English.
The authors claimed that within three or five years, machine translation would be a solved problem.