[PYTHON] 100 Natural Language Processing Knocks, Chapter 1: Warm-up (second half)

A record of solving the problems in the second half of Chapter 1.

05. n-gram

Create a function that builds n-grams from a given sequence (string, list, etc.). Use this function to get the word bi-grams and the letter bi-grams of the sentence "I am an NLPer".

# -*- coding: utf-8 -*-
__author__ = 'todoroki'

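def ngram(data, n):
    # Collect every contiguous window of length n from a string or list.
    res = []
    # A sequence of length L has L - n + 1 windows (L - 1 is only correct for n = 2).
    for i in range(len(data) - n + 1):
        res.append(data[i:i + n])
    return res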

string = 'I am an NLPer'
print(u'Word bi-gram:')
print(ngram(string.split(), 2))
print(u'Letter bi-gram:')
print(ngram(string, 2))
#=> Word bi-gram:
#=> [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
#=> Letter bi-gram:
#=> ['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']

In the letter bi-grams, a space also counts as one character, so pairs such as 'I ' and ' a' appear in the output.
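The same sliding-window idea extends to any n: a sequence of length L has L - n + 1 windows, so the start index has to depend on n. The following is a minimal sketch, not part of the original solution; the name `ngram_windows` and the choice to strip spaces before taking letter bi-grams are assumptions made purely for illustration.

```python
def ngram_windows(data, n):
    """Return all contiguous windows of length n from a string or list.

    Hypothetical helper for illustration only.
    """
    # range() is empty when len(data) - n + 1 <= 0, so inputs
    # shorter than n simply yield an empty list.
    return [data[i:i + n] for i in range(len(data) - n + 1)]


sentence = 'I am an NLPer'

# Word tri-grams: the window slides over the list of words.
word_trigrams = ngram_windows(sentence.split(), 3)
print(word_trigrams)

# Letter bi-grams with spaces removed first (one possible variant).
letter_bigrams_no_space = ngram_windows(sentence.replace(' ', ''), 2)
print(letter_bigrams_no_space)
```

Whether spaces should count as characters depends on the downstream use; removing them first, as here, creates pairs such as 'nN' that span a word boundary.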

06. Set

07. Sentence generation by template

08. Ciphertext