[PYTHON] 100 Language Processing Knock (2020): 38

"""
37. Top 10 words that frequently co-occur with "cat"
Display the 10 words that co-occur most often with "cat" (highest co-occurrence frequency) together with their frequencies in a graph (for example, a bar chart).


sentence_list:
[[{'surface': '', 'base': '*', 'pos': 'BOS/EOS', 'pos1': '*'},
  {'surface': 'one', 'base': 'one', 'pos': 'noun', 'pos1': 'number'},
  {'surface': '', 'base': '*', 'pos': 'BOS/EOS', 'pos1': '*'}],
 [{'surface': '', 'base': '*', 'pos': 'BOS/EOS', 'pos1': '*'},
  {'surface': 'I', 'base': 'I', 'pos': 'noun', 'pos1': 'pronoun'},
  {'surface': 'Is', 'base': 'Is', 'pos': 'Particle', 'pos1': 'binding particle'},
  {'surface': 'Cat', 'base': 'Cat', 'pos': 'noun', 'pos1': 'General'},
  {'surface': 'so', 'base': 'Is', 'pos': 'Auxiliary verb', 'pos1': '*'},
  {'surface': 'is there', 'base': 'is there', 'pos': 'Auxiliary verb', 'pos1': '*'},
  {'surface': '。', 'base': '。', 'pos': 'symbol', 'pos1': 'Kuten'},
  {'surface': '', 'base': '*', 'pos': 'BOS/EOS', 'pos1': '*'}],

Memo:
    - Co-occurrence frequency: https://www.jtp.co.jp/techport/2018-04-18-001/
"""
from collections import defaultdict
from typing import List

import matplotlib.pyplot as plt

import utils

plt.style.use("ggplot")
plt.rcParams["font.family"] = "Hiragino Mincho ProN"  # font with Japanese glyphs (ships with macOS)


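def get_co_occurrence(
    sentence_list: List[List[dict]], target: str = "Cat"
) -> List[tuple]:
    """Count the words that appear in the same sentence as `target`.

    Returns (word, count) pairs sorted by count, highest first.
    """
    # Drop the BOS/EOS markers at both ends of each sentence.
    sents = [
        [word["surface"] for word in sent[1:-1]] for sent in sentence_list
    ]  # [['one'], ['I', 'Is', 'Cat', 'so', 'is there', '。']]
    counter = defaultdict(int)

    for sent in sents:
        if target in sent:
            for word in sent:
                counter[word] += 1

    # pop() avoids the KeyError that `del` raises when no sentence contains the target.
    counter.pop(target, None)

    return sorted(counter.items(), key=lambda item: item[1], reverse=True)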


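def plot_co_occurrence(x: list, y: list) -> None:
    # Place the bars at integer positions so that repeated labels
    # do not collapse onto a single bar, then label the ticks.
    x_pos = list(range(len(x)))

    plt.bar(x_pos, y)
    plt.xlabel("Term")
    plt.ylabel("Frequency")
    plt.title("Co-occurrence with 'Cat'")

    plt.xticks(x_pos, x)

    plt.show()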


sentence_list = utils.read_json("30_neko_mecab.json")
counter = get_co_occurrence(sentence_list)
# [('of', 391), ('Is', 272), ('、', 252), ('To', 250), ('To', 232)]

x = [word[0] for word in counter[:10]]
y = [word[1] for word in counter[:10]]
plot_co_occurrence(x, y)
# ![image-20200527193140109](https://raw.githubusercontent.com/LearnXu/images/master/imgs/image-20200527193140109.png)
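
The same sentence-level counting can also be written with `collections.Counter`, which may read a little shorter than the `defaultdict` version above. This is only a sketch: the function name `top_co_occurrences`, its parameters, and the toy sentences are made up for illustration, and it filters out the BOS/EOS markers by their `pos` value rather than by slicing.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_co_occurrences(
    sentences: List[List[Dict[str, str]]], target: str, n: int = 10
) -> List[Tuple[str, int]]:
    """Return the n surface forms that most often share a sentence with target."""
    counter: Counter = Counter()
    for sent in sentences:
        # Keep real tokens only; BOS/EOS markers carry an empty surface.
        surfaces = [m["surface"] for m in sent if m["pos"] != "BOS/EOS"]
        if target in surfaces:
            # Count every other token in the sentence, skipping the target itself.
            counter.update(s for s in surfaces if s != target)
    return counter.most_common(n)


def _sentence(words: List[str]) -> List[Dict[str, str]]:
    """Wrap plain words in BOS/EOS markers (toy data, not real MeCab output)."""
    marker = {"surface": "", "pos": "BOS/EOS"}
    return [marker] + [{"surface": w, "pos": "word"} for w in words] + [marker]


toy_sentences = [
    _sentence(["I", "am", "a", "cat"]),
    _sentence(["a", "cat", "sleeps"]),
    _sentence(["dogs", "bark"]),
]

result = top_co_occurrences(toy_sentences, "cat", n=3)
print(result[0])  # ('a', 2): "a" shares a sentence with "cat" twice
```

`Counter.most_common(n)` does the sorting and the top-n cut in one call, so the `[:10]` slice is no longer needed by the caller.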

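The script imports a local `utils` module whose contents are not shown in this post. Assuming `utils.read_json` simply parses a JSON file into Python lists and dicts, a minimal stand-in might look like the following; the body is a guess for illustration, not the author's actual helper.

```python
import json
from pathlib import Path
from typing import Any, Union


def read_json(path: Union[str, Path]) -> Any:
    """Load a UTF-8 encoded JSON file and return the parsed object.

    Hypothetical replacement for utils.read_json; the real helper may differ.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Opening the file with an explicit `encoding="utf-8"` matters here because the MeCab output contains Japanese text, and the platform default encoding is not always UTF-8.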