[PYTHON] Convert sentences to vectors with gensim

I worked through the "From Strings to Vectors" section of the gensim tutorial.

The stoplist part of the code removes stop words, that is, words that are not needed for the analysis.

What is a stop word? A stop word is a word that is excluded from search targets because it occurs so often that matching on it would hurt search accuracy. Function words such as particles and auxiliary verbs ("ha", "no", "desu", "masu" in Japanese; "the", "of", "is" in English) almost always fall into this category.
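
As a small illustration of the idea, the sketch below filters stop words out of a single sentence. The helper name remove_stop_words and the constant STOPLIST are hypothetical names introduced only for this example; the stop list itself is the same one used in sample.py.

```python
# Hypothetical helper for illustration only; it is not part of gensim.
STOPLIST = frozenset("for a of the and to in".split())


def remove_stop_words(sentence, stoplist=STOPLIST):
    """Lowercase the sentence, split on whitespace, and drop stop words."""
    return [word for word in sentence.lower().split() if word not in stoplist]


tokens = remove_stop_words("The intersection graph of paths in trees")
print(tokens)  # ['intersection', 'graph', 'paths', 'trees']
```

Splitting on whitespace is enough for these short English titles. Japanese text would first need a tokenizer, since words are not separated by spaces.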

sample.py



import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

from gensim import corpora, models, similarities

documents = ["Human machine interface for lab abc computer applications",
          "A survey of user opinion of computer system response time",
          "The EPS user interface management system",
          "System and human system engineering testing of EPS",
          "Relation of user perceived response time to error measurement",
          "The generation of random binary unordered trees",
          "The intersection graph of paths in trees",
          "Graph minors IV Widths of trees and well quasi ordering",
          "Graph minors A survey"]

          
# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
  for document in documents]

# remove words that appear only once
from collections import defaultdict
frequency = defaultdict(int)

# print(texts)

for text in texts:
    for token in text:
        frequency[token] += 1

texts = [[token for token in text if frequency[token] > 1]
         for text in texts]

# from pprint import pprint   # pretty-printer
# pprint(texts)

dictionary = corpora.Dictionary(texts)
# print(dictionary)

# Show the mapping from each token to its integer id
# print(dictionary.token2id)

# Convert each document to a bag-of-words vector: a list of (token_id, count) pairs
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)
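
To make the printed result easier to read, here is a rough pure-Python sketch of what a bag-of-words conversion does. It is not gensim's implementation: build_token2id and to_bow are hypothetical helpers, and the ids they assign will generally differ from the ones in dictionary.token2id. The assumption is only that each document becomes a sorted list of (token_id, count) pairs and that tokens missing from the dictionary are ignored.

```python
from collections import Counter


def build_token2id(texts):
    """Give each distinct token an integer id, in order of first appearance."""
    token2id = {}
    for text in texts:
        for token in text:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def to_bow(text, token2id):
    """Return sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(token2id[token] for token in text if token in token2id)
    return sorted(counts.items())


# Two tiny tokenized documents, used only for this illustration
sample_texts = [["human", "interface", "computer"],
                ["graph", "minors", "survey"]]
token2id = build_token2id(sample_texts)

# "interaction" is not in the mapping, so it is dropped
new_doc = "Human computer interaction computer".lower().split()
bow = to_bow(new_doc, token2id)
print(bow)  # [(0, 1), (2, 2)]
```

Here "human" has id 0 and appears once, and "computer" has id 2 and appears twice. Each pair in the output of sample.py can be read the same way.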


Official tutorial: https://radimrehurek.com/gensim/tut1.html
