Create a word frequency counter with Python 3.4

What I want to do

Following Video # 1, [Video # 2](https://www.youtube.com/watch?v=up5Xehmtn2E&index=36 = PL6gx4Cwl9DGAcbMi1sH6oAMk4JHw91mC_) and Video # 3, I want to fetch the specified site and display the words used in each heading link in descending order of frequency.

Video # 1

import requests
from bs4 import BeautifulSoup
import operator


def start(url):
	word_list = []
	source_code = requests.get(url).text #fetch the page and keep the response body as plain text
	soup = BeautifulSoup(source_code, 'html.parser')
	for post_text in soup.findAll('a', {'class': 'title text-semibold'}): #loop over every heading link
		content = post_text.string #.string = only the text inside the tag
		words = content.lower().split()
		for each_word in words:
			print(each_word)
			word_list.append(each_word)
		

start("https://www.thenewboston.com/forum/")
  1. Make a word_list list (the split-up words are collected here later)
  2. Fetch the site and save its HTML text in source_code
  3. Parse the HTML with Beautiful Soup
  4. Narrow it down to the necessary links by tag and CSS class, and extract only the text inside each one into content
  5. Lowercase everything in content, split it on whitespace, and store the pieces in words
  6. Loop over words and append each word to word_list. That is roughly how it works.
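
Steps 5 and 6 can be tried without touching the network. The sketch below is only an illustration: `sample_titles` and `collect_words` are made-up names, and the titles are hardcoded stand-ins for what `post_text.string` would return. It also skips `None`, since Beautiful Soup's `.string` is `None` when a tag contains more than one child, which would otherwise make `.lower()` fail.

```python
# Hypothetical stand-ins for the link texts scraped from the forum page.
sample_titles = [
    "Dictionary print order",
    None,  # .string is None when the tag has more than one child
    "Two beginner Python courses?",
]


def collect_words(titles):
    """Lowercase each title, split it on whitespace, and gather every word."""
    word_list = []
    for content in titles:
        if content is None:  # nothing to split, so skip this link
            continue
        word_list.extend(content.lower().split())
    return word_list


print(collect_words(sample_titles))
```

Symbols such as the trailing `?` are still attached to the words at this stage. The output below is from the full script run against the forum page.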

Output

dictionary
print
order
permanent
display
of
content
rendering
problems
whenever
i
start
the
android
studio
two
beginner
python
courses?
vector
about
double
buffering
arduino
code
asterisk
before
a
pointer
can
you
provide
me
the
arduino
code
for
eye
blinking
sensor(ir
sensor)
for
accidental
prevention.
can't
import
images
in
android
studio
can't
install
intel
haxm
free
internet
javascript
interpreter
lambda
function
my
funny
litlte
program
navigation
drawer
activity
not
able
to
find
the
problem
need
help
org.apache.http.client.httpclient
deprecated
question
about
themes
someone
share
a
link
to
source
codes??
source
code
?
which
all
views
should
be
turned
on?
x86
emulation
error
error
when
trying
to
build
and
run.
computer
doesn't
support
virtualization.
web
development
using
html
java
game
about
getting
user
input
eclipse
doesn't
recognise
my
imports
other
ways
of
styling

Video # 2

import requests
from bs4 import BeautifulSoup
import operator


def start(url):
	word_list = []
	source_code = requests.get(url).text #gonna connect to the link and use it as plain text
	soup = BeautifulSoup(source_code, 'html.parser')
	for post_text in soup.findAll('a', {'class': 'title text-semibold'}): #go through all the contents
		content = post_text.string #.string = only get the texts thats inside "soup"
		words = content.lower().split()
		for each_word in words:
			word_list.append(each_word)
	clean_up_list(word_list)

def clean_up_list(word_list):
	clean_word_list = []
	symbols = "!@#$%^&*()_+{}:\"<>?,./;'[]-="
	for word in word_list:
		for symbol in symbols:
			word = word.replace(symbol, "") #delete every symbol found in the word
		if len(word) > 0: #keep only words that still have characters left
			print(word)
			clean_word_list.append(word)

start("https://www.thenewboston.com/forum/")

So far, the start function fetches the words and puts them in word_list. This time we add a function that cleans up the fetched words, for example by stripping symbols that are not part of a word and dropping entries that end up empty.

  1. Create clean_word_list first
  2. Loop over word_list, which holds the words fetched by the start function, one word at a time
  3. Check each word against every symbol. If a symbol is found, replace it with an empty string
  4. If the length of word is greater than 0 (= something is left after the symbols are removed), add it to clean_word_list
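
As an alternative to the nested replace loop, the same clean-up can be sketched with `str.translate`, which deletes all the listed symbols in one pass. This is not what the videos do; `SYMBOLS` and `_DELETE_SYMBOLS` are names introduced here for illustration, and this version returns the cleaned list instead of printing it.

```python
# Same symbol set as in the script above.
SYMBOLS = "!@#$%^&*()_+{}:\"<>?,./;'[]-="

# Translation table that maps every symbol to "delete".
_DELETE_SYMBOLS = str.maketrans("", "", SYMBOLS)


def clean_up_list(word_list):
    """Strip symbols from each word and drop words that become empty."""
    clean_word_list = []
    for word in word_list:
        word = word.translate(_DELETE_SYMBOLS)
        if word:  # an empty string means the entry was only symbols
            clean_word_list.append(word)
    return clean_word_list


print(clean_up_list(["courses?", "sensor(ir", "can't", "??", "x86"]))
# ['courses', 'sensorir', 'cant', 'x86']
```

The output below is from the original script, not from this sketch.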

Output

variables
in
enum
dictionary
print
order
permanent
display
of
content
rendering
problems
whenever
i
start
the
android
studio
two
beginner
python
courses
vector
about
double
buffering
arduino
code
asterisk
before
a
pointer
can
you
provide
me
the
arduino
code
for
eye
blinking
sensorir
sensor
for
accidental
prevention
cant
import
images
in
android
studio
cant
install
intel
haxm
free
internet
javascript
interpreter
lambda
function
my
funny
litlte
program
navigation
drawer
activity
not
able
to
find
the
problem
need
help
orgapachehttpclienthttpclient
deprecated
question
about
themes
someone
share
a
link
to
source
codes
source
code
which
all
views
should
be
turned
on
x86
emulation
error
error
when
trying
to
build
and
run
computer
doesnt
support
virtualization
web
development
using
html
java
game
about
getting
user
input
eclipse
doesnt
recognise
my
imports

Video # 3

import requests
from bs4 import BeautifulSoup
import operator #provides itemgetter, used below to sort by count


def start(url):
	word_list = []
	source_code = requests.get(url).text #gonna connect to the link and use it as plain text
	soup = BeautifulSoup(source_code, 'html.parser')
	for post_text in soup.findAll('a', {'class': 'title text-semibold'}): #go through all the contents
		content = post_text.string #.string = only get the texts thats inside "soup"
		words = content.lower().split()
		for each_word in words:
			word_list.append(each_word)
	clean_up_list(word_list)

def clean_up_list(word_list):
	clean_word_list = []
	for word in word_list:
		symbols = "!@#$%^&*()_+{}:\"<>?,./;'[]-="
		for i in range(0, len(symbols)): 
			word = word.replace(symbols[i], "") #and replace it with nothing (=delete) if finds any symbols
		if len(word) > 0: #allows it to take only the actual clean words
			print(word)
			clean_word_list.append(word)
	create_dictionary(clean_word_list)

def create_dictionary(clean_word_list):
	word_count = {}
	for word in clean_word_list:
		if word in word_count:
			word_count[word] += 1 #word already seen: increment its count by one
		else:
			word_count[word] = 1 #first occurrence: start its count at one
	#items() yields (key, value) pairs; itemgetter(1) sorts by value (the count), ascending
	#itemgetter(0) would sort by key instead = alphabetical order
	for key, value in sorted(word_count.items(), key = operator.itemgetter(1)):
		print(key, value)

start("https://www.thenewboston.com/forum/")

Create a create_dictionary function that stores each word as a key, with the word's frequency as the value. The if statement adds one to the count if the word is already in the dictionary, and otherwise creates a new entry with a count of 1. Then for key, value in sorted(word_count.items(), key = operator.itemgetter(1)) pulls the words out of the dictionary sorted by value. As written, this is ascending order (least frequent first); passing reverse=True to sorted gives descending order.
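
The counting and sorting can be checked on a small hand-made list. In this sketch, `count_words`, `sort_by_frequency` and the `descending` parameter are illustrative names rather than anything from the videos, and `reverse=True` is used so that the most frequent word comes first.

```python
import operator


def count_words(clean_word_list):
    """Build a {word: count} dictionary from a list of words."""
    word_count = {}
    for word in clean_word_list:
        # get() returns 0 for a word that has not been seen yet
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


def sort_by_frequency(word_count, descending=True):
    """Return (word, count) pairs sorted by count."""
    return sorted(word_count.items(), key=operator.itemgetter(1), reverse=descending)


sample = ["the", "code", "the", "about", "code", "the"]
for word, count in sort_by_frequency(count_words(sample)):
    print(word, count)
# the 3
# code 2
# about 1
```

The output below is from the original script, which sorts in ascending order.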

Output

variables
in
enum
dictionary
print
order
permanent
display
of
content
rendering
problems
whenever
i
start
the
android
studio
two
beginner
python
courses
vector
about
double
buffering
arduino
code
asterisk
before
a
pointer
can
you
provide
me
the
arduino
code
for
eye
blinking
sensorir
sensor
for
accidental
prevention
cant
import
images
in
android
studio
cant
install
intel
haxm
free
internet
javascript
interpreter
lambda
function
my
funny
litlte
program
navigation
drawer
activity
not
able
to
find
the
problem
need
help
orgapachehttpclienthttpclient
deprecated
question
about
themes
someone
share
a
link
to
source
codes
source
code
which
all
views
should
be
turned
on
x86
emulation
error
error
when
trying
to
build
and
run
computer
doesnt
support
virtualization
web
development
using
html
java
game
about
getting
user
input
eclipse
doesnt
recognise
my
imports
courses 1
images 1
order 1
litlte 1
i 1
link 1
variables 1
input 1
when 1
someone 1
pointer 1
vector 1
x86 1
buffering 1
on 1
of 1
blinking 1
recognise 1
beginner 1
enum 1
javascript 1
should 1
need 1
eclipse 1
computer 1
dictionary 1
virtualization 1
navigation 1
can 1
permanent 1
provide 1
prevention 1
print 1
function 1
game 1
internet 1
html 1
question 1
rendering 1
deprecated 1
you 1
turned 1
orgapachehttpclienthttpclient 1
find 1
haxm 1
activity 1
asterisk 1
using 1
which 1
intel 1
double 1
all 1
support 1
problem 1
two 1
funny 1
whenever 1
display 1
problems 1
sensor 1
accidental 1
java 1
interpreter 1
me 1
eye 1
help 1
before 1
imports 1
getting 1
development 1
trying 1
import 1
not 1
drawer 1
install 1
codes 1
views 1
be 1
user 1
share 1
themes 1
web 1
content 1
able 1
program 1
build 1
sensorir 1
python 1
emulation 1
and 1
start 1
run 1
lambda 1
free 1
in 2
for 2
android 2
arduino 2
cant 2
error 2
doesnt 2
studio 2
a 2
my 2
source 2
the 3
to 3
code 3
about 3

Postscript:

I received advice from @lazykyama that "If you use Counter of collections module, you will almost never need the function created in Video 3," so I decided to implement it immediately.

import requests
from bs4 import BeautifulSoup
import operator #unused here now that Counter does the counting
from collections import Counter


def start(url):
	word_list = []
	source_code = requests.get(url).text #gonna connect to the link and use it as plain text
	soup = BeautifulSoup(source_code, 'html.parser')
	for post_text in soup.findAll('a', {'class': 'title text-semibold'}): #go through all the contents
		content = post_text.string #.string = only get the texts thats inside "soup"
		words = content.lower().split()
		for each_word in words:
			word_list.append(each_word)
	clean_up_list(word_list)

def clean_up_list(word_list):
	clean_word_list = []
	for word in word_list:
		symbols = "!@#$%^&*()_+{}:\"<>?,./;'[]-="
		for i in range(0, len(symbols)): 
			word = word.replace(symbols[i], "") #and replace it with nothing (=delete) if finds any symbols
		if len(word) > 0: #allows it to take only the actual clean words
			#print(word)
			clean_word_list.append(word)

	counts = Counter(clean_word_list)
	print(counts)

start("https://www.thenewboston.com/forum/")

Here is the output:

Counter({'the': 9, 'to': 5, 'i': 5, 'with': 5, 'program': 3, 'image': 3, 'code': 3, 'web': 3, 'help': 3, 'simple': 3, 'source': 3, 'crawler': 3, 'a': 3, 'in': 3, 'am': 2, 'not': 2, 'error': 2, 'cant': 2, 'is': 2, 'my': 2, 'images': 2, 'when': 2, 'getting': 2, 'tutorial': 2, 'about': 2, 'for': 2, 'need': 2, 'app': 2, 'problem': 2, 'android': 2, 'find': 2, 'and': 2, 'studio': 1, 'running': 1, 'clock': 1, 'selenium': 1, 'codes': 1, 'mergesort': 1, 'it': 1, 'trouble': 1, 'someone': 1, 'please': 1, 'webpage': 1, 'method': 1, 'beginners': 1, 'camera': 1, 'lambda': 1, 'specified': 1, 'build': 1, 'buying': 1, 'development': 1, 'dosent': 1, 'run': 1, 'of': 1, 'anything': 1, 'mac': 1, 'reference': 1, 'mistake': 1, 'linked': 1, 'haxm': 1, 'list': 1, 'now': 1, 'trying': 1, 'on': 1, 'typecasting': 1, 'got': 1, 'current': 1, 'imagemap': 1, 'question': 1, 'undefined': 1, 'assignment': 1, 'population': 1, 'import': 1, 'able': 1, 'apple': 1, 'system': 1, 'needs': 1, 'show': 1, 'prepaid': 1, 'install': 1, 'how': 1, 'cannot': 1, 'hover': 1, 'add': 1, 'video': 1, '4': 1, 'default': 1, 'involving': 1, 'inserting': 1, 'you': 1, 'only': 1, 'function': 1, 'file': 1, 'themes': 1, 'this': 1, '28': 1, 'chooser': 1, 'refresh': 1, 'share': 1, 'link': 1, 'where': 1, 'tagif': 1, 'tip': 1, 'practice': 1, 'python': 1, 'get': 1, 'visa': 1, 'environment': 1, 'funny': 1, 'possible': 1, '42': 1, 'css': 1, 'step': 1, 'bitcoins': 1, 'time': 1, 'which': 1, 'variable': 1, 'date': 1, 'litlte': 1, 'as': 1, 'override': 1, 'capture': 1, 'effect': 1, 'intel': 1, 'can': 1, 'but': 1, 'at': 1, 'bug': 1, 'onattach': 1, 'loop': 1, 'what': 1})

This is convenient because the code gets shorter and the counts come back collected in a dictionary-like object.

In addition, you can look up a specific word and display its frequency. For example, to display the frequency of only the word the:

import requests
from bs4 import BeautifulSoup
import operator #unused here now that Counter does the counting
from collections import Counter


def start(url):
	word_list = []
	source_code = requests.get(url).text #gonna connect to the link and use it as plain text
	soup = BeautifulSoup(source_code, 'html.parser')
	for post_text in soup.findAll('a', {'class': 'title text-semibold'}): #go through all the contents
		content = post_text.string #.string = only get the texts thats inside "soup"
		words = content.lower().split()
		for each_word in words:
			word_list.append(each_word)
	clean_up_list(word_list)

def clean_up_list(word_list):
	clean_word_list = []
	for word in word_list:
		symbols = "!@#$%^&*()_+{}:\"<>?,./;'[]-="
		for i in range(0, len(symbols)): 
			word = word.replace(symbols[i], "") #and replace it with nothing (=delete) if finds any symbols
		if len(word) > 0: #allows it to take only the actual clean words
			#print(word)
			clean_word_list.append(word)

	counts = Counter(clean_word_list)
	specific = counts["the"] #9
	print(specific)

start("https://www.thenewboston.com/forum/")

Because a Counter is a dictionary subclass, you can also change the frequency of a word at will, for example with counts["the"] = 15. Unlike a normal dictionary, looking up a word that is not present returns 0 instead of raising a KeyError. With counts["the"] = 0, the word moves to the end of the most_common() ordering, and so to the end of the printed Counter. An entry can also be deleted with del counts["the"].
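
These operations can be tried on a small Counter built by hand. The word list below is made up for illustration and is not taken from the forum.

```python
from collections import Counter

counts = Counter(["the", "the", "code", "web", "web", "web"])

# A missing word returns 0 instead of raising KeyError.
print(counts["missing"])  # 0

# Counts can be overwritten like ordinary dictionary values.
counts["the"] = 15
counts["web"] = 0

# most_common() orders by count, so the zeroed word now comes last.
print(counts.most_common())  # [('the', 15), ('code', 1), ('web', 0)]

# del removes the entry entirely.
del counts["web"]
print("web" in counts)  # False
```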

You can also create a list with x = list(counts.elements()).

#Same as above, so omitted
	counts = Counter(clean_word_list)
	counts_list = list(counts.elements())
	print(counts_list)

start("https://www.thenewboston.com/forum/")
['please', 'problem', 'problem', 'add', 'crawler', 'crawler', 'crawler', 'running', 'specified', 'is', 'is', 'dosent', 'practice', 'intel', 'anything', 'show', 'mergesort', 'image', 'image', 'image', 'list', 'import', 'tip', 'loop', 'am', 'am', 'getting', 'getting', 'population', 'get', 'buying', 'for', 'for', 'about', 'about', 'which', '4', 'on', 'prepaid', 'mistake', 'override', 'got', 'function', 'share', 'as', 'clock', 'reference', 'cannot', 'bitcoins', 'effect', 'code', 'code', 'code', 'assignment', 'you', 'can', 'images', 'images', 'haxm', 'find', 'find', 'install', 'with', 'with', 'with', 'with', 'with', 'trying', 'file', 'and', 'and', 'what', 'android', 'android', 'typecasting', 'source', 'source', 'source', 'beginners', 'someone', 'possible', 'cant', 'cant', 'how', 'method', 'app', 'app', 'i', 'i', 'i', 'i', 'i', 'system', 'where', 'webpage', 'involving', 'funny', 'current', 'it', 'linked', 'in', 'in', 'in', 'variable', 'web', 'web', 'web', 'hover', 'litlte', 'question', 'tagif', 'time', 'inserting', 'trouble', 'program', 'program', 'program', 'bug', '42', 'tutorial', 'tutorial', 'need', 'need', 'video', 'lambda', 'date', 'chooser', 'run', 'error', 'error', 'default', 'to', 'to', 'to', 'to', 'to', 'of', 'apple', 'link', 'when', 'when', 'capture', 'mac', 'css', 'step', 'refresh', 'not', 'not', 'imagemap', 'development', 'camera', 'but', 'simple', 'simple', 'simple', 'needs', 'help', 'help', 'help', 'studio', 'a', 'a', 'a', '28', 'selenium', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'now', 'themes', 'environment', 'python', 'visa', 'only', 'this', 'able', 'undefined', 'onattach', 'build', 'at', 'my', 'my', 'codes']

most_frequent = counts.most_common(2) gives the top two most frequently used words.

After most_frequent = counts.most_common(2), print(most_frequent[1]) shows the second most frequently used word.
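
A minimal sketch of most_common on a hand-made word list (the words are illustrative, not scraped). Each entry of the result is a (word, count) tuple, so index 1 is the second most frequent word.

```python
from collections import Counter

counts = Counter(["the", "the", "the", "to", "to", "code"])

most_frequent = counts.most_common(2)
print(most_frequent)     # [('the', 3), ('to', 2)]
print(most_frequent[1])  # ('to', 2)

# elements() repeats each word as many times as its count.
counts_list = list(counts.elements())
print(len(counts_list))  # 6
```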
