A Python script that gets the number of jobs matching specified conditions from indeed.com

I wanted statistics on the number of job vacancies by region, and on the vacancies relevant to my own job change, so I wrote a script to collect them.

Overview

Send a query and a region to Indeed.com, extract the number of search results from the returned page, and display it. The script uses urllib, re, and bs4.

Code

jobcounter.py


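import getopt
import re
import sys
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup


def jobcounter(query, location):
    """Return the number of Indeed search hits for query in location (as a string)."""
    query = urllib.parse.quote_plus(query)
    location = urllib.parse.quote_plus(location)
    # %E6%B1%82%E4%BA%BA is the URL-encoded path segment "求人" (job offers)
    url = "https://jp.indeed.com/%E6%B1%82%E4%BA%BA?q={}&l={}&radius=0".format(query, location)

    with urllib.request.urlopen(url) as response:
        soup = BeautifulSoup(response.read(), "html.parser")

    # The hit count is shown in the element with id="searchCount"
    element = soup.find(id="searchCount")
    if element is None:
        raise RuntimeError('id="searchCount" not found; the page layout may have changed')

    result = element.get_text().replace(",", "")
    # Keep only the first number in the text, i.e. the total hit count
    # (assumes the page shows the total before any other number)
    result = re.sub(r"^\D*([0-9]+).*$", r"\1", result.strip(), flags=re.DOTALL)
    return result


def main():
    try:
        # Long options need a trailing "=" to accept an argument
        opts, args = getopt.getopt(sys.argv[1:], "q:l:", ["query=", "location="])
    except getopt.GetoptError as err:
        print(err, file=sys.stderr)
        print("usage: python jobcounter.py -q QUERY -l LOCATION", file=sys.stderr)
        sys.exit(2)

    query = ""
    location = ""
    for o, a in opts:
        if o in ("-q", "--query"):
            query = a
        elif o in ("-l", "--location"):
            location = a

    print(jobcounter(query, location))


if __name__ == "__main__":
    main()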

Trying it from the CLI

Execute the following command.

$ python jobcounter.py -q programmer -l Shibuya

The output is as follows.

1740

This means that a search for jobs matching "programmer" in the area "Shibuya" returned 1,740 listings.
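To check a figure like this by hand, it helps to see the exact URL that jobcounter() requests. The sketch below factors the URL construction out into a hypothetical helper, build_search_url(), using the same urllib.parse.quote_plus calls as the script. The reading of radius=0 in the comment is an assumption about Indeed's query parameters, not something the script documents.

```python
import urllib.parse

# Hypothetical helper: mirrors the URL built inside jobcounter().
# The path segment %E6%B1%82%E4%BA%BA is the URL-encoded word "求人" (job offers).
BASE_URL = "https://jp.indeed.com/%E6%B1%82%E4%BA%BA"


def build_search_url(query, location):
    """Return the Indeed JP search URL for query and location."""
    encoded_query = urllib.parse.quote_plus(query)        # spaces become "+"
    encoded_location = urllib.parse.quote_plus(location)  # non-ASCII becomes %XX bytes
    # Assumption: radius=0 asks Indeed not to widen the search to nearby areas
    return "{}?q={}&l={}&radius=0".format(BASE_URL, encoded_query, encoded_location)


print(build_search_url("programmer", "Shibuya"))
# → https://jp.indeed.com/%E6%B1%82%E4%BA%BA?q=programmer&l=Shibuya&radius=0
print(build_search_url("programmer", "渋谷"))
# → https://jp.indeed.com/%E6%B1%82%E4%BA%BA?q=programmer&l=%E6%B8%8B%E8%B0%B7&radius=0
```

Opening the printed URL in a browser should show the same searchCount figure the script parsed, as long as the page layout has not changed in the meantime.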

How to use jobcounter

The main use is collecting statistics such as "how many jobs are available for each occupation in a specific area" and "how many jobs are available for a specific occupation in each area".

jobcounter(query, location)

The function is deliberately simple, so all you have to do is pass queries and regions to it in a loop, taking them from a list or a YAML file. The return value is the number of matching job listings, as a string of digits.
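As a sketch of that loop, the hypothetical helper count_matrix() below calls a counting function for every combination of query and region and collects the results in a dict keyed by (query, location). The counter is passed in as a parameter so the loop can be tried without network access; the pause between requests is an added assumption, meant to avoid hitting indeed.com in a tight loop.

```python
import time
from itertools import product


def count_matrix(counter, queries, locations, delay=1.0):
    """Call counter(query, location) for every query/location pair.

    counter is expected to behave like jobcounter(). delay is the pause in
    seconds between consecutive requests (an assumption, not from the script).
    Returns a dict keyed by (query, location).
    """
    results = {}
    for i, (query, location) in enumerate(product(queries, locations)):
        if i > 0:
            time.sleep(delay)  # be polite: no pause before the first request
        results[(query, location)] = counter(query, location)
    return results
```

Assuming the script above is saved as jobcounter.py in the same directory, importing jobcounter from it and calling count_matrix(jobcounter, ["programmer", "designer"], ["Shibuya", "Shinjuku"]) would return one count per query/region pair. Reading the two lists from a YAML file instead would only change how queries and locations are built.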

Important points

urllib and re are part of the Python standard library, but bs4 (Beautiful Soup) must be installed with pip.

$ pip install beautifulsoup4

Also, if indeed.com changes its appearance, wording, or HTML, the script may break. Specifically, the count is read from the HTML element with the id "searchCount"; if that id is renamed, the count can no longer be obtained. Likewise, because the text inside searchCount is parsed with a regular expression, the result will not be extracted properly if the text no longer matches the pattern.
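Given this fragility, it may be safer to fail loudly than to pass unexpected text through. Below is a sketch of a stricter parser, parse_search_count() (a hypothetical name), that could stand in for the regular-expression step. It assumes the total is the first number in the searchCount text, accepts thousands separators, returns an int, and raises ValueError when no number is found.

```python
import re


def parse_search_count(text):
    """Extract the hit count from the text of the searchCount element.

    Assumes the total is the first number in the text, possibly written
    with thousands separators (e.g. "... 1,740 ..."). Raises ValueError
    instead of returning unparsed text, so a wording change is noticed.
    """
    match = re.search(r"[0-9][0-9,]*", text)
    if match is None:
        raise ValueError("no number found in searchCount text: {!r}".format(text))
    return int(match.group(0).replace(",", ""))


print(parse_search_count("Job search results 1,740 jobs"))  # → 1740
```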

Web scraping and Unix philosophy

Web scraping means extracting information from websites, and this script is one example of it. The UNIX philosophy says a program should "do one thing and do it well", and the script above broadly follows that idea.

It has no flashy features, but it is good enough for collecting job vacancy statistics, and the script is simple enough for anyone to follow.
