[PYTHON] Get the average salary of a job with specified conditions from indeed.com

The search results page on Indeed.com includes a salary filter that lists each salary level together with the number of matching job postings. This script uses that list to calculate the average salary.

Code

avgsalary.py


import getopt
import re
import sys
import urllib.parse
import urllib.request

import numpy as np
from bs4 import BeautifulSoup


def avgsalary(query, location):
    # URL-encode the search terms so that spaces and non-ASCII text are safe
    query = urllib.parse.quote_plus(query)
    location = urllib.parse.quote_plus(location)
    url = "https://jp.indeed.com/%E6%B1%82%E4%BA%BA?q={}&l={}&radius=0".format(query, location)

    with urllib.request.urlopen(url) as response:
        soup = BeautifulSoup(response.read(), 'html.parser')

    # The salary filter holds one <li> per salary level
    salary_filter = soup.find(id="SALARY_rbo")
    salaries = []
    num_salaries = []
    for item in salary_filter.find_all("li"):
        # Drop thousands separators, then take the first two numbers in the
        # title: the salary level and the number of postings
        title = item.a["title"].replace(',', '')
        match = re.search(r'([0-9]+)[^\d]+([0-9]+)', title)
        if match is None:
            continue
        salaries.append(match.group(1))
        num_salaries.append(match.group(2))
    # np.float was removed in NumPy 1.24; the built-in float is equivalent
    salaries = np.array(salaries).astype(float)
    salaries *= 10000  # the salary level is scaled by 10,000 to get yen
    num_salaries = np.array(num_salaries).astype(float)
    # Weighted average: each salary level counts once per posting
    return np.sum(salaries * num_salaries) / np.sum(num_salaries)


def main():
    try:
        # The trailing "=" marks long options that take a value
        opts, args = getopt.getopt(sys.argv[1:], "q:l:", ["query=", "location="])
    except getopt.GetoptError as err:
        print(err, file=sys.stderr)
        sys.exit(2)

    query = ""
    location = ""
    for o, a in opts:
        if o in ("-q", "--query"):
            query = a
        elif o in ("-l", "--location"):
            location = a

    print(avgsalary(query, location))


if __name__ == "__main__":
    main()
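
As a small illustration of how the search URL is built, the sketch below reproduces only the encoding step from avgsalary() using the standard library. The helper name build_search_url is made up for this example; the URL template is the one used in the script.

```python
import urllib.parse

# URL template copied from avgsalary(); the percent-encoded path segment
# is kept exactly as it appears in the script.
URL_TEMPLATE = "https://jp.indeed.com/%E6%B1%82%E4%BA%BA?q={}&l={}&radius=0"


def build_search_url(query, location):
    """Hypothetical helper: encode both terms and fill in the template."""
    # quote_plus turns spaces into "+" and percent-encodes other unsafe
    # characters (non-ASCII text is encoded as UTF-8 first).
    return URL_TEMPLATE.format(
        urllib.parse.quote_plus(query),
        urllib.parse.quote_plus(location),
    )


print(build_search_url("software engineer", "Gotemba"))
```

Encoding the terms before formatting keeps characters such as "&" or spaces in a query from breaking the query string.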

Run

$ python avgsalary.py -l Gotemba
2312722.94887

Description

This code does the following:

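  1. Get the salary levels and the number of job postings at each level using Beautiful Soup.
  2. Store the salary levels and the counts in NumPy arrays.
  3. Calculate the average of the salary levels, weighted by the counts. The output is the average annual income in yen.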
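
The three steps can be tried offline on made-up data. In the sketch below the title strings are hypothetical (the real text of the title attribute on the page may differ), and parse_title and weighted_average are illustrative helpers; the regular expression and the scaling by 10,000 follow the script.

```python
import re

import numpy as np

# Hypothetical title strings: a salary level followed by the number of
# postings. The real page text may look different.
SAMPLE_TITLES = ["200 (30)", "400 (50)", "1,000 (20)"]


def parse_title(title):
    """Return (salary_in_yen, count) parsed from one title string."""
    title = title.replace(",", "")  # drop thousands separators
    match = re.search(r"([0-9]+)[^\d]+([0-9]+)", title)
    if match is None:
        raise ValueError("unexpected title: {!r}".format(title))
    # Same scaling as the script: salary level times 10,000
    return float(match.group(1)) * 10000, float(match.group(2))


def weighted_average(titles):
    salaries, counts = zip(*(parse_title(t) for t in titles))
    # Each salary level is weighted by how many postings fall into it.
    return np.average(np.array(salaries), weights=np.array(counts))


print(weighted_average(SAMPLE_TITLES))
```

np.average with the weights argument computes the same value as np.sum(salaries * num_salaries) / np.sum(num_salaries) in the script.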

Specific example

For example, the following comparison is interesting.

$ python avgsalary.py -q programmer
4469298.24561
$ python avgsalary.py -q プログラマー
3116876.47306

The first query uses the English word "programmer" and the second uses the Japanese word, so this comparison roughly shows the difference in annual income between jobs posted in English and jobs posted in Japanese. The English postings pay over 1 million yen more per year on average, which shows how valuable English is. Incidentally, the US version of indeed.com suggests that the average salary of American programmers is over 7 million yen.
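
The size of the gap can be checked with a few lines of arithmetic. The sketch below only reuses the two averages printed above; the variable names are illustrative.

```python
# Averages copied from the two runs shown above.
first_run = 4469298.24561
second_run = 3116876.47306

difference = first_run - second_run  # gap in yen per year
ratio = first_run / second_run       # how many times higher the first is

print("difference: {:,.0f} yen".format(difference))
print("ratio: {:.2f}".format(ratio))
```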
