[PYTHON] The job listing page is not being updated, so let's visualize the number of projects and encourage updates

Every engineer looking for a job who visits our website stops by the [job listing page](https://ritsuan.com/job/). It is updated very rarely, and old projects sit there gathering dust, unmaintained. This is no good... I have to do something quickly...

While mulling that over, I came across this while wandering around the internet.

[Visualization of data by prefecture](https://qiita.com/SaitoTsutomu/items/6d17889ba47357e44131)

Oh, that gave me an idea.

**Let's visualize the number of projects by prefecture on the job listing page and encourage updates!** The scope of business seems to have expanded recently, so there should be plenty of projects around the major cities all over Japan! (This is the setup.)

Inspiration

  • Scraping job information by prefecture name
  • List the number of projects by prefecture
  • Display a choropleth map with japanmap, colored by the number of projects
  • If you don't post to Qiita, it will be forgotten (seriously)
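The first two steps above can be sketched without any network access. This is a minimal sketch, assuming the job titles have already been scraped into a list of strings; `count_by_prefecture` and the sample titles are hypothetical, and only three prefecture names are listed for brevity.

```python
def count_by_prefecture(titles, prefectures):
    """Count how many titles mention each prefecture name."""
    counts = {name: 0 for name in prefectures}
    for title in titles:
        for name in prefectures:
            if name in title:
                counts[name] += 1
    return counts


# Hypothetical sample data, not taken from the real job page
prefectures = ["東京都", "愛知県", "静岡県"]
titles = [
    "[東京都] Web application development",
    "[愛知県] Embedded software development",
    "[東京都] Infrastructure construction",
]

counts = count_by_prefecture(titles, prefectures)
print(counts)
```

A title that mentions two prefectures is counted once for each of them, which is the same behavior as a plain substring check per prefecture.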

Environment

  • Windows10
  • python : 3.6.3
  • japanmap : 0.0.21
  • requests : 2.22.0
  • beautifulsoup4 : 4.7.1
  • matplotlib : 3.1.3
  • pandas : 0.25.3

Visualization of the number of cases

Here is what I ended up with after some trial and error.

  • Since it runs in a Jupyter notebook, some of the code is not needed for the article itself. If there is anything to improve, I would appreciate pointers from more experienced readers.

Python3


import requests
from bs4 import BeautifulSoup
from japanmap import pref_names, picture
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# One counter per prefecture (japanmap codes 1..47 map to indexes 0..46)
plis = [0] * 47

# Scan the first 50 pages of the job list
for z in range(50):
    r = requests.get('https://ritsuan.com/job/page/{}'.format(z+1))
    bs = BeautifulSoup(r.text, 'html.parser')

    for i in range(47):
        pnlist = pref_names[i+1]

        # Count each job title (h2) that contains the prefecture name
        for j in bs.select("div[class=main_container] h2"):
            pc = j.text
            if pnlist in pc:
                plis[i] += 1

# Map prefecture name -> number of cases
dic = {}
for k in range(47):
    pdic = pref_names[k+1]
    dic[pdic] = plis[k]

cmap = plt.get_cmap("Reds")
df = pd.DataFrame.from_dict(dic, orient="index", columns=["Number of cases"])
# Normalize needs scalar bounds, so take min/max of the column, not of the DataFrame
norm = plt.Normalize(vmin=df["Number of cases"].min(), vmax=df["Number of cases"].max())
# Convert a count into a '#rrggbb' color string
fcol = lambda x: '#' + bytes(cmap(norm(x), bytes=True)[:3]).hex()
plt.rcParams['figure.figsize'] = 20, 20
plt.colorbar(plt.cm.ScalarMappable(norm, cmap))
plt.imshow(picture(df["Number of cases"].apply(fcol)))

Result

This is what comes out.

<img width="400" src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/472694/355a928c-4c83-cbea-4acb-0f6d7be07b3e.png">

What is this? Only the Kanto and Tokai areas are extremely dark. I expected something like this since the head office is in Shizuoka Prefecture, but I had thought the scope of business had expanded further. And because a few areas stand out so much, the other areas are buried and look pale.

Let's check right away by listing the numbers.
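One possible way to keep the low-count areas from washing out is to compress the scale before mapping counts to colors. The following is a sketch of that idea, not what the script above does: `log_scale` is a hypothetical helper that maps counts to the 0..1 range with `math.log1p`, and the sample counts are a few of the values obtained in this run.

```python
import math


def log_scale(count, max_count):
    """Map a count to 0..1 on a log scale; log1p keeps 0 at exactly 0."""
    if max_count <= 0:
        return 0.0
    return math.log1p(count) / math.log1p(max_count)


counts = {"Tokyo": 133, "Aichi": 90, "Shizuoka": 62, "Aomori": 1, "Hokkaido": 0}
max_count = max(counts.values())

for name, count in counts.items():
    linear = count / max_count
    print(f"{name}: linear={linear:.3f} log={log_scale(count, max_count):.3f}")
```

The scaled value could be passed to the colormap in place of `norm(x)`. matplotlib also has log-based normalization classes such as `matplotlib.colors.LogNorm`, though zero counts need care there.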

Python3


df = pd.DataFrame.from_dict(dic, orient="index", columns=["Number of cases"])
df.to_excel("job_list.xlsx", sheet_name="job_list")

| Prefecture | Number of cases | Prefecture | Number of cases | Prefecture | Number of cases |
|---|---|---|---|---|---|
| Hokkaido | 0 | Ishikawa Prefecture | 0 | Okayama Prefecture | 0 |
| Aomori Prefecture | 1 | Fukui Prefecture | 0 | Hiroshima Prefecture | 2 |
| Iwate Prefecture | 0 | Yamanashi Prefecture | 2 | Yamaguchi Prefecture | 0 |
| Miyagi Prefecture | 1 | Nagano Prefecture | 0 | Tokushima Prefecture | 0 |
| Akita Prefecture | 0 | Gifu Prefecture | 3 | Kagawa Prefecture | 0 |
| Yamagata Prefecture | 0 | Shizuoka Prefecture | 62 | Ehime Prefecture | 0 |
| Fukushima Prefecture | 0 | Aichi Prefecture | 90 | Kochi Prefecture | 0 |
| Ibaraki Prefecture | 12 | Mie Prefecture | 7 | Fukuoka Prefecture | 2 |
| Tochigi Prefecture | 1 | Shiga Prefecture | 4 | Saga Prefecture | 0 |
| Gunma Prefecture | 7 | Kyoto | 0 | Nagasaki Prefecture | 0 |
| Saitama Prefecture | 9 | Osaka | 3 | Kumamoto Prefecture | 0 |
| Chiba Prefecture | 2 | Hyogo Prefecture | 1 | Oita Prefecture | 1 |
| Tokyo | 133 | Nara Prefecture | 0 | Miyazaki Prefecture | 0 |
| Kanagawa Prefecture | 60 | Wakayama Prefecture | 0 | Kagoshima Prefecture | 0 |
| Niigata Prefecture | 0 | Tottori Prefecture | 0 | Okinawa Prefecture | 2 |
| Toyama Prefecture | 0 | Shimane Prefecture | 0 | | |

Oh... Some areas have only a single-digit number of projects. Is that really right? If a job seeker from one of those areas sees this, you can almost hear them say: "Wow... there are way too few projects in my area...?"

With numbers like these, those areas barely show up even on a choropleth map.

By the way, I had assumed that the number of projects would be large around major cities all over Japan, but the result overturned that assumption completely. (And there is the payoff.) I hear there is an office in Kansai, but it looks like the page has not been updated for it. Now that we have the evidence (?), we can push for updates.

What was unexpected

This time I searched the [○○ prefecture ~] part of each entry in the job list, but some entries put a company name or several area names there instead, and those are not counted. Such an inconsistent way of writing entries would not go over well with the public. ~~I can't search them, so~~ it should be fixed right away. I also considered searching the "address" field instead, but gave up because, judging from how the HTML is written, it looked like it would take a lot of time.
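To see how many entries slip through, one option is to list the titles that match no prefecture name at all. A minimal sketch, assuming the titles are already available as strings; `unmatched_titles` and the sample titles are hypothetical.

```python
def unmatched_titles(titles, prefectures):
    """Return the titles that contain none of the prefecture names."""
    return [t for t in titles if not any(name in t for name in prefectures)]


# Hypothetical sample data, not taken from the real job page
prefectures = ["東京都", "愛知県", "静岡県"]
titles = [
    "[東京都] Web application development",
    "[Example Corp / Kanto area] Server operation",
    "[静岡県] Embedded software development",
]

missed = unmatched_titles(titles, prefectures)
print(missed)
```

The length of `missed` gives a rough count of entries whose description does not follow the prefecture-name pattern.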

What you want to adjust in the future

  • Look at industry bias by region
  • Detect the final page of the job list and stop there
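For the second item, one approach is to keep requesting pages until one comes back empty, instead of hard-coding 50 pages. The sketch below assumes a hypothetical `fetch_titles(page)` function that returns the list of job titles on a page and an empty list past the last page; how the real site behaves past its last page has not been checked here.

```python
def collect_all_titles(fetch_titles, max_pages=1000):
    """Fetch pages 1, 2, ... until a page has no titles (or max_pages is hit)."""
    all_titles = []
    for page in range(1, max_pages + 1):
        titles = fetch_titles(page)
        if not titles:
            break  # treated as "past the final page"
        all_titles.extend(titles)
    return all_titles


# Stand-in for the real scraper: three pages with two titles each
def fake_fetch(page):
    if page > 3:
        return []
    return [f"job {page}-1", f"job {page}-2"]


result = collect_all_titles(fake_fetch)
print(len(result))  # 6
```

With requests, `fetch_titles` could wrap the `requests.get` and `bs.select` calls from the script above and return an empty list when the status code is not 200. The `max_pages` cap is a safety net in case the site never returns an empty page.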

Finally

matplotlib has many colormaps, and I tried all of them this time, so I am posting them here. This is purely for fun, so please browse at your leisure.

viridis

plasma

inferno

magma

cividis

Greys

Purples

Blues

Greens

Oranges

Reds

YlOrBr

YlOrRd

OrRd

PuRd

RdPu

BuPu

GnBu

PuBu

YlGnBu

PuBuGn

BuGn

YlGn

binary

gist_yarg

gist_gray

gray

bone

pink

spring

summer

autumn

winter

cool

Wistia

hot

afmhot

gist_heat

copper

PiYG

PrGn

BrBG

PuOr

RdGy

RdBu

RdYlBu

RdYlGn

Spectral

coolwarm

bwr

seismic

twilight

twilight_shifted

hsv

Pastel1

Pastel2

Paired

Accent

Dark2

Set1

Set2

Set3

tab10

tab20

tab20b

tab20c

flag

prism

ocean

gist_earth

terrain

gist_stern

gnuplot

gnuplot2

CMRmap

cubehelix

brg

gist_rainbow

rainbow

jet

nipy_spectral

gist_ncar

If you scrolled all the way down here, you must be a real enthusiast. This kind of playfulness is sometimes important for engineers.
