I used Python to investigate the skills needed to become a web engineer

Introduction

I scraped a job change site with Python and took a quick look at the "required skills" in its postings. Since I am aiming to become an engineer, I wanted to know what requirements job postings actually list.

For the idea and the implementation, I used the article <a href="https://qiita.com/kakiuchis/items/2c9b327cadf1e8dbdf6e" rel="nofollow noopener" target="_blank">Since I studied scraping, I considered in a data-driven way how an inexperienced person can become a web director</a> as a reference.

Please point out any mistakes, however small!

Required Skills Conclusion

For those who want to see only the results

- Common
  - Team development and communication skills
  - Git / GitHub

- Server side
  - Languages such as Ruby, PHP, Python, C (C#?), and Java
  - Cloud knowledge such as AWS / GCP
  - Knowledge of server infrastructure in general
  - Web application frameworks such as Rails, Laravel, and Django
  - Databases in general
  - Networking in general
  - Front-end languages
  - Docker containers

Please don't mind that the list gets a bit rough partway through.

Is this about what you expected?

Now, on to the details.

Environment & build order

- Environment

- Reference
  - <a href="https://qiita.com/sakaeda11/items/3832472b5eb923e0128f" rel="nofollow noopener" target="_blank">Python environment construction on Mac</a>
  - <a href="https://qiita.com/fiftystorm36/items/b2fd47cf32c7694adc2e" rel="nofollow noopener" target="_blank">venv: Python virtual environment management</a>
  - <a href="https://qiita.com/kakiuchis/items/2c9b327cadf1e8dbdf6e" rel="nofollow noopener" target="_blank">Since I studied scraping, I considered in a data-driven way how an inexperienced person can become a web director</a>

Collecting information from job change sites

The implementation of scraping → morphological analysis largely follows the <a href="https://qiita.com/kakiuchis/items/2c9b327cadf1e8dbdf6e" rel="nofollow noopener" target="_blank">reference article</a>, so I omit it here. The flow is as follows: get the URL of each job posting → open the URL and extract the required-skills section → count the frequency of each noun by morphological analysis. I also referred to <a href="https://qiita.com/itkr/items/513318a9b5b92bd56185" rel="nofollow noopener" target="_blank">Scraping with Python and Beautiful Soup</a> and the <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="nofollow noopener" target="_blank">official documentation</a>.

When scraping, be especially careful to:

- Sleep before opening each page
- Check robots.txt and respect its Disallow rules
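As a rough illustration of these two precautions, here is a minimal sketch that uses only the standard library. The function names, the sample robots.txt text, and the example.com URLs are all made up for illustration, and the actual page download is left out; it is not the code used for this article.

```python
import time
from urllib import robotparser


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule object."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_urls(urls, rules, user_agent="*", delay_sec=1.0, sleep=time.sleep):
    """Return the URLs we may visit, sleeping before each visit."""
    visited = []
    for url in urls:
        # Skip anything that robots.txt disallows for this user agent
        if not rules.can_fetch(user_agent, url):
            continue
        # Wait before opening the page so the site is not hammered
        sleep(delay_sec)
        visited.append(url)  # a real scraper would download the page here
    return visited


# Hypothetical robots.txt and URLs, for illustration only
rules = build_robot_rules("User-agent: *\nDisallow: /private/\n")
print(polite_urls(
    ["https://example.com/jobs/1", "https://example.com/private/x"],
    rules,
    delay_sec=0.1,
))
```

Passing `sleep` as a parameter is only a convenience so that the delay can be replaced in tests.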

The number of job postings collected was:

- Front end: 208
- Server side: 181
- Games: 175

Analysis

Here is the code used for the analysis.

analysis_job_search.py


import MeCab
import pandas as pd
import numpy as np
import mysql.connector as mydb
import pandas.io.sql as psql
import collections

pd.set_option('display.max_rows', 500)

# Extract the nouns from a text and return them as a space-separated string
def devide_by_mecab(text):
    tagger = MeCab.Tagger("-Ochasen")
    node = tagger.parseToNode(text)
    word_list = []
    while node:
        # The first feature field is the part of speech
        pos = node.feature.split(",")[0]
        # MeCab's Japanese dictionaries label nouns as "名詞"
        if pos == "名詞":
            word = node.surface
            word_list.append(word)
        node = node.next
    return "  ".join(word_list)

# Connect to MySQL (enter your own settings here)
connection = mydb.connect(
  host = '',
  port = '',
  user = '',
  password = '',
  database = ''
)

# Get data from DB
df_frontend = psql.read_sql("SELECT * FROM table WHERE search_word = 'front end'",connection)
df_serverside = psql.read_sql("SELECT * FROM table WHERE search_word = 'Server side'",connection)
df_game = psql.read_sql("SELECT * FROM table WHERE search_word = 'Game programmer'",connection)

# Split each row's need_skills text into nouns and return them all as one list
def get_all_words(df):
    all_words = []
    for index, row in df.iterrows():
        words = devide_by_mecab(row['need_skills']).split()
        all_words.extend(words)
    return all_words

count_of_words_frontend = collections.Counter(get_all_words(df_frontend))
count_of_words_serverside = collections.Counter(get_all_words(df_serverside))
count_of_words_game = collections.Counter(get_all_words(df_game))

# See all nouns in order of frequency
count_of_words_frontend.most_common()
count_of_words_serverside.most_common()
count_of_words_game.most_common()

# Skill-related words picked by hand from the output above (count >= 6)
frontend_top_words = ['JavaScript', 'CSS', 'HTML', 'js', 'Vue', 'React', 'design', 'team', 'Git', 'UI', 'PHP', 'UX', 'Angular', 'Ruby', 'Javascript', 'jQuery', 'Photoshop', 'communication', 'API', 'TypeScript', 'SPA', 'Java', 'Sass', 'designer', 'Illustrator', 'JS', 'test', 'server', 'webpack', 'GitHub', 'AWS', 'AngularJS', 'WordPress', 'Webpack', 'Rails', 'iOS', 'CMS', 'Python', 'Redux', 'MySQL', 'Gulp', 'Android', 'gulp', 'C', 'SCSS', 'git', 'DB', 'Linux', 'Babel', 'Docker', 'CI']
serverside_top_words = ['Ruby', 'PHP', 'AWS', 'Python', 'C', 'Java', 'server', 'Rails', 'team', 'infrastructure', 'js', 'Git', 'server', 'Android', 'JavaScript', 'Go', 'Linux', 'Perl', 'HTML', 'MySQL', 'RDBMS', 'CSS', 'Kura', 'Udo', 'API', 'front', 'management', 'GitHub', 'iOS', 'DB', 'GCP', 'React', 'Vue', 'network', 'Node', 'HTTP', 'Swift', 'CI', 'Objective', 'Docker', 'Security', 'Javascript', 'Azure', 'native', 'PostgreSQL', 'architecture', 'SQL', 'test', '#', 'smart', 'phone', 'UI', 'MVC', 'communication', 'git', 'Scala', 'Kotlin' , 'CD', 'Database', 'TypeScript', 'Apache', 'LAMP', 'designer', 'container', 'RDB', 'Laravel']
game_top_words = ['C', '3', 'D', 'Unity', 'Java', 'PHP', 'design', '++', 'server', '++、', 'network', 'JavaScript', 'Android', 'management', '#、', 'Objective', 'Photoshop', 'Maya', 'team', 'designer', 'Linux', 'MySQL', 'Ruby', 'Python', 'infrastructure', '#', 'Graphics', 'server', 'Excel', 'graphic', 'communication', 'Unreal', 'DCG', 'AWS', 'Perl', 'Illustrator', 'Engine', 'planner', 'Word', 'native', 'motion', 'director', 'HTML', 'UI', 'Flash', 'effect', 'VB', 'sound', 'DS', 'OpenGL', 'iOS', 'DirectX']

# Build a DataFrame of each word and its number of occurrences
def get_top_word_df(top_words, count_of_words):
    # DataFrame.append was removed in pandas 2.0, so build all rows at once
    rows = [{'word': word, 'count': count_of_words[word]} for word in top_words]
    return pd.DataFrame(rows, columns=['word', 'count'])

df_frontend_top_words =  get_top_word_df(frontend_top_words,count_of_words_frontend)
df_serverside_top_words =  get_top_word_df(serverside_top_words,count_of_words_serverside)
df_game_top_words =  get_top_word_df(game_top_words,count_of_words_game)

for df in [df_frontend_top_words,df_serverside_top_words,df_game_top_words]:
    df['rank'] = df['count'].rank(ascending = False, method = 'min').astype(int)
    df['count'] = df['count'].astype(int)

df_frontend_top_words[['rank','word','count']]
df_serverside_top_words[['rank','word','count']]
df_game_top_words[['rank','word','count']]

First I output all the words in order of frequency, then manually removed the words that did not seem related to skills and output the result again. I added game programmers because I was personally interested in them.
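To make the "count, pick by hand, then rank" step concrete, here is a small pure-Python sketch. It imitates the tie handling of `rank(method='min')`, where tied words share the lowest rank and the next rank is skipped. The function name is hypothetical, and the "experience" entry is an invented stand-in for a frequent word that is not a skill; the other counts are taken from the front-end table.

```python
from collections import Counter


def rank_selected_words(counter, selected_words):
    """Return (rank, word, count) rows for hand-picked words.

    Tied counts share the lowest rank, like rank(method='min') in pandas.
    """
    # Sort the hand-picked words by count, highest first (stable for ties)
    rows = sorted(
        ((word, counter[word]) for word in selected_words),
        key=lambda pair: -pair[1],
    )
    ranked = []
    for position, (word, count) in enumerate(rows, start=1):
        if ranked and ranked[-1][2] == count:
            rank = ranked[-1][0]  # same count as the previous row: reuse its rank
        else:
            rank = position
        ranked.append((rank, word, count))
    return ranked


# "experience" is a made-up non-skill word that gets dropped by hand
counts = Counter({"experience": 120, "js": 72, "Vue": 63, "React": 63, "design": 60})
for row in rank_selected_words(counts, ["js", "Vue", "React", "design"]):
    print(row)
```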

The result is as follows.

Front end

rank word count
1 JavaScript 147
2 CSS 145
3 HTML 131
4 js 72
5 Vue 63
5 React 63
7 design 60
8 team 40
9 Git 34
9 UI 34
11 PHP 31
12 UX 30
13 Angular 29
14 Ruby 23
15 Javascript 21
16 jQuery 20
16 Photoshop 20
18 communication 18
18 API 18
18 TypeScript 18
18 SPA 18
22 Java 16
22 Sass 16
24 designer 15
24 Illustrator 15
24 JS 15
24 test 15
28 server 14
29 webpack 13
29 GitHub 13
29 AWS 13
29 AngularJS 13
33 WordPress 12
33 Webpack 12
33 Rails 12
36 iOS 11
36 CMS 11
36 Python 11
36 Redux 11
40 MySQL 10
40 Gulp 10
42 Android 9
42 gulp 9
42 C 9
45 SCSS 8
45 git 8
47 DB 7
47 Linux 7
49 Babel 6
49 Docker 6
49 CI 6

Server side

rank word count
1 Ruby 81
2 PHP 67
3 AWS 50
4 Python 43
5 C 42
6 Java 41
7 server 37
8 Rails 34
9 team 33
10 infrastructure 31
11 js 29
12 Git 27
12 server 27
14 Android 26
14 JavaScript 26
14 Go 26
17 Linux 24
17 Perl 24
19 HTML 21
19 MySQL 21
19 RDBMS 21
22 CSS 19
23 Kura 18
23 Udo 18
23 API 18
26 front 17
26 management 17
26 GitHub 17
29 iOS 16
30 DB 15
30 GCP 15
30 React 15
33 Vue 14
34 network 12
34 Node 12
36 HTTP 11
36 Swift 11
36 CI 11
36 Objective 11
40 Docker 10
40 Security 10
40 Javascript 10
40 Azure 10
44 native 9
44 PostgreSQL 9
44 architecture 9
44 SQL 9
44 test 9
49 # 8
49 smart 8
49 phone 8
49 UI 8
49 MVC 8
49 communication 8
49 git 8
49 Scala 8
57 Kotlin 7
57 CD 7
57 Database 7
57 TypeScript 7
57 Apache 7
57 LAMP 7
63 designer 6
63 container 6
63 RDB 6
63 Laravel 6

Game programmer

rank word count
1 C 156
2 3 62
3 D 49
4 Unity 45
5 Java 32
6 PHP 31
7 design 29
8 ++ 26
9 server 22
10 ++、 19
10 network 19
12 JavaScript 17
13 Android 15
14 management 14
15 #、 13
16 Objective 12
16 Photoshop 12
16 Maya 12
19 team 11
19 designer 11
19 Linux 11
19 MySQL 11
19 Ruby 11
19 Python 11
25 infrastructure 10
25 # 10
25 Graphics 10
28 server 9
28 Excel 9
30 graphic 8
30 communication 8
30 Unreal 8
30 DCG 8
30 AWS 8
30 Perl 8
36 Illustrator 7
36 Engine 7
36 planner 7
36 Word 7
36 native 7
36 motion 7
42 director 6
42 HTML 6
42 UI 6
42 Flash 6
42 effect 6
42 VB 6
42 sound 6
42 DS 6
42 OpenGL 6
42 iOS 6
42 DirectX 6

I feel that the results are almost as expected.

Ideally, variants of the same word such as 'javascript', 'Javascript', and 'js' should be merged into one entry, but there were so many words that it looked difficult, and I gave up.
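For the simplest case, where variants differ only in letter case, a merge could look roughly like the sketch below. This is not something the analysis above did; the function name is hypothetical, and the counts are copied from the front-end table. Abbreviations such as 'js' are deliberately left out, because 'js' may also come from names like Vue.js and would need a hand-written alias table.

```python
from collections import Counter, defaultdict


def merge_case_variants(counts):
    """Merge words that differ only in letter case into one entry."""
    # Group the spellings by their lowercase form
    groups = defaultdict(Counter)
    for word, n in counts.items():
        groups[word.lower()][word] += n

    merged = Counter()
    for variants in groups.values():
        # Use the most frequent spelling as the label for the whole group
        label = variants.most_common(1)[0][0]
        merged[label] = sum(variants.values())
    return merged


frontend = Counter({
    "JavaScript": 147, "Javascript": 21,
    "webpack": 13, "Webpack": 12,
    "Gulp": 10, "gulp": 9,
    "Git": 34, "git": 8,
})
print(merge_case_variants(frontend).most_common())
```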

Conclusion

Front-end engineer: HTML, CSS, and JavaScript stand out far above the rest and appear to be indispensable. The next step seems to be learning frameworks such as Vue and React while strengthening design and UI/UX skills, and after that, deepening your understanding of the server side.

Server-side engineer: 'Cloud' got split into 'Kura' and 'Udo'! ... But I digress. Ruby, PHP, Python, and Java seem to be the main languages (I can't tell which language 'C' refers to). Cloud knowledge also appears to be a must. Employers also want an understanding of databases, networks, and the front end. There seems to be a lot to study.

Game programmer: As expected, the direction is a little different, which is interesting. C++, C#, Unity, and 3D are the main areas, and it seems you need to learn graphics alongside programming.
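The 'C' counts are ambiguous because the tokenizer splits names like C++ and C# into 'C', '++', and '#'. One possible workaround, which the analysis above did not use, is to count these names directly in the raw text with a regular expression before tokenizing. The pattern and function name below are assumptions, and the sample sentences are invented.

```python
import re
from collections import Counter

# Longer names come first so that "C++" is not counted as a bare "C".
# A bare "C" must not be part of a longer word such as "CSS".
C_FAMILY = re.compile(r"Objective-C|C\+\+|C#|(?<![A-Za-z])C(?![A-Za-z+#])")


def count_c_family(texts):
    """Count C, C++, C#, and Objective-C mentions in raw texts."""
    counts = Counter()
    for text in texts:
        counts.update(C_FAMILY.findall(text))
    return counts


# Invented sample requirement texts
samples = [
    "C++またはC#での開発経験",
    "C言語、Objective-Cの経験",
    "HTML/CSSの知識",
]
print(count_c_family(samples))
```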

My knowledge is limited, so I can only make very rough observations ... orz. I would like to use these results to guide my studies!

Issues

- The search keywords were somewhat strict, so the number of postings collected is small (around 200 per category)

- Only one site was surveyed, so the results are biased

- Variants of the same word (letter case, abbreviations, typos, etc.) were not merged

In closing

I only took a brief look at word frequencies, but I'm glad it also gave me a guide to the fields I should study.

It might be more interesting to compare multiple sites or to try a site that searches across several of them. If you are interested, please try it yourself!
