[PYTHON] [Natural language processing] I want to meet engineers who are changing jobs (or are about to)

Small talk

Recently we have been recruiting engineers, so we often reach out to people via Twitter and Qiita.

Most of them are people we happened to come across on the timeline, so I wanted a more efficient way to find people to talk to!

With that in mind, I typed Java job change (← simple) into the Twitter search box, but there was a lot of **noise** (job-change services and related blogs), and I couldn't find the tweets I wanted.

As I scrolled through the results, feeling frustrated, it suddenly hit me:

**"Why not do this in Python?"**

And so I set out to solve it with natural language processing.

Main subject

Doing the scraping and natural language processing from scratch sounded interesting, but this time I decided to use existing APIs to get the job done!

First of all, Twitter API

There are various plans for the Twitter API, but this time I used the Premium plan! (The middle plan in the image below.)

<img src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/266096/eddc9e20-ca8f-9466-3c67-49e680b2ab01.png" width="400" alt="Twitter API pricing table">

Here is what happens when you actually hit the search API with a Java job-change query:

Source for hitting Twitter API [Python]
import json
import requests
from requests_oauthlib import OAuth1Session

#OAuth authentication part (please fill in your own!)
CK      = ''
CS      = ''
AT      = ''
ATS     = ''
twitter = OAuth1Session(CK, CS, AT, ATS)

# Twitter endpoint (get search results)
url2 = 'https://api.twitter.com/1.1/tweets/search/30day/dev.json'

# Parameters to pass to the endpoint
keyword = 'Java career change'

params ={
         'maxResults' : 100,      #Number of tweets to get
         'query'     : keyword  #Search keyword
         }

# Send the search request
req = twitter.get(url2, params = params)

if req.status_code == 200:
    res = json.loads(req.text)
    for line in res['results']:

        if 'RT' in line['text']:
            continue
        
        print(line['text'])
        print('*******************************************')
else:
    print("Failed: %d" % req.status_code)

(↓ Result)

I'm writing a blog so please take a look if you like
#blog#Engineers#Inexperienced#Java #programming#pgfrat#Engineer career change#Inexperienced engineer
https://t.co/SYD2BRB0aX
*******************************************
【Java】Easy OAuth 2.0 Single Sign-on in Java / https://t.co/AZJqRFJFci #Plomari#News #promari #news#programming#IT… https://t.co/RJ4fCg56ds
*******************************************
JAVA、PHP、.Experienced web applications such as NET are welcome https://t.co/9qoD7fPcg9  #Jobs#Freelance#Sole proprietorship#SE #Free engineer#Job change#Employment#Full-time employee#Contract employee#Temporary staff ... https://t.co/fWvZCEukey
*******************************************
There are many freelance SEs and full-time SEs who are particular about full-time SEs, but when I asked the job change agent, it seems that there is not much difference...... ← If so, it seems that free people can earn more lol
#I want to change jobs#java #programming#Mutual follow
*******************************************
We are looking for anonymous questions from everyone!
 .
 .

:worried: There is a lot of noise like this...
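One small source of error in the loop above is the retweet check itself: `'RT' in text` also matches any tweet that merely contains those two letters. The following is a minimal sketch of a stricter check, assuming the v1.1-style payload in which a retweet carries a `retweeted_status` object and its text starts with `RT @`. The helper name `is_retweet` and the sample tweets are made up for illustration.

```python
def is_retweet(tweet: dict) -> bool:
    """Return True if a tweet dict looks like a retweet.

    Assumption: retweets either carry a 'retweeted_status' object
    or have text that starts with 'RT @'.
    """
    if "retweeted_status" in tweet:
        return True
    return tweet.get("text", "").startswith("RT @")


# Hand-made sample tweets (not real API output).
samples = [
    {"text": "RT @someone: Java engineers wanted"},
    {"text": "Thinking about a job change. SMART move?"},
    {"text": "Quoted text", "retweeted_status": {"text": "original"}},
]

originals = [t["text"] for t in samples if not is_retweet(t)]
print(originals)
```

The second sample contains the letters "RT" inside "SMART", so a plain substring test would discard it, while this check keeps it.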

Next is COTOHA API

This time, I used COTOHA API for the natural language processing part!

With just a free account you can use the following language-processing features, which makes it a very welcome API for anyone working with Japanese text.

- **Parsing** ← used this time!
- Named entity recognition
- Proper noun (company name) correction
- Anaphora resolution
- Keyword extraction
- Similarity calculation
- Sentence type classification
- User attribute estimation (β)
- Filler removal (β)
- Speech recognition error detection (β)
- Sentiment analysis
- Summarization (β)

:point_down_tone2: You can use it like this. [Python]

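# Credentials and settings (please fill in your own!)

BASE_URL = "xxx"  # must end with "/" because "nlp/v1/parse" is appended to it
CLIENT_ID = "xxx"
CLIENT_SECRET = "xxx"
access_token = 'xxx'  # placeholder; use the token returned by auth()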

#Authentication
def auth(client_id, client_secret):
    token_url = "https://api.ce-cotoha.com/v1/oauth/accesstokens"
    headers = {
        "Content-Type": "application/json",
        "charset": "UTF-8"
    }

    data = {
        "grantType": "client_credentials",
        "clientId": client_id,
        "clientSecret": client_secret
    }
    r = requests.post(token_url,
                      headers=headers,
                      data=json.dumps(data))
    return r.json()["access_token"]


access_token = auth(CLIENT_ID, CLIENT_SECRET)  # keep the token; cotoha_parse() reads it



#Parsing API

def cotoha_parse(sentence):
    
    base_url = BASE_URL
    headers = {
        "Content-Type": "application/json",
        "charset": "UTF-8",
        "Authorization": "Bearer {}".format(access_token)
    }

    data = {
        "sentence": sentence,
    }

    r = requests.post(base_url + "nlp/v1/parse",
                      headers=headers,
                      data=json.dumps(data))
    return r.json()
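To get a feel for what the parse result looks like before combining it with the tweets, here is a minimal sketch that flattens a response into (form, pos, first label) tuples. It relies only on the fields this article's script reads (`result`, `tokens`, `form`, `pos`, `dependency_labels`, `label`). The sample response is hand-made, the `Noun` value is a placeholder rather than an actual API value, and `summarize_tokens` is a hypothetical helper name.

```python
def summarize_tokens(parsed: dict) -> list:
    """Flatten a parse response into (form, pos, first_label) tuples."""
    rows = []
    for chunk in parsed.get("result", []):
        for token in chunk.get("tokens", []):
            # A token may have no dependency labels at all.
            labels = token.get("dependency_labels") or []
            first_label = labels[0]["label"] if labels else None
            rows.append((token["form"], token["pos"], first_label))
    return rows


# Hand-made stand-in for the value returned by cotoha_parse() (an assumption).
sample_response = {
    "result": [
        {
            "tokens": [
                {
                    "form": "Job change",
                    "pos": "Noun",
                    "dependency_labels": [{"label": "compound"}],
                },
                {"form": "destination", "pos": "Noun"},
            ]
        }
    ]
}

for row in summarize_tokens(sample_response):
    print(row)
```

Printing the tokens this way makes it easier to see which labels and parts of speech surround a keyword before writing any filtering rule.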

Combination logic

After various trials, I settled on the following rules of thumb:

- Go through each tweet line by line and pick up the lines that contain job change.
- Parse the line, and treat it as positive when the label of the word following job change is **compound**.
- Also treat it as positive when a verb suffix follows job change (I want to change jobs! I'm going to change jobs!).

  • Compound: a compound word made of noun + noun or verb + verb (Example 1: subjective symptoms, where subjective and symptoms are linked by **compound**) (Example 2: job-change destination, job-change request, ...)

(↓ Diagram of the logic with example sentences) image.png
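The rule above can be sketched as a small pure function that takes the token list of one parsed chunk and can be tested without calling any API. This is an illustration under assumptions, not the exact script: `signals_job_change` is a hypothetical name, the sample tokens are hand-made, and the `aux` and `case` labels and the `Noun` and `Particle` values are placeholders. Unlike a version that stops at the first problem, it also checks the part of speech when the next token has no dependency label.

```python
def signals_job_change(tokens, keyword="Job change"):
    """Return True if the token after `keyword` is labeled 'compound'
    or is a verb suffix."""
    for index, token in enumerate(tokens):
        if keyword not in token["form"]:
            continue
        if index + 1 >= len(tokens):
            continue  # the keyword is the last token; nothing follows it
        next_token = tokens[index + 1]
        labels = next_token.get("dependency_labels") or []
        if labels and labels[0]["label"] == "compound":
            return True
        if next_token.get("pos") == "Verb suffix":
            return True
    return False


# Hand-made token lists (assumptions, not real API output).
compound_case = [
    {"form": "Job change", "pos": "Noun"},
    {"form": "destination", "pos": "Noun",
     "dependency_labels": [{"label": "compound"}]},
]
suffix_case = [
    {"form": "Job change", "pos": "Noun"},
    {"form": "want to", "pos": "Verb suffix",
     "dependency_labels": [{"label": "aux"}]},
]
other_case = [
    {"form": "Job change", "pos": "Noun"},
    {"form": "of", "pos": "Particle",
     "dependency_labels": [{"label": "case"}]},
]

print(signals_job_change(compound_case))
print(signals_job_change(suffix_case))
print(signals_job_change(other_case))
```

Keeping the rule separate from the API calls makes it easy to try variations of the heuristic on saved parse results.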

The final script looks like this
# Tweets judged valuable are collected here.
precious_tweets = []


#results of twitter API
res = json.loads(req.text)
for tweet in res['results']:

    #Excludes RT items → Use only original tweets.
    if 'RT' in tweet['text']:
        continue

    for sentence in tweet['text'].split('。'):
        
        for line in sentence.split('\n'):
            if 'Job change' in line:
                
                #Parsing with COTOHA
                results_parsed = cotoha_parse(line)['result']
                
                for result in results_parsed:    
                    tokens = result['tokens']
                    
                    for index, token in enumerate(tokens):
                        if 'Job change' in token['form']:
                            try:
                                next_token = tokens[index + 1]
                                next_label = next_token['dependency_labels'][0]['label']
                            except (IndexError, KeyError):
                                # No following token, or it has no dependency label
                                print('try failed!  index:', index)
                                break 

                            if next_label == 'compound':
                                precious_tweets.append(tweet['text'])
                            elif next_token['pos'] == 'Verb suffix':
                                precious_tweets.append(tweet['text'])

(↓ The result is ...)

*******************************************
When it comes to changing jobs, I wonder if my current age is the best ~ ~ After all, when it comes to the latter half of 30, it seems that the number of hires will decrease dramatically ... Also hard....。
#I want to change jobs#java #programming#Mutual follow
*******************************************
3/4 Learning time 1h
3/5 Learning time 1h

Implementation of interview measures for job change consultants and web interviews
Unlike offline interviews, web interviews do not easily transfer heat.
The only way to convey your feelings to others is to move your emotions, and become a person who can move your emotions!
*******************************************
When you change jobs, the current trend/You can see what the required technology is.
Java isn't really called, right?
*******************************************
I'm looking for a new job.
Engineer, consulting hope
Lives in Hakata Ward.
Java(Two-and-a-half years)

:relaxed: I'm happy with the result

In closing

When I counted the output, the tweets had been **narrowed down to 7 out of 100 (7%)**, and although some noise remained, most of them were valuable tweets (from people changing jobs). I was surprised.

I didn't get to it this time, but I think it would be possible to store the results in a database using something like JSON Driver, and to enrich them further with Twitter profile and location information using other language-processing APIs.
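As a first step toward persisting the results, here is a minimal sketch that writes the collected tweet texts to a JSON Lines file using only the standard library. It is not the database approach mentioned above. The file name, the helper names `save_tweets` and `load_tweets`, and the demo data are all assumptions for illustration. Duplicates are skipped because the same tweet can be collected more than once when several of its lines match.

```python
import json
import os
import tempfile


def save_tweets(tweets, path):
    """Write each unique tweet text as one JSON object per line."""
    seen = set()
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for text in tweets:
            if text in seen:
                continue  # skip duplicates
            seen.add(text)
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_tweets(path):
    """Read the tweet texts back in the order they were written."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f if line.strip()]


# Demo data standing in for precious_tweets (an assumption).
demo = [
    "I'm looking for a new job.",
    "I'm looking for a new job.",
    "Java (two and a half years)",
]
path = os.path.join(tempfile.gettempdir(), "precious_tweets.jsonl")
saved = save_tweets(demo, path)
print(saved, load_tweets(path))
```

`ensure_ascii=False` keeps Japanese text readable in the file instead of escaping it.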

Also, adding clustering and the like should make a fairly rigorous analysis possible.

This project really showed me the power of programming. Thanks to Twitter and NTT.

Also, since I myself found a job thanks to Twitter & Qiita (through posts like this article), I keep putting content out myself. I think it would be nice if there were more and more people like this. (We also want to actively reach out to them.)

References

Thank you very much to the following references!

Regarding the Twitter API

- Official Developer page
- Play with twitter API # 3 (Get search results)

Regarding the COTOHA API

- Official API documentation page
- [Program that automatically corrects "Takenoko no Sato" to "Kinoko no Yama" "correctly"](https://qiita.com/honehoney/items/a8c7ba2c53a70c88502e)
