[PYTHON] Judge the authenticity of posted articles by machine learning (Google Prediction API).

Overview
The content of word-of-mouth sites is unreliable. I thought it would be interesting if machine learning could judge the authenticity of posted content, so this article uses the Google Prediction API to judge the authenticity of posts.

Prerequisites
- Registered on Google Cloud Platform (free tier available at first)
- Created a Cloud Platform Console project (obtained a project ID)
- Enabled the Prediction and Google Cloud Storage APIs for the project
- Created a bucket
- Uploaded the training data to Cloud Storage (a minimal upload sketch follows below)
- Defined a model name
- Obtained the API authentication file (client_secrets.json)

If you go through the Google Prediction API quickstart once, you can see the whole picture:
https://cloud.google.com/prediction/docs/quickstart?hl=ja
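As a reference for the upload step, here is a minimal sketch that pushes the training file to Cloud Storage through the same discovery client the batch script uses. It assumes client_secrets.json is a service-account key; the bucket name k_prediction and the file language.txt are taken from the example command further down.

```
# Minimal upload sketch (assumptions: service-account key in
# client_secrets.json; bucket/file names from the example command).
import httplib2
from googleapiclient import discovery
from googleapiclient.http import MediaFileUpload
from oauth2client.service_account import ServiceAccountCredentials

scopes = ['https://www.googleapis.com/auth/devstorage.read_write']
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'client_secrets.json', scopes=scopes)
storage = discovery.build('storage', 'v1',
                          http=credentials.authorize(httplib2.Http()))
media = MediaFileUpload('language.txt', mimetype='text/plain')
result = storage.objects().insert(bucket='k_prediction',
                                  name='language.txt',
                                  media_body=media).execute()
print(result['name'])
```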

Sequence diagram

(Sequence diagram image: Screenshot 2017-05-20 13.05.20.png)

prediction_service.py execution command:

```
python prediction_service.py "k_prediction/language.txt" "language-identifier" "Project ID"
```

Run batch processing (prediction_service.py)

```
# File: prediction_service.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright 2014 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Simple command-line sample for the Google Prediction API.

Command-line application that trains on your input data. This sample does
the same thing as the Hello Prediction! example. You might want to run the
setup.sh script to load the sample data to Google Storage.

Usage:
  $ python prediction_service.py "bucket/object" "model_id" "project_id" "my-xxxxx.json"

You can also get help on all the command-line flags the program understands
by running:
  $ python prediction_service.py --help

To get detailed log output run:
  $ python prediction_service.py --logging_level=DEBUG
"""
from __future__ import print_function

__author__ = ('[email protected] (Joe Gregorio), '
              '[email protected] (Marc Cohen)')

import argparse
import os
import sys
import time
from pprint import pprint as pp

import MySQLdb

sys.path.append(
    os.path.join(os.path.dirname(os.path.realpath(__file__)), 'lib'))

import httplib2
from googleapiclient import discovery
from oauth2client import client
from oauth2client import tools
from oauth2client.service_account import ServiceAccountCredentials

# Time to wait (in seconds) between successive checks of training status.
SLEEP_TIME = 10

scopes = ['https://www.googleapis.com/auth/prediction',
          'https://www.googleapis.com/auth/devstorage.read_only']


# → 1. Define authentication information
# Declare command-line flags.
argparser = argparse.ArgumentParser(add_help=False)
argparser.add_argument('object_name',
                       help='Bucket name/hoge.txt (training data)')
argparser.add_argument('model_id', help='Model ID')
argparser.add_argument('project_id', help='Project ID')
argparser.add_argument('credential', help='client_secrets.json')

def print_header(line):
  '''Format and print header block sized to length of line.'''
  header_str = '='
  header_line = header_str * len(line)
  print('\n' + header_line)
  print(line)
  print(header_line)

def main(argv):
  # Create flags.
  parents = [argparser]
  parent_parsers = [tools.argparser]
  parent_parsers.extend(parents)
  parser = argparse.ArgumentParser(
      description=__doc__,
      formatter_class=argparse.RawDescriptionHelpFormatter,
      parents=parent_parsers)
  flags = parser.parse_args(argv[1:])
  credential_file = os.path.join(
      os.path.dirname(os.path.realpath(__file__)), flags.credential)
  credentials = ServiceAccountCredentials.from_json_keyfile_name(
      credential_file, scopes=scopes)

  http = credentials.authorize(http=httplib2.Http())
  service = discovery.build('prediction', 'v1.6', http=http)

  # → 2. Access the Google Prediction API
  try:
    # Get access to the Prediction API.
    papi = service.trainedmodels()

    # → 3. List models.
    print_header('Fetching list of first ten models')
    result = papi.list(maxResults=10, project=flags.project_id).execute()
    print('List results:')
    pp(result)

    # → 4. Start training request on a data set.
    print_header('Submitting model training request')
    body = {'id': flags.model_id, 'storageDataLocation': flags.object_name}
    start = papi.insert(body=body, project=flags.project_id).execute()
    print('Training results:')
    pp(start)

    # → 5. Wait for the training to complete.
    print_header('Waiting for training to complete')
    while True:
      status = papi.get(id=flags.model_id, project=flags.project_id).execute()
      state = status['trainingStatus']
      print('Training state: ' + state)
      if state == 'DONE':
        break
      elif state == 'RUNNING':
        time.sleep(SLEEP_TIME)
        continue
      else:
        raise Exception('Training Error: ' + state)

    # Job has completed.
    print('Training completed:')
    pp(status)

    # → 6. Describe model.
    print_header('Fetching model description')
    result = papi.analyze(id=flags.model_id, project=flags.project_id).execute()
    print('Analyze results:')
    pp(result)

    # → 7. Get the data to be predicted from the database.
    print('================')
    print('Get the data to predict from the database')
    print('================')
    connector = MySQLdb.connect(host="??????", db="??????", user="??????",
                                passwd="??????", charset="utf8")
    cursor = connector.cursor()

    # Fetch the latest post (the row with the highest id).
    sql = "SELECT id, message FROM `posts` WHERE id = (SELECT MAX(id) FROM posts)"
    cursor.execute(sql)
    records = cursor.fetchall()
    for record in records:
      print(record[0])
      record_id = record[0]
      record_1 = record[1].encode('utf-8')
    cursor.close()
    connector.close()

    # Make some predictions using the newly trained model.
    print_header('Making some predictions')
    for sample_text in [record_1]:
      body = {'input': {'csvInstance': [sample_text]}}
      result = papi.predict(
          body=body, id=flags.model_id, project=flags.project_id).execute()
      print('Prediction results for "%s"...' % sample_text)
      pp(result)

      # The response is already a dict; outputMulti holds one
      # {'label': ..., 'score': ...} entry per class.
      data2 = result['outputMulti']
      print(data2)

      # → 8. Display the response data from the API.
      print('================')
      print('Display response data from API')
      print('================')
      print(data2[0]['label'])
      print(data2[0]['score'])
      print(data2[1]['label'])
      print(data2[1]['score'])
      data_score = float(data2[0]['score']) - float(data2[1]['score'])

      # → 9. Judge the response data from the API.
      print('================')
      print('Judge response data from API')
      print('================')
      if data_score > 0:
        percentage = float(data2[0]['score']) * 100
        print("This message is " + str(percentage) + "% 'true'.")
        evaluate = data2[0]['label']
        score = data2[0]['score']
      else:
        percentage = float(data2[1]['score']) * 100
        print("This message is " + str(percentage) + "% 'false'.")
        evaluate = data2[1]['label']
        score = data2[1]['score']
        print(record_id)

      # → 10. Reflect the result in the database.
      print('================')
      print('Reflect the result in the database')
      print('================')
      connector = MySQLdb.connect(host="?????", db="?????", user="?????",
                                  passwd="?????", charset="utf8")
      cursor = connector.cursor()
      # Parameterized queries; evaluate/score come from the judgment above.
      cursor.execute('UPDATE posts SET evaluate = (%s) WHERE id = (%s)',
                     (evaluate, record_id))
      cursor.execute('UPDATE posts SET score = (%s) WHERE id = (%s)',
                     (score, record_id))
      connector.commit()
      connector.close()

  except client.AccessTokenRefreshError:
    print('The credentials have been revoked or expired, please re-run '
          'the application to re-authorize.')


if __name__ == '__main__':
  main(sys.argv)
```
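For reference, the prediction call can also be run on its own once the model has been trained. This is a minimal sketch under the same assumptions as above: a service-account key in client_secrets.json, the model name from the example command, and a placeholder your-project-id that you replace with your own project ID.

```
# Predict-only sketch; assumes the "language-identifier" model is trained.
import httplib2
from googleapiclient import discovery
from oauth2client.service_account import ServiceAccountCredentials

scopes = ['https://www.googleapis.com/auth/prediction']
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'client_secrets.json', scopes=scopes)
service = discovery.build('prediction', 'v1.6',
                          http=credentials.authorize(httplib2.Http()))
body = {'input': {'csvInstance': ['hoge']}}
result = service.trainedmodels().predict(
    body=body, id='language-identifier', project='your-project-id').execute()
print(result['outputLabel'])
print(result['outputMulti'])
```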


→ 11. The web application then shows the judgment result saved in the DB on the newly posted article.
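The article does not show the web side, but reading the stored judgment back only needs the same posts table. A minimal read-side sketch, assuming the schema used by the script above (the "??????" connection placeholders are yours to fill in):

```
# Read back the judgment stored for the newest post; the evaluate and
# score columns were written in step 10 of the batch script.
import MySQLdb

connector = MySQLdb.connect(host="??????", db="??????", user="??????",
                            passwd="??????", charset="utf8")
cursor = connector.cursor()
cursor.execute('SELECT message, evaluate, score FROM posts '
               'WHERE id = (SELECT MAX(id) FROM posts)')
message, evaluate, score = cursor.fetchone()
print('%s -> %s (%s)' % (message, evaluate, score))
cursor.close()
connector.close()
```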

 Reference site
https://github.com/google/google-api-python-client/tree/master/samples/prediction

<h2>Preparation</h2>
- Create a Cloud Platform Console project (obtain a project ID)
- Enable the Prediction and Google Cloud Storage APIs for your project
- Create a bucket
- Upload the training data to Cloud Storage
- Define a model name
- Get the API authentication file (client_secrets.json)

https://cloud.google.com/prediction/docs/quickstart?hl=ja

<h3>How to get client_secrets.json</h3>
I used client_secrets.json for API authentication.
I will briefly describe the procedure for creating it.

Procedure:
→ Register the project on the Google Developer Console
→ Enable the Prediction API from "API and Authentication" → "API" on the left
→ From "API and Authentication" → "Credentials" on the left, press "Create a new client ID" → "Installed applications (other)", then "Download JSON" and save the file as client_secrets.json

File name: client_secrets.json

```
{
  "web": {
    "client_id": "?????",
    "client_secret": "?????",
    "redirect_uris": [],
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://accounts.google.com/o/oauth2/token"
  }
}
```
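A quick sanity check that the file is valid JSON and sits where the scripts expect it (a minimal sketch; the "?????" values are your own credentials):

```
# Load client_secrets.json and print the client_id to confirm the file
# parses; the structure matches the sample above.
import json

with open('client_secrets.json') as f:
    secrets = json.load(f)
print(secrets['web']['client_id'])
```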

<h3>Create training data</h3>
What is training data?
It is the data on which the machine-learning judgment is based.
Before explaining the training data, I will briefly explain machine learning itself.
There are two main types of problems handled by machine learning: "supervised learning" and "unsupervised learning."
The Google Prediction API does the former, supervised learning.
In supervised learning, the goal is to correctly predict the (invisible) output for the (visible) input data that is given. However, the machine cannot know what to output from the input alone.
Therefore, multiple examples of input/output pairs, called training data (or teacher data), are given.
In other words, a human supplies examples that say "given this input, produce this output."
Based on these, the goal is to build a machine (a function) that produces the correct output when new input data arrives.
Of course, if exactly the same input as one seen before arrives, it would be enough to find the training example with that input and return its output; but some inputs will not appear in the training data at hand.
For such data, the main theme of supervised learning is to design a learning procedure (a learning algorithm) that generalizes the given training data and maximizes the ability to handle data whose output is unknown.
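To make the idea of input/output pairs concrete, here is a toy sketch in plain Python (not the Prediction API): it only memorizes the training pairs, so it fails on unseen inputs, which is exactly the generalization problem described above.

```
# Toy "supervised learner": memorize labeled examples, answer by lookup.
training_data = [('hoge', 'true'), ('hoge_false', 'false')]

def predict(text):
    # Exact-match lookup; a real learning algorithm generalizes instead.
    for example_input, label in training_data:
        if example_input == text:
            return label
    return 'unknown'

print(predict('hoge'))        # -> true
print(predict('never seen'))  # -> unknown
```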

<h3>Training data</h3>
As explained above, this data is a set of input/output pair examples.
The model should return true when the text hoge is entered and false when the text hoge_false is entered.

train.txt

```
"true","hoge"
"true","hoge"
"true","hoge"
"true","hoge"
"true","hoge"
"true","hoge"
"true","hoge"
"true","hoge"
"true","hoge"
"true","hoge"
"false","hoge_false"
"false","hoge_false"
"false","hoge_false"
"false","hoge_false"
"false","hoge_false"
"false","hoge_false"
"false","hoge_false"
"false","hoge_false"
```

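The same file can be generated with Python's csv module; a minimal sketch matching the quoted "label","text" rows above:

```
# Write train.txt: 10 "true" examples and 8 "false" examples,
# every field quoted as in the sample above.
import csv

rows = [('true', 'hoge')] * 10 + [('false', 'hoge_false')] * 8
with open('train.txt', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
```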

<h2>Building the environment</h2>
This section describes the information needed to build the development environment.
<h3>Python environment</h3>

```
yum install mysql-server mysql mysql-devel
chkconfig mysqld on
chkconfig --list | grep mysql
service mysqld start
exit  # <= return to the ec2-user
```

<h3>Install necessary tools (such as gcc) before installing MySQL-python</h3>

```
$ sudo yum groupinstall "Development Tools"
```

If that does not work:

```
$ sudo yum install gcc
```

Then install MySQL-python:

```
$ pip install MySQL-python
```
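A quick import check confirms the build succeeded (a minimal sketch, assuming a standard MySQL-python install):

```
# If this prints a version tuple, MySQL-python compiled and imports cleanly.
import MySQLdb
print(MySQLdb.version_info)
```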


<h3>Check with MySQL connection script</h3>

File name: test_connection_mysql.py

```
import _mysql
import sys

con = None
try:
    # Edit the connection parameters here.
    con = _mysql.connect('localhost', 'root', 'root', 'prediction_dev')
    help(con)
except _mysql.Error as e:
    print("Error %d: %s" % (e.args[0], e.args[1]))
    sys.exit(1)
finally:
    if con:
        con.close()
```

```
$ python test_connection_mysql.py
<_mysql.connection open to 'localhost' at 265f940>
```

You can connect without any error.

 This completes the environment construction for machine learning with Python.

