[PYTHON] Knowledge base + Web API operated by NoSQL x PaaS

What is this

This article summarizes know-how for building a knowledge base + Web API with NoSQL and PaaS.

As implementations of an architecture I call a "string tag-oriented undirected graph knowledge base", I introduce one example built with Heroku + Redis + FastAPI and another built with AWS (DynamoDB + Lambda + API Gateway).

All code uses Python 3.8.0.

What is a knowledge base?

There are various definitions of a knowledge base, but in this article it refers to "a database that stores knowledge in a computer-readable format." It is also called a knowledge database or KB.

Reference links

- Knowledge Base - Wikipedia
- Knowledge Base - ITmedia Enterprise
- Knowledge Base Software | Atlassian
- What is the meaning of the knowledge base? Effects and how to make it in-house | Tayori Blog
- What is a "knowledge database" that stores human knowledge as data? - Data Knowledge A truly usable domestic BI tool

String Tag Oriented Undirected Graph Knowledge Base

This is the knowledge base we will build as an example. The name alone is hard to picture, so I prepared a diagram. (Visualization is not implemented, so I drew it with the mind-mapping tool Coggle.)

スクリーンショット 2019-12-03 20.00.32.png

The role of this knowledge base

Its role is to form a dictionary-like body of knowledge (collective intelligence) just by repeating the simple operation of "storing two strings."

And you need a Web API to grow it at explosive speed.

About "string tag oriented"

This knowledge base deals only with strings (and sets of strings), and treats every string as a tag.

In the example above, each string such as a web service name, account ID, URL, article title, concept, or programming language is treated as one tag.

By design, a tag does not contain spaces or newline characters.
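
The no-whitespace rule can be checked before anything is stored. The following is a minimal sketch; the helper name is_valid_tag is hypothetical, and rejecting empty strings is my assumption rather than something the article states.

```python
def is_valid_tag(tag):
    """Return True if tag is a non-empty string without any whitespace.

    Hypothetical helper: the article only says that tags contain
    no spaces or newline characters. Rejecting '' is an assumption.
    """
    return (
        isinstance(tag, str)
        and tag != ''
        and not any(ch.isspace() for ch in tag)
    )


print(is_valid_tag('Python'))       # True
print(is_valid_tag('two words'))    # False
print(is_valid_tag('line\nbreak'))  # False
```

str.isspace() also catches tabs and full-width spaces, which seems a reasonable reading of the rule.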

About "undirected graph type"

In this knowledge base, we try to connect related tags.

For example, you can obtain the data that the tag framework is linked to the tags Rails, Laravel, Django, and Flask.

スクリーンショット 2019-12-03 20.43.44.png

Likewise, you can obtain data such as the tag https://qiita.com/1ntegrale9/items/94ec4437f763aa623965 (the URL of a Qiita article about Python), which is linked to both the Qiita and Python tags.

スクリーンショット 2019-12-03 20.44.33.png

In the figure above, the vertices (strings) represent tags and the edges represent relationships. Since the graph is undirected, each edge can be followed from either side. In addition, edges carry no weight because inclusion relationships are not considered.
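
To make the data model concrete, here is a minimal in-memory sketch of such a graph. The class name TagGraph and its methods are hypothetical and only illustrate the idea; the actual storage in this article is Redis or DynamoDB.

```python
from collections import defaultdict


class TagGraph:
    """Undirected, unweighted graph of string tags (in-memory sketch)."""

    def __init__(self):
        self._edges = defaultdict(set)

    def link(self, tag1, tag2):
        # Store the edge in both directions so it can be followed from either tag.
        self._edges[tag1].add(tag2)
        self._edges[tag2].add(tag1)

    def related(self, tag):
        # .get avoids creating an entry for a tag that was never stored.
        return sorted(self._edges.get(tag, ()))


graph = TagGraph()
for name in ('Rails', 'Laravel', 'Django', 'Flask'):
    graph.link('framework', name)
graph.link('Python', 'Django')

print(graph.related('framework'))  # ['Django', 'Flask', 'Laravel', 'Rails']
print(graph.related('Django'))     # ['Python', 'framework']
```

Each tag maps to a set of neighbours, which is the same shape as the Redis set type used in the first construction example.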

Reference article: Basics of Graph Theory - Qiita

Supplementary information

This architecture is not an established one; it is my own design, inspired by GraphQL.

I only did a light search with keywords around graph databases, so something similar may already exist.

Construction example: Redis + FastAPI + Heroku

If you want to run it easily and for free, use this configuration.

The initial setup for Heroku and the basic Redis operations are covered here: Introduction to NoSQL DB starting with Heroku x Redis x Python - Qiita

Redis

Redis is an in-memory KVS with fast reads and writes. It also supports persistence. Since I want to associate multiple tags with one tag, I use only the set type.

Library installation

Since it is handled by Python, use redis-py.

python3 -m pip install redis hiredis

hiredis-py is a Python wrapper around hiredis, a fast parser implemented in C. redis-py detects hiredis and switches to its parser, so install it as well.

Connect to Redis

Initialize the connection with the following code. Use the environment variable REDIS_URL that is automatically set by Heroku Redis.

import redis, os
conn = redis.from_url(os.environ['REDIS_URL'], decode_responses=True)

By default, responses are returned as bytes, which causes problems when displaying Japanese, so decode_responses=True is required.
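
The difference can be illustrated without a Redis server. This small sketch only shows what an undecoded UTF-8 value looks like compared with the decoded string; the variable names are mine.

```python
tag = 'タグ'
raw = tag.encode('utf-8')  # the kind of value a client returns when it does not decode

print(raw)                         # b'\xe3\x82\xbf\xe3\x82\xb0'
print(raw.decode('utf-8') == tag)  # True
```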

Get all tags

Get them using keys().

def get_all_tags():
    return sorted(conn.keys())

Being able to see the tags as a list is convenient, so it is worth having. Note, however, that the load of this call grows as the number of keys grows.
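
If that load becomes a concern, one option is to iterate with SCAN instead of KEYS, since SCAN walks the keyspace in small steps. The following is a sketch, not part of the original code: the function name iter_all_tags and the batch_size parameter are hypothetical, and conn is assumed to be a redis-py client like the one created above.

```python
def iter_all_tags(conn, batch_size=100):
    """Return all keys, sorted, fetched with SCAN instead of KEYS.

    conn is assumed to be a redis-py client. batch_size is passed to
    SCAN as COUNT, which is only a hint for how much work each step does.
    """
    return sorted(conn.scan_iter(count=batch_size))
```

The result is the same sorted list as get_all_tags(), but the server is not blocked by one long KEYS call.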

Get the tag to be tied

Get them using smembers(key).

def get_related_tags(tag):
    # Return the tags linked to `tag`, or an empty list if the tag does not exist
    return conn.smembers(tag) if conn.exists(tag) else []

As a precaution, an empty array is returned if a nonexistent tag is specified. Use exists(key) to check for existence.

Store two tags in association with each other

Use sadd(key, value) to store set data. Since the link must work in both directions, swap the key and value and execute it twice.

def set_tags_relationship(tag1, tag2):
    # Queue both SADD commands and run them in a single transaction
    return conn.pipeline().sadd(tag1, tag2).sadd(tag2, tag1).execute()

Redis supports transactions, and in redis-py, chaining from pipeline() to execute() runs the queued commands as a batch within a transaction.

Also, batched execution through a pipeline is reportedly faster than executing the commands individually. Reference: Efficient use of Redis in Python (to improve redis-py performance) - [Dd]enzow(ill)? with DB and Python

FastAPI

FastAPI is one of Python's web frameworks. It lets you implement a simple Web API with little code, and its distinguishing feature is that API documentation is generated automatically without any configuration.

Flask, Responder, Starlette, DRF, and the like were over-engineered for this purpose, while Bottle lacked features; FastAPI was just right.

Library installation

python3 -m pip install fastapi uvicorn email-validator

Uvicorn is a fast ASGI server. It is used to run FastAPI. (It is not a typo of Gunicorn.)

Without email-validator, I got an error at startup. I am not sure why.

Application initialization

It's very simple.

main.py


from fastapi import FastAPI
app = FastAPI()

If you set the arguments title and description, they are reflected in the automatically generated API Doc.

main.py


app = FastAPI(
    title='collective-intelligence',
    description='String Tag Oriented Undirected Graph Knowledge Base',
)

You can also change the API Doc URL by specifying docs_url. The default is /docs, but serving it at the root is convenient.

main.py


app = FastAPI(docs_url='/')

Get all tags

Simply write the HTTP method (GET), URL and return value. You can get a JSON response by returning a list or dictionary.

main.py


@app.get('/api')
def read_all_tags():
    return get_all_tags()

This definition is automatically reflected in API Doc. You can also execute the request from Try it out in the upper right.

Get the tag associated with the specified tag

A tag may be an arbitrary string including symbols, which a query string cannot handle well, so this endpoint uses POST.

main.py


from fastapi import Body

@app.post('/api/pull')
def read_related_tags(tag: str = Body(..., embed=True)):
    return get_related_tags(tag)

The tag argument is declared with Body(..., embed=True), so it is read from the JSON request body (for example {"tag": "Python"}); a bare tag: str would be treated as a query parameter. The type annotation is used to validate the request, and a mismatch results in a 422 Validation Error response.

Store two tags in association with each other

FastAPI includes pydantic, a library for making use of type annotations. Use it to define your own type and validate requests against it.

main.py


from pydantic import BaseModel

class Tags(BaseModel):
    tag1: str
    tag2: str

@app.post('/api/push')
def create_tags_relationship(tags: Tags):
    set_tags_relationship(tags.tag1, tags.tag2)
    return {tag: get_related_tags(tag) for _, tag in tags}

The defined type is reflected in API Doc as Schema.
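
From the client side, the push endpoint expects a JSON body that matches the Tags model. Below is a sketch using only the standard library. The helper name build_push_request is hypothetical, and the base URL assumes a server running locally on port 8000.

```python
import json
from urllib import request


def build_push_request(base_url, tag1, tag2):
    """Build (but do not send) a POST request for /api/push."""
    body = json.dumps({'tag1': tag1, 'tag2': tag2}).encode('utf-8')
    return request.Request(
        base_url.rstrip('/') + '/api/push',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# Assumption: the API is running locally on port 8000.
req = build_push_request('http://127.0.0.1:8000', 'Python', 'FastAPI')
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8000/api/push

# To actually send it while the server is running:
# with request.urlopen(req) as res:
#     print(json.load(res))
```

Because the tags travel in the JSON body, symbols such as / or ? in a tag need no URL escaping.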

Launch FastAPI

Start it with Uvicorn, introduced earlier. If you initialized the application as app in main.py, specify main:app. With the --reload option, the server reloads and picks up changes whenever a file is modified.

$ uvicorn main:app --reload
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [57749]
INFO:     Started server process [57752]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

If you access http://127.0.0.1:8000 or http://127.0.0.1:8000/docs, you can see that the API Doc is displayed.

Heroku

Heroku is a PaaS that lets you deploy web applications easily. It supports many languages and frameworks, and also hosts PostgreSQL and Redis for free up to a certain limit.

First, the following steps are required.

- Register an account: Heroku | Sign up
- Register a credit card: Account · Billing | Heroku
- Create an app: Create New App | Heroku (https://dashboard.heroku.com/new-app)
- Add the Redis add-on: Heroku Redis - Add-ons - Heroku Elements

Preparation of necessary files

You will need the following files. Put them in your GitHub repository.

$ tree
.
├── main.py          #application
├── Procfile         #Process execution command definition file
├── requirements.txt #Dependent library definition file
└── runtime.txt      #Python version definition file

Procfile


web: uvicorn main:app --host 0.0.0.0 --port $PORT

requirements.txt


fastapi
email-validator
uvicorn
redis
hiredis

runtime.txt


python-3.8.0

See also the actual directory: 1ntegrale9/collective-intelligence at heroku.

Deploy application

Deploy from the Deploy tab of the Dashboard. Connect the repository through the GitHub integration and run Manual Deploy. If you also enable Automatic deploys, the app is deployed automatically whenever you push to master.

スクリーンショット 2019-12-19 14.45.06.png

Once the build completes successfully, turn the registered process on from Configure Dynos.

スクリーンショット 2019-12-19 14.48.21.png

You can see the deployed application from Open app at the top right of the Dashboard.

Construction example: AWS (DynamoDB + Lambda + API Gateway)

This section is still being written; please wait for it to be published.

If scalability matters to you, use this configuration. It also lets you change the data structure flexibly.

- First API development using Lambda and DynamoDB - Qiita
- API Gateway + Lambda + DynamoDB - Qiita

Amazon DynamoDB

As with an RDB, the basic unit is one table with one primary key. A primary key uniquely identifies an item and is either a "partition key" alone or a "composite key of a partition key and a sort key". Adding a sort key relaxes the uniqueness constraint on the partition key.

- How to start - Amazon DynamoDB | AWS
- Development of the first serverless application - Create a table in DynamoDB - | Developers.IO
- Understanding the capacity of DynamoDB to do your best in the free frame - Dual wield of IT and muscle training

Table design

Partition key: tag
Sort key: timestamp

Creating a table
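
The article does not show how the table was created. As a sketch under assumptions, the table design above can be expressed as arguments for boto3's create_table. The function name table_definition is hypothetical, the attribute and table names follow the Lambda code in this article, and on-demand billing is my assumption.

```python
def table_definition(table_name='collective-intelligence'):
    """Keyword arguments for boto3's DynamoDB create_table call."""
    return {
        'TableName': table_name,
        'KeySchema': [
            {'AttributeName': 'tag', 'KeyType': 'HASH'},         # partition key
            {'AttributeName': 'timestamp', 'KeyType': 'RANGE'},  # sort key
        ],
        'AttributeDefinitions': [
            {'AttributeName': 'tag', 'AttributeType': 'S'},        # string
            {'AttributeName': 'timestamp', 'AttributeType': 'N'},  # number
        ],
        'BillingMode': 'PAY_PER_REQUEST',  # assumption: on-demand capacity
    }


# With AWS credentials configured, the table could then be created with:
# import boto3
# boto3.resource('dynamodb').create_table(**table_definition())
```

Only key attributes are declared here; related_tag is a non-key attribute and needs no definition.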

AWS Lambda

- First serverless application development - Getting DynamoDB value with Lambda - | Developers.IO
- Automatic deployment to AWS Lambda using GitHub Actions (detailed + demo procedure ver) - Qiita

Store two tags in association with each other

The lambda_handler function is executed when the Lambda function is invoked.

import boto3, time
from decimal import Decimal

def lambda_handler(event, context):
    timestamp = Decimal(time.time())
    table = boto3.resource('dynamodb').Table('collective-intelligence')
    with table.batch_writer() as batch:  # Use batch_writer when putting multiple items
        batch.put_item(Item={
            'tag': event['tag1'],
            'related_tag': event['tag2'],
            'timestamp': timestamp
        })
        batch.put_item(Item={
            'tag': event['tag2'],
            'related_tag': event['tag1'],
            'timestamp': timestamp
        })
    return {'statusCode': 201}

Get the tag associated with the specified tag

import boto3
from boto3.dynamodb.conditions import Key

def lambda_handler(event, context):
    table = boto3.resource('dynamodb').Table('collective-intelligence')
    response = table.query(KeyConditionExpression=Key('tag').eq(event['tag']))  # Search by the specified tag
    tags = set(item['related_tag'] for item in response['Items'])  # Store in a set to remove duplicates
    return {'statusCode': 200, 'body': list(tags)}  # Cast to list for the JSON response

Amazon API Gateway

API Gateway creates and manages Web APIs.

- Development of first serverless application - Calling Lambda from API Gateway - | Developers.IO
- API Gateway environment construction that you can learn while creating from scratch | Developers.IO
- Amazon API Gateway Tutorial - Amazon API Gateway

Creating resources and methods

Create POST methods for /push and /pull.

Set request validation

Rejecting invalid requests before Lambda runs may reduce costs.

- Define a model (JSON Schema)
- Settings -> set "Verify body" in Request validation
- Set the model for the request body
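
The models themselves appear only in screenshots, so the following is a guess at their shape rather than the author's actual definitions. It builds JSON Schema (draft-04, the version API Gateway models use) documents for the two request bodies. The helper name string_object_model and the titles PullModel and PushModel are hypothetical.

```python
import json


def string_object_model(title, fields):
    """Build a JSON Schema (draft-04) object whose fields are all required strings."""
    return {
        '$schema': 'http://json-schema.org/draft-04/schema#',
        'title': title,
        'type': 'object',
        'properties': {name: {'type': 'string'} for name in fields},
        'required': list(fields),
    }


pull_model = string_object_model('PullModel', ['tag'])
push_model = string_object_model('PushModel', ['tag1', 'tag2'])

# The printed JSON can be pasted into the model editor in the console.
print(json.dumps(push_model, indent=2))
```

With "required" set, a request that omits a tag is rejected by API Gateway and never reaches Lambda.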

- JSON Schema Tool
- Create Request and Response Mapping Models and Mapping Templates - Amazon API Gateway (https://docs.aws.amazon.com/ja_jp/apigateway/latest/developerguide/models-mappings.html)
- I tried the new function Request Validation of API Gateway - MTI Engineer Blog

Creating a method

create-method.png

Method selection

create-post.png

Method management screen

do-pull.png

Creating a PULL model

model-pull.png

Creating a PUSH model

model-push.png

Set request validation

request-pull.png

PULL API testing

test-pull.png

PUSH API testing

test-push.png

Regarding usage fees

Check from the invoice on the Billing screen.

This is not in production yet, but after sending and receiving several hundred requests and responses in testing, the charge was 0 yen, so there seems to be no need to be afraid of trying it out.

スクリーンショット 2019-12-05 19.39.49.png

GCP vs AWS

I was torn between GCP (Firestore) and AWS (DynamoDB), and chose DynamoDB.

If you go with GCP, you have to choose among four datastore services; for hobby use, I think Firestore is the one to pick. Reference: Select Database: Cloud Firestore or Realtime Database | Firebase

At the end

Most of this was self-taught, but I think I acquired the ability to pick up new technologies in the modern environment at my company. Being able to work in a workplace where strong engineers make heavy use of new technology is a great experience.

In addition, the Heroku build is open to the public. The data is empty at the time of publication, but feel free to try it. https://collective-intelligence.herokuapp.com/
