This is a summary of know-how for building a knowledge base and a Web API with NoSQL and PaaS.
To implement an architecture I call a "string tag-oriented undirected graph knowledge base",
I will introduce one example built with Heroku + Redis + FastAPI
and another built with AWS (DynamoDB + Lambda + API Gateway).
All code uses Python 3.8.0.
There are various definitions of a knowledge base, but in this article
it refers to "a database that stores knowledge in a computer-readable format."
It is also called a knowledge database or KB.
- Knowledge Base - Wikipedia
- Knowledge Base - ITmedia Enterprise
- Knowledge Base Software | Atlassian
- What is the meaning of the knowledge base? Effects and how to make it in-house | Tayori Blog
- What is a "knowledge database" that stores human knowledge as data? - Data Knowledge A truly usable domestic BI tool
This is the knowledge base we will build as an example. I think it is hard to understand from the name alone, so I prepared a diagram. (Visualization is not implemented, so I drew it with the mind-mapping tool Coggle.)
The idea is to form a dictionary-like body of knowledge (collective intelligence) just by repeating the simple operation of "storing two strings".
To grow it at explosive speed, you need a Web API.
This knowledge base deals only with strings (and sets of strings), and treats every string as a tag.
In the example above, each string such as a web service name, account ID, URL, article title, concept, or programming language is treated as one tag.
By design, a tag does not contain spaces or line feed characters.
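The rule that a tag contains no spaces or line feeds can be checked before a tag is stored. The following is only a minimal sketch: the helper name `is_valid_tag` is hypothetical and not part of the original code, and it assumes that every whitespace character (not just spaces and line feeds) should be rejected.

```python
def is_valid_tag(tag: str) -> bool:
    """Return True if tag is a non-empty string without any whitespace.

    Assumption: all whitespace characters (space, tab, line feed, ...) are rejected.
    """
    return bool(tag) and not any(ch.isspace() for ch in tag)


print(is_valid_tag('Python'))      # a plain word
print(is_valid_tag('two words'))   # contains a space
```

A check like this could run at the start of whatever function stores a pair of tags, so that invalid strings never reach the data store.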
In this knowledge base, related tags are connected to each other.
For example, you can obtain the data that the tags Rails, Laravel, Django, and Flask are linked to the tag framework.
Likewise, you can obtain the data that the tag https://qiita.com/1ntegrale9/items/94ec4437f763aa623965 (the URL of a Qiita article about Python) is linked to both the Qiita and Python tags.
In the figure above, the vertices (strings) represent tags and the edges represent relationships. Because the graph is undirected, a relationship can be followed from either side. There is also no weighting, because inclusion relationships are not considered.
Reference article: Basics of Graph Theory - Qiita
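The structure described above can be modeled in a few lines of plain Python before involving any data store. This is a sketch for illustration only: the class `TagGraph` and its method names are hypothetical and do not appear in the original implementation.

```python
from collections import defaultdict


class TagGraph:
    """In-memory stand-in for the knowledge base: each tag maps to a set of related tags."""

    def __init__(self):
        self._edges = defaultdict(set)

    def relate(self, tag1, tag2):
        # Store the edge in both directions so it can be followed from either tag.
        self._edges[tag1].add(tag2)
        self._edges[tag2].add(tag1)

    def related(self, tag):
        # Unknown tags simply have no related tags.
        return sorted(self._edges.get(tag, set()))


graph = TagGraph()
for name in ('Rails', 'Laravel', 'Django', 'Flask'):
    graph.relate('framework', name)
print(graph.related('framework'))
print(graph.related('Django'))
```

Both storage backends in this article follow the same idea: one relationship is written as two entries, one per direction.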
This architecture is not something already in circulation; I designed it myself, inspired by GraphQL.
I only did a light survey with keywords such as GraphDB, so it may already exist.
If you want to run it easily and for free, use the Heroku + Redis + FastAPI configuration.
The following article covers the initial setup of Heroku and the basic operations of Redis: Introduction to NoSQL DB starting with Heroku x Redis x Python - Qiita
Redis is an in-memory key-value store (KVS) with fast reads and writes. It also supports persistence. I want to associate multiple tags with one tag, so I use only the set type.
Since it is accessed from Python, use redis-py.
python3 -m pip install redis hiredis
hiredis-py is a Python wrapper for hiredis, a fast parser implemented in C. redis-py detects hiredis and switches parsers automatically, so install it as well.
Initialize the connection with the following code.
It uses the environment variable REDIS_URL, which Heroku Redis sets automatically.
import redis, os
conn = redis.from_url(os.environ['REDIS_URL'], decode_responses=True)
With the default settings, responses are returned as bytes and Japanese text is not displayed properly, so decode_responses=True is required.
Get all tags using keys().
def get_all_tags():
    return sorted(conn.keys())
It is convenient to be able to see the tags as a list, so I prepared this function. However, note that the load increases as the number of keys grows.
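One way to soften that load is to walk the keyspace in batches with the SCAN command, which redis-py exposes as `scan_iter()`, instead of fetching everything with a single KEYS call. The following is a sketch under assumptions: the function name `get_all_tags_incrementally` is hypothetical, the connection is passed in as an argument, and the batch size of 100 is an arbitrary choice.

```python
def get_all_tags_incrementally(conn, batch_size=100):
    """Collect all keys with SCAN (via redis-py's scan_iter) instead of KEYS.

    conn is assumed to be a redis-py client created with decode_responses=True.
    batch_size is only a hint to Redis about how many keys to examine per call.
    """
    # SCAN may return the same key more than once, so de-duplicate with a set.
    return sorted(set(conn.scan_iter(count=batch_size)))
```

The total work is still proportional to the number of keys, but it is split into many short calls so that other clients are not blocked for the whole duration.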
Get the related tags using smembers(key).
def get_related_tags(tag):
    # Return an empty list when the tag does not exist.
    return list(conn.smembers(tag)) if conn.exists(tag) else []
As a precaution, an empty list is returned if a nonexistent tag is specified. Use exists(key) to check for existence.
Use sadd(key, value) to store set data.
I want to link in both directions, so swap the key and value and execute it twice.
def set_relation_tags(tag1, tag2):
    return conn.pipeline().sadd(tag1, tag2).sadd(tag2, tag1).execute()
Redis supports transactions. In redis-py, chaining commands from pipeline() to execute() runs them as a batch within a transaction.
Atomic execution through a pipeline also seems to be faster than executing the commands individually.
Reference: Efficient use of Redis in Python (to improve redis-py performance) - [Dd]enzow(ill)? with DB and Python
FastAPI is one of Python's web frameworks. You can implement a simple Web API with little code, and its distinguishing feature is that API documentation is generated automatically without any configuration.
Flask, Responder, Starlette, DRF and the like were more than I needed, while Bottle lacked features; FastAPI was just right.
python3 -m pip install fastapi uvicorn email-validator
Uvicorn is a fast ASGI server, used here to run FastAPI. It is not a typo of Gunicorn.
If you do not install email-validator, you get an error at startup. I am not sure why.
It's very simple.
main.py
from fastapi import FastAPI
app = FastAPI()
If you set the title and description arguments,
they are reflected in the automatically generated API Doc, as in the image above.
main.py
app = FastAPI(
    title='collective-intelligence',
    description='String Tag Oriented Undirected Graph Knowledge Base',
)
You can also change the API Doc URL by specifying docs_url.
The default is /docs, but it is a good idea to set it to the root.
main.py
app = FastAPI(docs_url='/')
Simply write the HTTP method (GET), URL and return value. You can get a JSON response by returning a list or dictionary.
main.py
@app.get('/api')
def read_all_tags():
    return get_all_tags()
This definition is automatically reflected in the API Doc.
You can also execute a request from Try it out in the upper right.
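Tags are assumed to be arbitrary strings including symbols, which are awkward to pass in a query string, so this endpoint uses POST.
main.py
from fastapi import Body

@app.post('/api/pull')
def read_related_tags(tag: str = Body(..., embed=True)):
    return get_related_tags(tag)
FastAPI treats a bare tag: str argument as a query parameter, so Body(..., embed=True) is used here to accept it from the request body in the form {"tag": "..."}.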
The argument carries a type annotation, which is used to validate the request.
If the request does not match, the response is 422 Validation Error.
FastAPI includes a library called pydantic for making use of type annotations. Use it to define your own types for validation.
main.py
from pydantic import BaseModel

class Tags(BaseModel):
    tag1: str
    tag2: str

@app.post('/api/push')
def create_tags_relationship(tags: Tags):
    set_relation_tags(tags.tag1, tags.tag2)
    # Iterating over a pydantic model yields (field name, value) pairs.
    return {tag: get_related_tags(tag) for _, tag in tags}
The defined type is reflected in API Doc as Schema.
Start the application with Uvicorn, introduced earlier.
If you initialized the application as app in main.py, specify main:app.
With the --reload option, the server reloads and picks up changes whenever a file is modified.
$ uvicorn main:app --reload
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [57749]
INFO: Started server process [57752]
INFO: Waiting for application startup.
INFO: Application startup complete.
If you access http://127.0.0.1:8000 or http://127.0.0.1:8000/docs, you can see that the API Doc is displayed.
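Besides the API Doc, the endpoints can be called from any HTTP client. The following sketch only builds a request for /api/push with the standard library; the function name `build_push_request` is hypothetical, and the base URL assumes the local Uvicorn server shown above.

```python
import json
import urllib.request


def build_push_request(base_url, tag1, tag2):
    """Build (but do not send) a POST request for the /api/push endpoint."""
    body = json.dumps({'tag1': tag1, 'tag2': tag2}).encode('utf-8')
    return urllib.request.Request(
        url=base_url.rstrip('/') + '/api/push',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


request = build_push_request('http://127.0.0.1:8000', 'framework', 'FastAPI')
print(request.get_method(), request.full_url)
# To actually send it while the server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The JSON body uses the same field names, tag1 and tag2, as the Tags model, so FastAPI can validate it against that type.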
Heroku is a PaaS that lets you easily deploy web applications. It supports many languages and frameworks, and also hosts PostgreSQL and Redis for free up to a certain limit.
First, the following steps are required.
- Account registration: Heroku | Sign up
- Card registration: Account · Billing | Heroku
- App creation: Create New App | Heroku (https://dashboard.heroku.com/new-app)
- Adding the Redis add-on: Heroku Redis - Add-ons - Heroku Elements
You will need the following files. Put them in your GitHub repository.
$ tree
.
├── main.py #application
├── Procfile #Process execution command definition file
├── requirements.txt #Dependent library definition file
└── runtime.txt #Python version definition file
Procfile
web: uvicorn main:app --host 0.0.0.0 --port $PORT
requirements.txt
fastapi
email-validator
uvicorn
redis
hiredis
runtime.txt
python-3.8.0
See also the actual directory: 1ntegrale9/collective-intelligence at heroku.
Deploy from the Deploy tab of the Dashboard.
Link the repository through the GitHub integration and run Manual Deploy.
If you also enable Automatic deploys, the app is deployed automatically when you push to master.
Once the build completes successfully, turn on the registered process from Configure Dynos.
You can open the deployed application from Open app at the top right of the Dashboard.
If scalability matters to you, use the AWS (DynamoDB + Lambda + API Gateway) configuration. It also lets you change the data structure flexibly.
- First API development using Lambda and DynamoDB - Qiita
- API Gateway + Lambda + DynamoDB - Qiita
In DynamoDB, as with an RDB, the basics are one table and one primary key. A primary key uniquely identifies an item and is either a partition key alone or a composite key made of a partition key and a sort key. Adding a sort key relaxes the uniqueness constraint on the partition key.
- How to start - Amazon DynamoDB | AWS
- Development of the first serverless application - Create a table in DynamoDB - | Developers.IO
- Understanding the capacity of DynamoDB to do your best in the free frame - Dual wield of IT and muscle training
The table used here has the following keys.
Partition key: tag
Sort key: timestamp
- First serverless application development - Getting DynamoDB value with Lambda - | Developers.IO
- Automatic deployment to AWS Lambda using GitHub Actions (detailed + demo procedure ver) - Qiita
The lambda_handler function is executed when the Lambda function is invoked.
import boto3, time
from decimal import Decimal

def lambda_handler(event, context):
    timestamp = Decimal(time.time())
    table = boto3.resource('dynamodb').Table('collective-intelligence')
    with table.batch_writer() as batch:  # Use batch_writer when putting multiple items
        # Store the relationship in both directions.
        batch.put_item(Item={
            'tag': event['tag1'],
            'related_tag': event['tag2'],
            'timestamp': timestamp
        })
        batch.put_item(Item={
            'tag': event['tag2'],
            'related_tag': event['tag1'],
            'timestamp': timestamp
        })
    return {'statusCode': 201}
import boto3
from boto3.dynamodb.conditions import Key

def lambda_handler(event, context):
    table = boto3.resource('dynamodb').Table('collective-intelligence')
    response = table.query(KeyConditionExpression=Key('tag').eq(event['tag']))  # Search by the specified tag
    tags = set(item['related_tag'] for item in response['Items'])  # Store in a set to remove duplicates
    return {'statusCode': 200, 'body': list(tags)}  # Convert to a list for the JSON response
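A single Query call returns at most 1 MB of data; when more items remain, the response contains LastEvaluatedKey. For a tag with many relationships, the pages can be followed as in the sketch below. The function name `query_all_related_tags` is hypothetical, and the table and key condition are passed in as arguments, for example a boto3 Table resource and Key('tag').eq(tag).

```python
def query_all_related_tags(table, key_condition):
    """Follow DynamoDB Query pagination and return the de-duplicated related tags.

    table is assumed to be a boto3 Table resource, and key_condition a condition
    such as Key('tag').eq(tag) from boto3.dynamodb.conditions.
    """
    tags = set()
    kwargs = {'KeyConditionExpression': key_condition}
    while True:
        response = table.query(**kwargs)
        tags.update(item['related_tag'] for item in response['Items'])
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            return sorted(tags)
        # Resume the next page from where the previous one stopped.
        kwargs['ExclusiveStartKey'] = last_key
```

At the scale of a hobby project a single page is probably enough, so this is only a precaution for when the table grows.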
API Gateway creates and manages the Web API.
- Development of first serverless application - Calling Lambda from API Gateway - | Developers.IO
- API Gateway environment construction that you can learn while creating from scratch | Developers.IO
- Amazon API Gateway Tutorial - Amazon API Gateway
Create POST methods for /push and /pull.
Rejecting invalid requests before Lambda runs may reduce costs. The steps are:
- Define a model (JSON Schema)
- In the settings, choose "Verify body" under Request validation
- Set the model on the request body
References:
- JSON Schema Tool
- Create Request and Response Mapping Models and Mapping Templates - Amazon API Gateway (https://docs.aws.amazon.com/ja_jp/apigateway/latest/developerguide/models-mappings.html)
- I tried the new function Request Validation of API Gateway - MTI Engineer Blog
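As an illustration of the model definition step, here is a sketch of a JSON Schema for the /push request body, written as a Python dictionary and printed as JSON. API Gateway models are written in JSON Schema draft 4; the title PushRequest is a hypothetical name, and the properties tag1 and tag2 are assumed from the Lambda function above.

```python
import json

# Hypothetical model for the /push request body: two required string properties.
PUSH_REQUEST_MODEL = {
    '$schema': 'http://json-schema.org/draft-04/schema#',
    'title': 'PushRequest',
    'type': 'object',
    'required': ['tag1', 'tag2'],
    'properties': {
        'tag1': {'type': 'string'},
        'tag2': {'type': 'string'},
    },
}

# Print the JSON text that would be pasted into the model definition.
print(json.dumps(PUSH_REQUEST_MODEL, indent=2))
```

A model for /pull would look the same with a single required tag property.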
You can check the cost from the invoice on the Billing screen.
It is not in production yet, but after sending and receiving several hundred requests and responses in testing, the bill was 0 yen, so there seems to be no need to be afraid of trying it out.
GCP vs AWS
I was torn between GCP (Firestore) and AWS (DynamoDB), but I adopted DynamoDB.
If you go with GCP, you have to choose among four data store services; for hobby use, I think Firestore is the one to pick.
Reference: Select Database: Cloud Firestore or Realtime Database | Firebase
These skills are mostly self-taught, but I think the ability to pick up new skills was acquired in my company's modern environment. Being able to work where new technologies are used intensively is a great experience on the way to becoming a strong engineer.
In addition, the Heroku configuration is open to the public. The data is empty at the time of publication, but feel free to try it. https://collective-intelligence.herokuapp.com/