The story of migrating from a home server (MariaDB + Java) to AWS (DynamoDB + Python + PHP) while reducing the monthly cost

Hello, my name is Yuya Takeda (@kagamikarasu). I develop the "SOLD OUT 2 Market Information Site".

I started redeveloping the market information site around August 2020 and switched over to the new version in September. Now that more than a month has passed and things have settled down, I'd like to write an article about it.

What is SOLD OUT 2?

It is an online "playing shop" game operated by mu. https://so2.mutoys.com/

Tutorial: https://so2-docs.mutoys.com/common/tutorial.html

You consume items to obtain new ones, and use those to craft and sell new products. Everyone has their own play style: selling to NPCs, selling to other users, or attending events.

What is SOLD OUT 2 Market Information Site?

This is a market information site that I (@kagamikarasu) developed. https://market.kagamikarasu.net/

mu publishes a public API, so I built the site on top of it. You can view price and inventory trends for each item, as well as inventory decreases per store, in graph and table form.

All end users can do is view the data.

Why redevelop?

The old site ran on my home server + ConoHa (in a load-balancer role) + Java 1.8 + MySQL (later MariaDB). It had become hard to develop, so I rebuilt it.

The Java code had become messy, obtaining and deploying certificates was a hassle, and with the database bloating I wanted to retire the home server (no spare PC / no UPS)...

Fight against database bloat

The largest table had grown past a billion records. When I started, I hadn't given any thought to data volume.

Queries were noticeably slow, so I reviewed the indexes and search conditions, but it was a drop in the bucket. In the end I coped by upgrading the storage to gain IOPS.

Over three years the storage went HDD → SSD → SSD (NVMe). NVMe is so fast that when I migrated the data to it, the copy finished so quickly I doubted myself: "Did I mistype the command?"

Migration goals

I happened to be studying for the AWS SAA certification, so I decided to use AWS. The biggest challenge was keeping the all-important monthly cost down.

A straightforward RDS + EC2 + ALB setup would, depending on the configuration, run up a fair bill for RDS alone. That's a normal amount for a company, but quite painful for an individual (at least for me) ('A`)

Services used

* For the services listed below, please follow the recommended dosage (i.e., stay within the free-tier limits).

  • DynamoDB (on-demand)
  • SQS (standard queue)
  • Lambda (Python + Pandas) + CloudWatch Events
  • ECS (EC2 Spot + ECR + ALB)
  • S3 (HIVE partitioning + JSON format)
  • Route 53
  • API Gateway

Configuration diagram after migration

(Screenshot: configuration diagram after migration)

DynamoDB

Migrating to RDS (MariaDB) would have been the easiest path, but moving several hundred GB of data there and operating it would drive the running cost up. Having just said the migration involves several hundred GB, all of the migrated data actually goes to S3 in HIVE partitioning + JSON format; when needed, it is loaded via S3 → Lambda → SQS → DynamoDB, as described later.

At the time of writing, the DynamoDB free tier is 25 GB of storage plus 25 WCU / RCU.

  • 1 WCU can write up to 1 KB of data per second
  • 1 RCU can read up to 4 KB of data per second

Therefore it is desirable to keep each record within 1 KB. This system uses just one table, with 5 WCU / RCU assigned. Since the market information site pulls data per item, the partition key is naturally item_id, and since I want the records in chronological order, the sort key is the UNIX timestamp of the registration time.
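As a concrete illustration, here is a minimal sketch (boto3) of that single-table design. The table and attribute names are my own placeholders, not the site's actual ones.

```python
import time

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("market_prices")  # placeholder table name

# Partition key: item_id, sort key: UNIX time of registration.
table.put_item(Item={
    "item_id": 101,                      # partition key
    "registered_at": int(time.time()),   # sort key (UNIX seconds)
    "price_avg": 1280,
    "stock_total": 4200,
})

# Query one item's history in chronological order (ascending sort key).
resp = table.query(
    KeyConditionExpression=Key("item_id").eq(101),
    ScanIndexForward=True,
)
for row in resp["Items"]:
    print(row["registered_at"], row["price_avg"])
```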

The pre-migration database (MariaDB) held per-store data (unaggregated) at 10-minute intervals; the post-migration database (DynamoDB) holds per-item data (aggregated) at 3-hour intervals.

Since SQS is also involved, the specific aggregation and storage method is described later, but WCU consumption ends up looking like the figure below. As long as the data stays within 25 GB, the database can keep running for free.

(Screenshot: WCU consumption)

SQS

Next is SQS. I had never given queuing much thought, but while studying for the SAA I realized how useful it is and decided to combine it with DynamoDB.

The SQS free tier is 1 million requests per month. It didn't click at first, but each message costs two requests: one to send and one to receive. I'm currently estimating around 2,000 items, so running every 3 hours keeps things under 1 million requests: 2,000 items × 8 runs/day × 30 days × 2 requests = 960,000. Going slightly over would be cheap since it's SQS, but I want to stay within the free tier as much as possible, hence every 3 hours.

As mentioned above, aggregated data is stored in DynamoDB every 3 hours, one record per item, which currently means about 2,000 writes per run. If you wrote all 2,000 items within one second, at 1 KB per item you would need 2,000 WCU, and at over $1,000 a month that would bankrupt me. So this time I combine SQS and DynamoDB to spread out the write timing.

Concretely, the flow is Lambda (aggregation) → SQS → Lambda (SQS polling / DynamoDB write) → DynamoDB. Ordering doesn't matter here, so a standard queue is used rather than FIFO.

At first the SQS Lambda trigger seemed convenient, but I noticed request consumption in monitoring that I hadn't expected, and when I investigated, I found the behavior described here (the trigger polls the queue continuously). Reference: https://encr.jp/blog/posts/20200326_morning/

It's a small amount, but it bothered me a little, so instead of the Lambda trigger I fetch from SQS myself using CloudWatch Events and Lambda.

The number of SQS messages received is shown below. CloudWatch Events checks the queue every minute, and the Lambda caps how many messages it fetches per run, to keep the write load from spiking as much as possible.

(Screenshot: number of SQS messages received)
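Here is a minimal sketch, under my own assumptions, of the two Lambda halves of that pipeline: the aggregator enqueues one message per item, and a scheduled function drains a bounded batch each minute so the DynamoDB writes stay smooth. The queue URL and table name are placeholders.

```python
import json

import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.resource("dynamodb")
QUEUE_URL = "https://sqs.ap-northeast-1.amazonaws.com/123456789012/market-agg"  # placeholder
table = dynamodb.Table("market_prices")  # placeholder

def enqueue_aggregates(aggregates):
    """Producer side: one SQS message per aggregated item."""
    for record in aggregates:
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(record))

def drain_queue(limit=100):
    """Consumer side (run every minute by CloudWatch Events): write at
    most `limit` records per invocation so WCU consumption stays flat."""
    written = 0
    while written < limit:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            table.put_item(Item=json.loads(msg["Body"]))
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            written += 1
```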

Lambda

CloudWatch Events + Lambda handles all of the data acquisition and aggregation.

Lambda is billed by request count and execution time. The free tier is 1 million requests and 400,000 GB-seconds. My usage, at least, never exceeded it (a little over 10% of the free tier).

At the time of writing, I have built the following functions:

  • Master data acquisition (save to S3)
  • Sales / order data acquisition (save to S3)
  • Sales / order data aggregation (S3 read / write + SQS)
  • Population data acquisition (save to S3)
  • Population data aggregation (S3 read / write + SQS)
  • Response function for API Gateway
  • Other content-generation functions

Since this is Python + Pandas, the aggregation is very easy. At first I thought I could brute-force it with for loops, but memory became a real pain, so I moved to Pandas.
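As an example of the kind of aggregation involved, here is a sketch that collapses 10-minute per-store snapshots into per-item records. The column names and S3 path are placeholders (reading s3:// paths directly requires the s3fs package).

```python
import pandas as pd

# Raw 10-minute, per-store snapshots in JSON-lines form (placeholder path).
snapshots = pd.read_json("s3://my-bucket/raw/2020-10-27.json", lines=True)

# One aggregated record per item.
aggregated = (
    snapshots
    .groupby("item_id")
    .agg(
        price_min=("price", "min"),
        price_avg=("price", "mean"),
        stock_total=("stock", "sum"),
    )
    .reset_index()
)
```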

Deployment is easy because I use the Serverless Framework. What it does under the hood is essentially CloudFormation, so I define API Gateway in it as well.

  • Admin-level permissions are required for deployment, but given how it works I think that's unavoidable...

Repeated testing and deployment piles up old artifacts, presumably because every function version is kept. I recommend "serverless-prune-plugin" to delete old versions automatically.

Lambda usage is as follows. Allocated memory varies by function, between 128 MB and 512 MB.

(Screenshot: Lambda usage)

CloudWatch Events

Used for batch (scheduled) processing. There are plenty of options like EC2 + cron or Digdag, but since Lambda is available I pair it with CloudWatch Events.

At one point I thought a batch job wasn't running, and the cause was GMT... My console language is set to Japanese, so I had carelessly assumed JST...
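For reference, CloudWatch Events (EventBridge) cron expressions are evaluated in UTC, so a JST schedule has to be shifted by -9 hours. A sketch with a hypothetical rule name:

```python
import boto3

events = boto3.client("events")

# Fires at 00:00, 03:00, ... UTC = 09:00, 12:00, ... JST.
events.put_rule(
    Name="aggregate-every-3h",                 # hypothetical rule name
    ScheduleExpression="cron(0 0/3 * * ? *)",  # UTC, not JST!
)
```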

ECS (EC2)

Next, ECS. This one is very powerful. It is the combination of ECR + ECS + ALB + Route 53.

  1. ECR - holds the Docker images
  2. ECS (Service → Task) - runs the Docker containers
  3. ALB - targets the ECS tasks (Docker containers)
  4. Route 53 - inter-container communication via ECS service discovery

I use EC2 instead of Fargate (because it's cheaper...). ECS and ALB let you run multiple tasks, that is, Docker containers, on a single EC2 instance.

This means deployments don't require replacing or adding instances (until you hit the instance's memory limit). The EC2 instance itself fades into the background; everything runs under Docker, so I never even SSH into it.
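For example, rolling out a newly pushed image can be as simple as forcing a new deployment on the ECS service; a boto3 sketch with placeholder names:

```python
import boto3

ecs = boto3.client("ecs")

# Restart the service's tasks so they pull the newly pushed image tag.
ecs.update_service(
    cluster="market-cluster",   # placeholder cluster name
    service="web",              # placeholder service name
    forceNewDeployment=True,
)
```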

EC2 also uses Spot Instances. Depending on the instance type, this cut charges by about 70% compared to on-demand.

(Screenshot: Spot Instance savings)

In exchange for the big discount, AWS can interrupt the instance at any time. When one is lost, Spot Fleet automatically replenishes the required instances and the ECS service restores the required tasks. At the same time the ALB health check kicks in, removing unhealthy targets and registering healthy ones.

Since I use t3.micro this time, it costs roughly 1 yen per hour, so two instances running for a month come to about 1,440 yen. With Spot Instances and the roughly 70% discount, that becomes about 432 yen, depending on conditions.

ALB

As mentioned earlier, it is used in combination with ECS, and you can attach an ACM certificate to it. That is far easier than Let's Encrypt; it's practically a single button press.

However, it probably costs around 2,000 yen a month depending on traffic. For personal use with just a single service that's a bit expensive, but host-based routing is possible, so if you run multiple services I don't think it's a bad investment considering the ECS integration, ACM, routing, and the fact that it's managed.

Also, sticky sessions are turned off because sessions are managed in Redis. If stickiness were on and an EC2 instance suddenly dropped out, requests would still be routed to the dead instance.

Route 53

This handles everything around the domain. Incidentally, I transferred the domain from "Name.com" to Route 53 last year. It costs roughly 50 yen every month.

Apart from that, there is a feature called ECS service discovery. Inter-container communication is hard to achieve with ECS alone. (Containers on the same host can connect to each other, but across hosts it depends on the servers, network mode, and so on...)

With service discovery, services registered in ECS are automatically registered and updated in an internal hosted zone. If the network mode is awsvpc, A records can be registered.

A new hosted zone is created, so that's another 50 yen or so a month, but having services automatically linked to internal domain names is a huge help. Even if a service (task) dies, ECS automatically brings it back and re-links it.

In this environment, Redis (self-hosted, not ElastiCache) runs inside ECS, and service discovery is used so the application side (a separate container) can reach it. ElastiCache would be nice, but it costs money...

S3

S3 charges for the amount stored and for GET / PUT requests. This system keeps master data and pre- and post-aggregation data in S3. PUTs happen only on the system side, so I keep them to a minimum; GETs, however, depend on when users show up, so results are cached in Redis on EC2 so that GETs occur as rarely as possible.
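The production app does this caching in Laravel (PHP); the same cache-aside pattern looks like this as a Python sketch, with placeholder host, bucket, and TTL values:

```python
import boto3
import redis

s3 = boto3.client("s3")
cache = redis.Redis(host="redis.internal", port=6379)  # placeholder service-discovery name

def get_content(key: str, ttl: int = 600) -> bytes:
    """Return an S3 object body, hitting S3 only on a cache miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    body = s3.get_object(Bucket="market-data", Key=key)["Body"].read()  # placeholder bucket
    cache.setex(key, ttl, body)  # expire after `ttl` seconds
    return body
```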

(Screenshot: S3 usage)

Crawler countermeasures

Traffic costs money too; each request is small, but the more services you chain together, the larger the impact. ALB in particular charges for new connections, so I don't want wasted traffic. It's nice when robots.txt takes care of it, but cutting off unwanted inbound connections with a network ACL is also a good idea.
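A sketch of that ACL approach, blocking a noisy crawler's range at the subnet boundary (the ACL ID and CIDR are placeholders; 203.0.113.0/24 is a documentation range):

```python
import boto3

ec2 = boto3.client("ec2")

# Deny all inbound traffic from the offending CIDR before the allow rules.
ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",  # placeholder
    RuleNumber=90,                  # lower number = evaluated first
    Protocol="-1",                  # all protocols
    RuleAction="deny",
    Egress=False,                   # inbound rule
    CidrBlock="203.0.113.0/24",     # placeholder crawler range
)
```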

About two months later

I had wondered whether Spot Instances would make things unstable, but to my surprise there is no sign that anything ever went down. I spent the several weeks after the switchover building new features, and development is far easier than in the old environment.

In particular, developing with Docker takes a little effort up front, but after that you just build the image and update the task. Pandas on the Lambda side is also very easy to work with, and now that the pipeline is in place I feel like I can build anything. → Basically, Lambda generates the data, Laravel stores it in Redis, and the result is returned to the user.

Free tier usage shortly before the end of the month was as follows.

(Screenshot: free tier usage)

It's been over a year since I created my account, so I assumed the one-year free tier was gone, yet everything fell within the free tier and no charges were incurred. Maybe the year counts from first billing rather than from account creation?

Future outlook

For now, the migration feels complete. All past data is stored in S3, ready to be loaded into DynamoDB whenever it's needed. (Though I suspect the historical data won't be needed much...)

I concentrated on the design work, so the operations side is still thin. The deployment process, for example, is manual. I'd like to wire in CircleCI and CodeDeploy eventually, but since I'm developing alone I wonder whether it's worth the trouble.

Monitoring is also sparse. There is no external monitoring and no SNS alerts. I recently noticed bots were hitting the site a lot and quietly eating LCUs, so I set up robots.txt and the ACL. Logs go to CloudWatch, so I'd like to build an environment where I can analyze them easily.

Since this is numerical data, I think it would be interesting to try machine learning. I actually dabbled a little (multiple regression), but with many parameters my PC screamed ('A`). It would be tough on Lambda too, so perhaps I'd spin up an instance for the analysis.

Finally

Each topic ended up covered only thinly here, but given the chance I'd like to dig deeper and write follow-up articles. Thank you for reading this far!
