[PYTHON] Participated in the first ISUCON with the team "Lunch" # ISUCON10 Qualifying

TL;DR Rust reference implementation is fast Overwhelming thanks to New Relic.

Thank you for your hard work in ISUCON10 qualifying

localhost_8080_.png

Final score was 722 (reference value). isuumo, I think it was an interesting issue. Thank you to the management who took care of us while preparing for the day, and the members who fought together! Thank you for your hard work!

Lunch

It is a team of 3 students from M1 seniors (Masapyon, koba) and B3 me (Hiroya_W) from the same university. At that time, I heard that the ISUCON10 qualifying quota was likely to fill up at once, so I hurriedly registered to participate solo. After that, I was recruiting members, so I was asked to join!

image.png

"ISUCON Summer Course 2020 Lecture & Mokumokukai" I learned about ISUCON for the first time, and since I participate in both the lecture and the Mokumokukai, I actually participated. Let's participate! I was thinking. At first, the Mokumoku-kai was wondering if each of them would solve the problem.

However, when I actually participated in it, it was a meeting where everything was prepared and everyone could get together and teach me how to take steps. I think it was a very good course because it was only intended for those who have never participated in ISUCON in the past.

Advance preparation

I was planning to participate in Go because it happened that I was planning to use Go in another case and there are many people who are participating in Go. So, first of all, I touched "A Tour of Go" for the time being. From there, I thought I'd try Go's web framework tutorial, but I didn't.

This is because I participated in a web contest held by New Relic "How to use New Relic quickly with ISUCON because it's free". When deploying New Relic's Agent on Go

--You need to write your own middleware for measurement --Along with that, the method will be replaced with another one, so at the end it is necessary to restore the method again.

When I learned that, I got the impression that it was going to get stuck here. In comparison, Python and Ruby are simply installed as a package in your app and in most cases you don't have to write them yourself. In that case, everyone can read with confidence in Python, so I want to make sure to use New Reilc! I switched to the policy.

From there to the actual performance, I used [^ 1] only for the ISUCON9 qualifying to acquire the necessary knowledge. When reading ISUCON9 Online Qualifying Related Entry Summary from the beginning and practicing individually, ISUCON environment with Vagrant / matsuu / vagrant-isucon) was created, and when the members gathered and practiced, I created an ISUCON environment on GCP and practiced.

What I did was just write down what I tried as an article on esa.io, and finally I organized the items in the Wiki of the GitHub repository. I'm sorry I can't publish the license key for New Relic because it's written in plain text [^ 2].

image.png

I think that the basics such as how to take logs such as Makefile [^ 3] and New Relic, secret sauce, and how to put up INDEX around DB are the main focus. When troubleshooting, I have summarized it so that it can be used as a summary of articles that may be helpful.

On the day

The seniors gathered in the laboratory and I participated online from home. I used a service called whereby because I wanted to be able to see the work screens of all the team members, but it was very easy to use [^ 4].

Since the start was delayed to 12:20, we decided to meet around 11:30, and by then we had a light meal.

image.png

In the morning, I could only watch that the operation was real ISUCON. I think it was difficult to prepare for the problem, but thank you again for your hard work.

After that, ** ISUCON10 has a Rust implementation, which is interesting-it seems to be early. This is a hint. ** **

Division of roles

In the schedule

However, thanks to the fact that I was able to study the infrastructure by the day, the actual division of roles was

With that feeling, I also took care of the infrastructure a little, and in the meantime, I asked Masapyon to help me with the app.

From here, I will write what I did on the day and what the members did as far as I can remember.

12:20 ~ 13:11 I was able to access the server via SSH, and Masapyon pushed it to GitHub. Also, he checked the configuration, and since it was Nginx and MySQL, I was a little relieved that it was the same as the ISUCON9 qualifying.

In the meantime, Hiroya_W and koba read the manual, summarize the important points in the wiki, and actually touch the app.

However, I struggled with port forwarding and displaying the app on the browser ... The manual had instructions on how to use the command, but I had never used it that way, so I was worried because I couldn't figure out how to give the arguments.

When I asked Google Sensei, I found a story about SSH port forwarding. I see. Now that we know that we can communicate between the three standing instances, why not SSH into server 2 and stream from there to the port specified by server 1? ?? ?? Understand that

$ ssh -L localhost:8080:10.162.41.101:80 isucon-server2 

I was happy to be connected. It's about 13:11. After finishing, I was told by another team person that I should SSH to server 1 and port forward to localhost: 80 seen from server 1. This will certainly connect.

$ ssh -L localhost:8080:localhost:80 isucon-server1

I really learned about SSH through ISUCON [^ 5].

13:11 ~ 13:30 I somehow touched the app, and unlike ISUCON in the past, if I touched it with my hands, isuumo wasn't that heavy, and I couldn't immediately judge where it was slow because there was no visibly slow situation. So, while saying that I can't proceed without taking a log by turning the bench, I will turn the bench several times with Python without taking a log after confirming it.

I was looking at htop on the bench, but is it really with CPU1 core and RAM2GB? ?? ?? ?? I was saying. The environment I practiced had some machine specs, so I started to feel that it would be necessary to review the parameter tuning.

Also, no one was able to practice the two-unit configuration, and I was afraid that it would fail in the restart test, so I was planning to tune with one unit this time. However, looking at this spec, I thought that the story would not proceed unless I configured at least two units, so I quietly decided that Hiroya_W would support it.

13:30 ~ 15:00 Anyway, it's a log. Hiroya_W is in charge of log output, aggregation, and sharing on Slack. First, instead of New Relic, we introduced wsgi_lineprof, which is a line profiler for Python [^ 6]. (As you will see later, as a result, the log was not output for some reason)

However, I can't run the benchmark because of a bench malfunction around here, and I can't get the log.

I decided to use that time to set up a development environment locally. However, when I run the app locally and access http: // localhost: 1323, I get a 404. Flask isn't on Gunicorn and Flask doesn't return render_template, so what? It's becoming.

I read nginx.conf and confirmed that Nginx is returning static files located in/ www / data /. The bench seemed to be done for the API too, so I decided to check it locally by POSTing or GETting with curl. To check the operation, cut the branch with git and pull it on the server as appropriate to try it.

15:00 ~ 15:40 It seems that the bench marker has been revived. Immediately, I tried to output the log with wsgi_lineprof, but it did not output well. For the time being, the slow queries of MySQL that were output at the same time are aggregated pt-query-digest and shared on Slack. While I had koba and Masapyon look around the DB, I immediately switched to trying New Relic without having to investigate the cause.

Finally, at 15:40, New Relic started seeing the logs, and all the MySQL queries came up. New Relic is amazing, I can see everything.

The score at this point is 353. I feel like I've finally stood at the ISUCON start line. image.png

15:40 ~ 16:50 From here, we will challenge the two-unit configuration of App and DB.

Relying on the notes I wrote after reading the ISUCON9 qualifying entry

--Set the environment variable MYSQL_HOST to the IP address of the instance --Comment out the MySQL bind-address option -Add authority to isucon user while watching Allow MySQL to be externally connected

I will try. There was an incident where I wanted to connect from Server001 to Server002, but I was able to connect from Server002 to Server001, but I managed to complete the two-unit configuration. Here, the score increases slightly to 435. image.png

16:50 ~ 18:33(koba, Masapyon) From here, we'll apply the fixes that koba and Masapyon have made. But as you can see from the bench results, nothing really goes up. Even if I turn it several times, it doesn't improve. image.png

The INDEX commit that koba put up looks like this (I'll see later, but in fact, this alone didn't execute INDEX)

isuumo/webapp/mysql/db/3_IndexEstate.sql


CREATE INDEX idx_door_height ON isuumo.estate(door_height);
CREATE INDEX idx_door_width ON isuumo.estate(door_width); 

isuumo/webapp/mysql/db/init.sh


- cat 0_Schema.sql 1_DummyEstateData.sql 2_DummyChairData.sql | mysql --defaults-file=/dev/null -h $MYSQL_HOST -P $MYSQL_PORT -u $MYSQL_USER $MYSQL_DBNAME
+ cat 0_Schema.sql 1_DummyEstateData.sql 2_DummyChairData.sql 3_IndexEstate.sql | mysql --defaults-file=/dev/null -h $MYSQL_HOST -P $MYSQL_PORT -u $MYSQL_USER $MYSQL_DBNAME

The function [^ 7] that returns 503 Service Unavailable to the Bot communication implemented by Masapyon looks like this.

app.py


bot_user_agent = re.compile(
    r'ISUCONbot(-Mobile)?|ISUCONbot-Image\/|Mediapartners-ISUCON|ISUCONCoffee|ISUCONFeedSeeker(Beta)?|crawler \(https:\/\/isucon\.invalid\/(support\/faq\/|help\/jp\/)|isubot|Isupider|Isupider(-image)?\+|(bot|crawler|spider)(?:[-_ .\/;@()]|$)/i')

app.py


def block_bot(request):
    user_agent = request.headers.get('User-Agent')
    if user_agent:
        if bot_user_agent.match(user_agent):
            return True
    return False

app.py


    #Make it run first on each endpoint
    if block_bot(request):
        return jsonify({'message': 'Service Unavailable'}), 503

However, the score does not increase ...

I tried it with curl and confirmed that it returned 502 properly, but I was wondering because I could not get the log that returned 502 on the bench, but the load was too weak and the bot did not access it. So, I heard the story.

Besides, koba changes COUNT (*) to COUNT (id),

app.py


-    query = f"SELECT COUNT(*) as count FROM chair WHERE {search_condition}"
+    query = f"SELECT COUNT(id) as count FROM chair WHERE {search_condition}"
-    query = f"SELECT COUNT(*) as count FROM estate WHERE {search_condition}"
+    query = f"SELECT COUNT(id) as count FROM estate WHERE {search_condition}"

Masapyon devised a place to check if the chair passed through the door,

app.py


     w, h, d = chair["width"], chair["height"], chair["depth"]
     query = (
         "SELECT * FROM estate"
         " WHERE (door_width >= %s AND door_height >= %s)"
         "    OR (door_width >= %s AND door_height >= %s)"
         "    OR (door_width >= %s AND door_height >= %s)"
-        "    OR (door_width >= %s AND door_height >= %s)"
-        "    OR (door_width >= %s AND door_height >= %s)"
-        "    OR (door_width >= %s AND door_height >= %s)"
         " ORDER BY popularity DESC, id ASC"
         " LIMIT %s"
     )
-    estates = select_all(query, (w, h, w, d, h, w, h, d, d, w, d, h, LIMIT))
+    estates = select_all(query, (w, min(h, d), h, min(w, d), d, min(w, h), LIMIT))

Instead of counting in SQL with / api / estate / search and / api / chair / search, you can count in Python,

app.py


    #This is the estate part
    search_condition = " AND ".join(conditions)

    query = f"SELECT * FROM estate WHERE {search_condition} ORDER BY popularity DESC, id ASC"
    chairs = select_all(query, params)
    count = len(chairs)
    chairs = chairs[per_page * page:per_page * page + per_page]

    return {"count": count, "estates": camelize(chairs)}

I found from the commit log that he was doing it. I wasn't able to participate in this improvement discussion, but I'm grateful that I've tried various things, and I feel again while writing an entry.

It was very painful that I didn't see any improvement in my score.

18:33 ~ ?? I notice that INDEX is not stretched. I'm glad I noticed ...

Even if I checked the log with New Relic, there was no strange atmosphere, so is it really INDEX? It has become an atmosphere like that. When I was asked by MySQL to check if INDEX was set up with a command,

Masapyon found that the INDEX part was not executed.

app.py


@app.route("/initialize", methods=["POST"])
def post_initialize():
    sql_dir = "../mysql/db"
    sql_files = [
        "0_Schema.sql",
        "1_DummyEstateData.sql",
        "2_DummyChairData.sql",
    ]

Is there a sql_files guy ... In ISUCON9 qualifying, with ʻinitialize`

subprocess.call(["../sql/init.sh"])

Was being executed, so I wondered if that was the case.

koba>INDEX stretch init.Is it okay to write in sh?
Hiroya_W, Masapyon>I think it fits ~

I remember having a conversation. You can't stretch it ...

Then, for the first time, the score increased slightly. I used to go back and forth between 340 and 380, so I was delighted to be close to 400.

image.png

At this point, I finally arrived at 18:30! I'm glad I'm glad! I was saying. Until now, I couldn't put the INDEX on it properly and couldn't check the score properly, so I put up the INDEX again and turned the bench to measure it.

16:50 ~ ??(Hiroya_W)

The timeline went back a bit, and Hiroya_W was trying to upgrade from MySQL 5.7 to MySQL 8 while koba and Masapyon were picking up the fixes and spinning the bench. This is simply because I got the very rough information that "MySQL 8 is faster". There is no way to know that the features in MySQL 8 can be used in ISUCON 10.

Also, no one in our team can learn how to configure 3 units, whether it is App, DB, DB or App, App, DB. So, I left one behind, so I decided to take on the challenge with the idea that I could break it.

As a result of upgrading to MySQL8 and turning the bench, it looks like this.

image.png

Finishing with 0 is stuck in the integrity check of the application. If you think about it, the bench that you ran immediately after that will get a score and will eventually be canceled.

Besides, even if it gets a score, it does not seem to be better than MySQL 5.7 with 100 units. I didn't understand it properly, I didn't understand why the application consistency check didn't pass, the bench was unstable, and I didn't want to fail in the additional test of the operation, so I gave up the upgrade to MySQL8.

I'm thinking that it might have been good to inquire about this during the production. Regarding MySQL8, it is a point that I would like to review properly.

?? ~ 20:20 I put in the secret sauce of MySQL and Nginx that I prepared.

koba puts up INDEX further,

3_IndexEstate.sql


  CREATE INDEX idx_door_height ON isuumo.estate(door_height);
- CREATE INDEX idx_door_width ON isuumo.estate(door_width); 
+ CREATE INDEX idx_door_width ON isuumo.estate(door_width);
+ CREATE INDEX idx_latitude ON isuumo.estate(latitude);
+ CREATE INDEX idx_longitude ON isuumo.estate(longitude);
+ CREATE INDEX idx_rent ON isuumo.estate(rent);
+ CREATE INDEX idx_popularity ON isuumo.estate(popularity); 

4_IndexChair.sql


CREATE INDEX idx_popularity ON isuumo.chair(popularity);
CREATE INDEX idx_price ON isuumo.chair(price);
CREATE INDEX idx_color ON isuumo.chair(color);
CREATE INDEX idx_height ON isuumo.chair(height);
CREATE INDEX idx_width ON isuumo.chair(width);
CREATE INDEX idx_kind ON isuumo.chair(kind);
CREATE INDEX idx_stock ON isuumo.chair(stock);
CREATE INDEX idx_features ON isuumo.chair(features); 

Masapyon applied the changes and turned the bench around. Merge a number of changes into master, spin the bench, and record a 590 score by changing the chair with INDEX. Every time I turned it, the score increased and I think it was amazing.

image.png

The top three benches were the result of tuning around my OS, but the score dropped, so I decided not to put it back immediately.

20:20 ~ 21:00 It was less than an hour after the end, so I decided to do a restart test, but before that, I was wondering how much the score would come out with the reference implementation of Rust, so I decided to run the bench.

image.png

** Safely hit the highest score. ** ** It's faster than the tuned Python implementation [^ 8]! !! It's already a big laugh.

Since the modifications around the app did not work for the score, I decided to discard all the modifications of the app and switch to the Rust implementation. Modified to access MySQL of another instance from the part where INDEX of DB is put, 20:30.

After rebooting, turn the bench to update the final score further and 722.

image.png

Reboot test App-> DB, DB-> App, check that it is working properly and Finish.

Reflections

I read about the effective upgrade from MySQL 5.7 to MySQL 8 in the ISUCON9 qualifying, so I feel that I should have tried it in advance. From there I should have understood the features in MySQL 8. Also, I think it was stabbed that all the members had little knowledge about Nginx and MySQL this time. Especially this time, I have the impression that DB is the center, and I feel that I could not tune out of the current situation with App alone.

Impressions

As a result, I just put INDEX on the initial implementation of Rust and DB. Still, the process of getting there was interesting, so it's OK.

Regardless of the result, I feel like I was able to look at the log, look for the slow part, and try to correct the part close to the correct answer. I think that preparing the environment and bringing it to this stage is the first difficult part of ISUCON, so I think it was good that it was done properly in production.

Also, when writing this entry, I am very grateful that I can see the members who have tried various things from the commit log. I'm glad I left it to you, and on the contrary, I would like to thank the members for letting me take on the challenge.

I will make use of this experience for the next ISUCON 11 and take on the challenge! We look forward to seeing you again next time!

[^ 1]: I tried other problems, but I gave up on ISUCON8 qualifying because the bench fell when I put in New Relic (the cause seems to be browser_monitoring). I gave up on ISUCON9 because I didn't understand Docker. Although it is said that it is only ISUCON9 qualifying, there are things that I do and do not know even if I do it, so it may be correct to use only ISUCON9 qualifying ** "" **.

[^ 2]: I want to publish it someday.

[^ 3]: Since I had a Make study session just before, ["I got the highest score on the first day of ISUCON9 qualifying"](https://to-hutohu.com/2019/09/9 09 / isucon9-qual / #% E8% BE% 9E% E9% 80% 80% E3% 81% 97% E3% 81% 9F% E7% 90% 86% E7% 94% B1). It was timely.

[^ 4]: Initially, I was planning to use Zoom, but even though I tried sharing the screens of everyone with Zoom, I gave up because I didn't know how to see the other party's screen while sharing the screen. However, whether it is whereby or elsewhere, browsing while sharing the screen and coding with VS Code seems to be the limit in the specifications of my PC, and it was discovered that it would hinder my work, so I shared my screen from the middle. Has become a form as appropriate. CPU: Core i5 4210, RAM: 8GB, but the CPU usage rate stuck and the CPU temperature kept 80 ℃, so I hurriedly took out the cooling fan for the notebook PC.

[^ 5]: I was clogged up with problems such as key placement, authority, and fingerprints in advance, so my knowledge naturally accumulated.

[^ 6]: When I made a New Relic gesture with the Python implementation of ISUCON9 qualifying, the MySQL query log did not come up except for set, commit, and rollBack, and I could not see the essential SELECT at all. I'm glad I wrote it on New Relic's Explorer Hub for research. Besides, when I turned the bench in ISUCON8 qualifying, the DOM structure was said to be ~ and the bench fell (the cause seems to be browser_monitoring), and it became a story about how to use New Relic, and it is different for that time. I have established the method of.

[^ 7]: I thought it would be implemented in a Python app, but apparently it can be achieved with Nginx ... I did not know.

[^ 8]: I feel like I haven't measured the score when New Relic is removed in Python.

Recommended Posts

Participated in the first ISUCON with the team "Lunch" # ISUCON10 Qualifying
I participated in the ISUCON10 qualifying!
What I learned by participating in the ISUCON10 qualifying
The first step in speeding up inference with TensorFlow 2.X & TensorRT
The first step in Python Matplotlib
Display Python 3 in the browser with MAMP
MongoDB for the first time in Python
Log in to the remote server with SSH
[Python] Get the files in a folder with Python
12. Save the first column in col1.txt and the second column in col2.txt
The first step in the constraint satisfaction problem in Python
Determine the numbers in the image taken with the webcam
Detect folders with the same image in ImageHash
Load the network modeled with Rhinoceros in Python ②
Load the network modeled with Rhinoceros in Python ①
The story that fits in with pip installation
After attending school, I participated in SIGNATE's BEGINNER limited competition for the first time.