[PYTHON] I made an AI patrol the net and built a gadget-ranking web service that updates once a week

Preface

I like gadgets. Really, I just love gadgets.

Above all, I'm extraordinarily obsessed with tablets. About 7 or 8 years ago, I bought a gadget called the Surface RT, convinced that "this one is perfect for work and play!" Ever since, my endless and expensive journey in search of the best tablet has continued.

For now I have settled on the iPad Pro 12.9 and the Surface Pro X, but the Galaxy Tab S7, a tablet that seems destined to win, will be announced soon, and the Surface Duo is coming up too, so my wallet is at some risk. Dangerous. In the first place, all I do is play Arknights and use social media, so a score of 200,000 on the old Antutu benchmark should be plenty.

Because of this, I check gadget sites quite often. Counting other tech media, leak sites, and overseas gadget sites such as 9to5mac, I spend hours reading every day.

To be honest, it's a bit unhealthy. I even visit the same site over and over again, even though it hasn't been updated. I have become a gadget zombie. So what should I do?

Yes, let Python do the tedious things.

What I made

https://gadget-busters.com

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/696052/dd0a55fe-71cd-314f-b725-ce8b58a35d08.png)

This is it. As you can probably tell, the name Gadget Busters is simply a program I used to like, Miss Busters, combined with "gadget"; I didn't put any deeper thought into it.

Once a week, it publishes the gadgets that were hot topics that week as a ranking, so I can recommend it to anyone who wants to find out quickly which gadgets are being talked about right now.

The back end crawls articles on the Internet and extracts product names from gadget-related articles using natural language processing and a prediction algorithm I trained myself. It then assigns each product a score and builds the ranking.
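To make the "extract product names, then count them" step concrete, here is a minimal sketch. It is not the service's actual extractor: the real one uses natural language processing and a trained predictor, while this stand-in uses a hand-written regular expression. The brand list, `PRODUCT_PATTERN`, `extract_products`, and `count_mentions` are all hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked brand list stands in for the learned extractor.
# A "product name" here is a known brand followed by one or more model tokens
# that start with a capital letter or a digit (e.g. "Pro", "S7", "12.9").
PRODUCT_PATTERN = re.compile(
    r"\b(?:iPad|Surface|Galaxy Tab)"
    r"(?:\s+[A-Z0-9][A-Za-z0-9]*(?:\.\d+)?)+"
)


def extract_products(text):
    """Return every product-name candidate found in one article body."""
    return PRODUCT_PATTERN.findall(text)


def count_mentions(articles):
    """Count product-name mentions across a list of article bodies."""
    counts = Counter()
    for body in articles:
        counts.update(extract_products(body))
    return counts
```

Calling `count_mentions` on the crawled article bodies gives the raw mention counts that a scoring step can then turn into a ranking. A pattern like this will miss unknown brands and over-match in some sentences, which is exactly why a learned extractor is the better fit for the real service.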

Basically, a gadget that is mentioned more often on more sites is judged to be trending. However, ranking by the number of appearances alone tends to surface products that everyone already knows about, so the algorithm uses idf (inverse document frequency) to pick up unusual product names. Specifically, I introduced a term that gives extra weight to keywords that appear frequently on a small number of sites.
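The scoring idea can be sketched roughly as follows. This is only an illustration under my own assumptions, not the formula the service actually uses: the score is the total mention count multiplied by a smoothed idf weight computed over sites, and `rank_gadgets` and its input format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def rank_gadgets(mentions_by_site):
    """Rank product names by mention count weighted with a site-level idf.

    mentions_by_site maps a site name to the list of product names it
    mentioned this week (repeats included).
    """
    n_sites = len(mentions_by_site)
    total_mentions = Counter()
    sites_mentioning = defaultdict(set)

    for site, names in mentions_by_site.items():
        for name in names:
            total_mentions[name] += 1
            sites_mentioning[name].add(site)

    scores = {}
    for name, count in total_mentions.items():
        # Smoothed idf: names that appear on fewer sites get a larger weight.
        idf = math.log((1 + n_sites) / (1 + len(sites_mentioning[name]))) + 1.0
        scores[name] = count * idf

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With this weighting, a product mentioned three times on a single site outranks one mentioned once on each of three sites, which matches the intent of favoring frequent keywords on fewer sites. How strongly to apply that weight is a tuning decision.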

This whole process runs automatically once a week and updates the site.

Future development

Once enough data has accumulated, I would like to create pages where you can see monthly rankings, annual rankings, and a list of how gadget trends have shifted. What happened to the gadgets that were popular in 2015? When you wonder about that, you could click on 2015 and see how the trends changed along a timeline. The name would be ... Gadget Time Machine?

I would also like to refine the product-name extraction part of the algorithm so that it picks up less noise. Specifically, I want it to ignore product names that are repeated inside advertisements. I also want to make the design more stylish, so I hope you will continue to support Gadget Busters.

I would also like to write articles little by little about the technologies I am using, partly as notes to myself. I haven't done much yet.

Technology used

- Web crawling
- Natural language processing
- Machine learning
- Ranking algorithm
- FastAPI
- React + Material-UI
- AWS
