[PYTHON] A brief note on getting scolded for scraping

I'm a class of '15 graduate. I heard that anyone connected to SFC is welcome to contribute, so I would like to share the story of how I got scolded for scraping during my SFC days.

Summary

This is an account of what happened after I scraped a company review site and put the resulting data on a campus server.

When it happened

It happened around the time I was job hunting. I had started programming after entering SFC, and by the job hunting period I had just finished the basic programming courses and felt that I "fully understood programming". I was particularly good at data analysis, and at the time I was using a technique called scraping, for both research and hobby projects, to collect data from the Web and then analyze and visualize it.

Background

Keio University, and SFC in particular, prides itself on "practical learning". Putting scraping and data analysis to practical use was always at the back of my mind. Since I was job hunting at the time, as mentioned above, I came up with the idea: "Right, let's apply science to job hunting."

There was a company review site that I used often, and it held a lot of quantitative and qualitative data. It was, so to speak, a "Tabelog for companies". Job-hunting students could use the paid features free of charge for a limited period, so I found it very helpful. I decided to turn this data into a database and build a recommendation engine on top of it, and I moved on to implementation.

Scraping is a very powerful technique. While most people copy and paste data from the Web by hand, a scraper pulls out all of it at once. It is a lot of fun.

What caused it

Up to that point, things were fine (though perhaps not as far as the terms of use were concerned). However, I got carried away and uploaded the data to the campus server with permissions that let anyone access it. On top of that, I shared it with a friend by mentioning them on Twitter.

The operator of the job-change site found this while egosurfing. That same day, a DM arrived saying, in effect, "Delete it, and write a memorandum admitting that you were in the wrong."

Contents of the memorandum

- What I did was an act that violated the terms of use.
- I will pay the full amount of any damages the operating company suffers as a result of sharing this data.

The content was unsettling for a student. I was asked to submit a document that looked legally binding, complete with a personal seal or a registered seal.

Of course, no damages have been claimed so far, but I was job hunting at the time, and it left me uneasy for a long while.

I still do not really know whether this memorandum is valid. To be honest, I still have a bad feeling that a heavy-handed power play was used against a student. (Of course, now that I see things from a company's standpoint, I think it was necessary to protect stakeholders such as shareholders and employees.)

In fact, the law around scraping has other subtle points as well, such as the load placed on the other party's server. If you scrape on a regular basis, I recommend that you at least check the legal overview in the following article and scrape conservatively.
Is scraping illegal? Attorneys explain three legal issues and countermeasures in 5 minutes | Top Coat International Law Office

Out of some caution, I did not scrape pages that can only be accessed after logging in.
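As a rough illustration of what "scraping conservatively" can look like in code, here is a minimal sketch that checks a robots.txt policy and enforces a minimum delay between requests. It is not the script from this story. The robots.txt content, the `example-bot` user-agent name, the `example.com` URLs, and the `Throttle` helper are all hypothetical, and obeying robots.txt does not by itself settle terms-of-use or legal questions.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Crawl-delay: 5
"""

USER_AGENT = "example-bot"  # hypothetical user-agent name


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay


parser = build_parser(ROBOTS_TXT)

# Use the site's Crawl-delay if it declares one, otherwise fall back to 1 second.
delay = parser.crawl_delay(USER_AGENT) or 1


def allowed_urls(urls):
    """Keep only the URLs that robots.txt permits for our user agent."""
    return [u for u in urls if parser.can_fetch(USER_AGENT, u)]


def crawl(urls, fetch, throttle):
    """Call fetch(url) for each permitted URL, pausing between requests.

    `fetch` is supplied by the caller (for example, a thin wrapper around an
    HTTP client), so this sketch itself performs no network access.
    """
    results = {}
    for url in allowed_urls(urls):
        throttle.wait()
        results[url] = fetch(url)
    return results
```

Calling `crawl(urls, fetch, Throttle(delay))` would skip anything under `/members/` and wait at least the declared crawl delay between requests. The delay is taken from the site rather than hard-coded so that the scraper follows whatever pace the operator asks for.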

What happened afterwards

I hesitate to admit it now, but I continued to use the scraped data privately, and the recommendation engine was a big success (although it was a very crude and primitive implementation). I was a student without a clear direction, so when I was job hunting I submitted entry sheets (ES) starting from the top of the recommended list, and in the end I received an offer from, and joined, the company ranked around 3rd. (As an aside, my interpretation is that this let me approach the "blind spot" window of the Johari window: https://lightworks-blog.com/johari-window#3-3)

By the way, my account wasn't banned for some reason.

Finally

It ended happily, but I was scared when I wrote the memorandum. I sincerely hope that you will handle data and scripts safely and do interesting analyses.
