IP spoofing with tor on macOS, checked with Python

I wrote this because I wanted to scrape with Python. Some sites deny access for a while when they are accessed from the same IP address repeatedly over a certain period. Sites like this can be hard to scrape, so here I try scraping while disguising the IP address.

Note that I have only confirmed this on macOS. I think the procedure is slightly different on other systems, Windows in particular.

By the way, "disguise" may give a bad impression, but disguising an IP address is not bad in itself. Of course, when scraping, please pace your program's requests so that you do not put a load on the target server.
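One way to keep the load on the target server low is to enforce a minimum interval between requests. The following is only a sketch under my own assumptions: the name polite_delays and the clock and sleep parameters are made up for illustration and are not part of any library.

```python
import time


def polite_delays(urls, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Yield each URL, waiting so that consecutive yields are at least
    min_interval seconds apart. clock and sleep can be swapped out for testing."""
    last = None
    for url in urls:
        if last is not None:
            # How much of the interval is still left since the previous URL?
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield url
```

You would loop over it, for example `for url in polite_delays(urls, min_interval=5.0):`, and make one request per iteration. A suitable interval depends on the site, so treat the default of one second as a placeholder.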

Requirements

Python installation

Install Python 3. (I think it will also work with Python 2, but I have not confirmed this.)

requests library installation

A library for calling external URLs (APIs) from Python. It is similar to Ajax in JavaScript.

Install it with the following command:

pip install requests

beautifulsoup4 library installation

A library that lets you extract content under more detailed conditions from the text fetched with requests.

pip install beautifulsoup4

tor

tor is software that enables anonymous communication. We use it here to disguise the IP address. https://www.torproject.org/

Install with the following command.

brew install tor

After the installation is complete, run the following command:

tor

tor prints a series of startup messages. Startup is complete when the following line appears:

Jan 28 00:29:59.000 [notice] Bootstrapped 100% (done): Done

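Then start tor as a background service. (If the tor process from the previous step is still running in the foreground, stop it with Ctrl+C first, since both would try to listen on port 9050.)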

brew services start tor

It worked if the output contains the word "Successfully".
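If you want to confirm from Python that tor is accepting connections, you can try opening a TCP connection to its SOCKS port. This is a minimal sketch: is_port_open is a helper name I made up, and it assumes tor listens on 127.0.0.1:9050, the address used in the proxy settings later in this article.

```python
import socket


def is_port_open(host="127.0.0.1", port=9050, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note that this only tells you that something is listening on the port. It does not prove that the listener is tor or that tor has finished bootstrapping.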

Python programming

Let's write some Python. This time, the script accesses a URL that reports your own IP address, and we look at the result.

You can check your IP address at the following site: https://grupo.jp/myip/

test.py

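# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

# Fetch the IP-check page directly (no proxy).
get = requests.get('https://grupo.jp/myip/').text
soup = BeautifulSoup(get, 'html.parser')
# The table with class "pubwaku" holds the IP address and remote host.
ip = soup.find('table', class_='pubwaku')

print(ip)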

First, run it as ordinary scraping, without tor.

python test.py

The following result is returned. The output is HTML, so look for the rows containing the IP address and remote host, as shown below.

<tr><th>IP address</th><td style="font-size:18px;font-weight:bold;">153.999.999.99</td><td class="commentary">IP address currently used for the connection</td></tr>
<tr><th>Remote host</th><td>p554999-************.*****.ne.jp</td><td class="commentary">Host name associated with an IP address</td></tr>

**IP address** 153.999.999.99

**Remote host** p554999-************.*****.ne.jp
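Instead of searching the HTML by eye, you can pull the two values out with BeautifulSoup. This is a sketch under assumptions: extract_rows is a made-up helper, and the sample HTML below is a simplified stand-in modeled on the rows shown above, not the real page (whose labels and layout may differ).

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Map each table row's <th> label to the text of its first <td>."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for row in soup.find_all("tr"):
        th = row.find("th")
        td = row.find("td")  # first <td> in the row holds the value
        if th is not None and td is not None:
            result[th.get_text(strip=True)] = td.get_text(strip=True)
    return result


# Hypothetical sample data, simplified from the output shown in this article.
sample = """
<table class="pubwaku">
<tr><th>IP address</th><td>153.999.999.99</td><td class="commentary">...</td></tr>
<tr><th>Remote host</th><td>host.example.ne.jp</td><td class="commentary">...</td></tr>
</table>
"""

rows = extract_rows(sample)
print(rows["IP address"], rows["Remote host"])
```

In test.py you could pass str(ip) to the same function, provided the table was found.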

As written, this is ordinary scraping. Changing the script as follows makes it scrape through tor.

test.py

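# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

# Route both HTTP and HTTPS requests through tor's local SOCKS5 proxy (port 9050).
proxies = dict(http='socks5://127.0.0.1:9050',
               https='socks5://127.0.0.1:9050')
get = requests.get('https://grupo.jp/myip/', proxies=proxies).text

soup = BeautifulSoup(get, 'html.parser')
# The table with class "pubwaku" holds the IP address and remote host.
ip = soup.find('table', class_='pubwaku')

print(ip)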

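The only change is the proxies argument added to requests.get. Note that requests needs the optional PySocks dependency to use SOCKS proxies. If you get a "Missing dependencies for SOCKS support" error, install it with pip install "requests[socks]".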
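If several scripts use the same proxy settings, it may help to build the dictionary in one place. The sketch below is my own arrangement, not part of requests: tor_proxies and fetch_via_tor are made-up names. The socks5h scheme is a documented requests option that makes the proxy resolve host names, whereas plain socks5 resolves them on your machine.

```python
import requests


def tor_proxies(host="127.0.0.1", port=9050, remote_dns=False):
    """Build a proxies dict for requests that points at a local tor SOCKS port.

    remote_dns=True selects the socks5h scheme, so DNS lookups also go
    through the proxy instead of being done locally.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    address = f"{scheme}://{host}:{port}"
    return {"http": address, "https": address}


def fetch_via_tor(url, timeout=30):
    """Fetch url through tor and return the response body as text."""
    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()  # raise on HTTP error statuses (4xx/5xx)
    return response.text
```

With this in place, the first part of test.py could be reduced to get = fetch_via_tor('https://grupo.jp/myip/'). The 30 second timeout is an arbitrary choice, picked because requests through tor tend to be slower than direct ones.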

Run

python test.py

Let's look at the result. Again, look for the rows containing the IP address and remote host.

<tr><th>IP address</th><td style="font-size:18px;font-weight:bold;">82.223.99.999</td><td class="commentary">IP address currently used for the connection</td></tr>
<tr><th>Remote host</th><td>tornode3.*******.net</td><td class="commentary">Host name associated with an IP address</td></tr>

**IP address** 82.223.99.999

**Remote host** tornode3.*******.net

As you can see, not only the IP address but also the remote host has been disguised.

If you run test.py again, the IP address will remain the same. If you want to change the IP address again, restart tor.

Restart

brew services restart tor

Run test.py

python test.py

Check the result.

<tr><th>IP address</th><td style="font-size:18px;font-weight:bold;">109.70.999.99</td><td class="commentary">IP address currently used for the connection</td></tr>
<tr><th>Remote host</th><td>tor-exit-anonymizer.********.net</td><td class="commentary">Host name associated with an IP address</td></tr>

**IP address** 109.70.999.99

**Remote host** tor-exit-anonymizer.********.net
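The restart step can also be triggered from Python. This is a minimal sketch that simply shells out to the same brew command used above. restart_tor and its run parameter are names I made up for illustration; run exists only so the function can be tried without actually calling brew.

```python
import subprocess


def restart_tor(run=subprocess.run):
    """Run `brew services restart tor`.

    Returns True if the command exits with status 0, otherwise False.
    """
    result = run(["brew", "services", "restart", "tor"],
                 capture_output=True, text=True)
    return result.returncode == 0
```

A successful return only means that brew accepted the restart. tor still needs some time to start up before requests through the proxy will work, so a script should wait before running the scraping step again.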

Summary

What do you think? As shown above, disguising an IP address is easy. Does that mean IP checks are useless against DoS attacks? Not quite. To change the IP address you have to restart tor, which takes some time, so it is difficult to attack from a different IP address hundreds of times per second. A program that temporarily blocks an IP address after a certain number of accesses is therefore effective to some extent. **However, it is not effective against DDoS attacks.**
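To make the idea of "temporarily rejecting a certain number of accesses from the same IP address" concrete, here is a rough server-side sketch. It is my own illustration and not taken from any particular framework: the class name IpRateLimiter, its parameters, and the default thresholds are all assumptions.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Block an IP for block_seconds once it makes more than max_requests
    requests within window_seconds."""

    def __init__(self, max_requests=10, window_seconds=60.0,
                 block_seconds=300.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time at which the block ends

    def allow(self, ip):
        """Return True if the request from ip should be served."""
        now = self.clock()
        blocked_until = self._blocked_until.get(ip)
        if blocked_until is not None and now < blocked_until:
            return False
        hits = self._hits[ip]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            # Too many requests in the window: start a temporary block.
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True
```

A server would call allow(ip) once per incoming request and refuse the request when it returns False. Because the counter is kept per IP address, requests spread across many addresses never reach the limit, which is why this approach does not help against DDoS attacks.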

Please do not use scraping for wasteful access or mischief.
