Create a data collection bot in Python using Selenium

** Selenium Web Driver ** is an extension of the ** Python ** toolkit. With ** Selenium **, you can mimic user behavior and automate daily repetitive tasks like data scraping.

Overview

What to tell in this article

  1. Instantiate Alibaba Cloud Windows instance

  2. Installation

  1. Write a Python Selenium script to check the crypto market

--Write a class for our "Bot" --Get_crypto_data function in class --Write_file function of a class that writes real-time data to a local cloud file.

Log on to your Alibaba Cloud Windows instance to get started.

Installation Open Internet Explorer and go to mozilla.org. You may need to add some exceptions to the built-in Windows Firewall to enable the Firefox installer download.

Download Firefox from www.mozilla.org.

Download the Python 3.7.1 MSI installer from http://www.python.org/download/.

Run the installer. Be sure to check the option to add Python to your PATH during installation.

The Windows version of Python 3.7.1 includes the pip installer. It makes it easy to install all the Python modules you need. Specifically, it is a Selenium web driver.

Go to the Windows Power Shell interface and verify that Python was successfully installed.

python --version

Here, Python 3.7.1 is taken as an example. Let's install Selenium with pip immediately.

pip install selenium

Now you need to install the appropriate web driver for Selenium. I'm using Mozilla for my browser, so I need a Gecko driver. The process is similar to installing a Chrome web driver.

https://github.com/mozilla/geckodriver/releases/download/v0.23.0/geckodriver-v0.23.0-win64.zip

Save the link to a local file on your Alibaba Windows instance. Then unzip the zip. Unzip it to your desktop and add the file to your Power Shell system path to see it.

setx path "%path%;c:\Users\Administrator\Desktop\geckodriver.exe"

Now, let's organize the development environment. You can also use notepad to write code. I prefer to use notepad ++, a free open source IDE for syntax highlighting.

https://notepad-plus-plus.org/downloads/v7.8.2/

Python script

Now let's decide what we want the Python Selenium "Web Bot" to do. Well, if our "bot" gets some cryptocurrency market data for us, it's cool. Write that data to a local file. You can then plug that data into other Alibaba cloud services. Get ready for input.

Now let's get started with coding. Launch the IDE and import Selenium. We will also import time and datetime, as we will see later.

from selenium import webdriver
import datetime
import time

First, you can let Selenium actually do something and know what to expect.

browser = webdriver.Firefox()
browser.get("http://www.baidu.com/")

Save the file with a .py extension.

Double-click the file.

The Selenium web driver now programmatically opens web pages with three lines of code.

Now let's create a "Bot" class.

It's a good idea to think about what the program wants to achieve.

I like to do this step by step. Then, we will break down these steps into bot functions.

  1. I want to go to a web page programmatically
  2. I want to scrape some data
  3. I want to pass that data as a variable
  4. I want to save the data in a file on the cloud

First, let's define a basic Selenium "Bot" class.

from selenium import webdriver

class Bot():
    def __init__(self,url):
        self.browser = webdriver.Firefox()
        self.url = url
    def get_web_page(self):
        self.browser.get(self.url)
bot = Bot("http://www.baidu.com/")
bot.get_web_page()

That's a more rational class-based structure for organizing our programs.

Set the class. Define the browser variable as Firefox Webdriver. Defines the URL to pass to the class. Next, create a basic web page retrieval function and call the browser with the URL you defined during initialization.

Let's run the script. The script initializes and executes the Bot class. Selenium webdriver opens a Firefox browser window and goes to www.baidu.com.

Before that, define the cryptographic data acquisition function. Go to www.coinwatch.com and get the xpath of the element for which you want to get the data. Then scrape the Bitcoin market data.

Firefox's web development tools are suitable for this task, and you can use them to find the xpath of the element you need.

Go to www.coinwatch.com and hover your mouse over the search bar in the upper right corner. 1.png

Right-click an element in the search box and select Inspect Element from the menu.

At the bottom of the page, an Inspector window with the elements highlighted opens. Right-click on the highlighted text in the Inspector. Go to the Copy submenu and select Copy Xpath. This saves the element's Xpath to the clipboard and pastes it into your Python Selenium script. Let's also get the xpath to the post button.

Let's get started

from selenium import webdriver
class Bot():
    def __init__(self,crypto):
        self.browser = webdriver.Firefox()
        ### CRYPTO TO SEARCH FOR
        self.crypto = crypto
    def get_crypto_data(self):
        self.browser.get("https://www.coinwatch.com/")
        ### FIND SEARCH FORM AND IMPUTE KEY WORD
        print("FINDING SEARCH FORM")
        element = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/header/section[3]/div/div/div/div[1]/div/input")
        element.clear()
        element.send_keys(self.crypto)
        element = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/header/section[3]/div/div/div/div[1]/div/div/button")
        element.click()


bot = Bot("BTC")
bot.get_crypto_data()

In the above, the web driver is being initialized. Here, we define the variable crypto, which is the search term for cryptocurrencies. Then pass this crypto variable to the class, specify the self parameter and pass it to the get_crypto_data function. The get_crypto_data function calls a web browser to get https://www.coinwatch.com/. When you call the web page, look for the element in the copied xpath. The first element is the search box. Clear the search element first and then send the crypto variable as an argument. Then find and click the submit button. Instantiate the bot as a bot and pass "BTC" as the crypto you want to search.

You need to go to the main page of Bitcoin. Now let's get the opportunity to get and copy all the Xpaths to every element we want to collect data on. I think for now we need price, caps, 24-hour volume and 24-hour variation. Let's create a variable to hold these values in our script. Then pass the class-wide variables to the get_crypto_data function. Once the element is in place, use Selenium's .text method to get the text content of the element. Then print it out at the terminal. Depending on the web page, it may take some time to load the DOM element, so if you get an error such as the element not found, you may have to wait for a period of time. The simplest way to do this is to use time.sleep ().

Let's do this and see what happens.

from selenium import webdriver
import datetime
import time

class Bot():
    def __init__(self,crypto):
        self.browser = webdriver.Firefox()
        ### CRYPTO TO SEARCH FOR
        self.crypto = crypto
        self.price = None
        self.cap = None
        self.volume_24h = None
        self.change_24h = None
    def get_crypto_data(self):
        self.browser.get("https://www.coinwatch.com/")
        ### FIND SEARCH FORM AND IMPUTE KEY WORD
        print("FINDING SEARCH FORM")
        elem = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/header/section[3]/div/div/div/div[1]/div/input")
        elem.clear()
        elem.click()
        elem.send_keys(self.crypto)
        elem = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/header/section[3]/div/div/div/div[1]/div/div/button")
        time.sleep(1)
        elem.click()
        ### GET MARKET DATA
        print("GETTING MARKET DATA")
        time.sleep(5)
        self.price = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/main/div/div/article/div[1]/div/div[3]/div/div/span").text
        self.cap = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/section/div[1]/div/div[2]/div/div[1]/span[2]").text
        self.volume_24h = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/section/div[1]/div/div[2]/div/div[2]/span[2]").text
        self.change_24h = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/section/div[1]/div/div[2]/div/div[8]/span[2]").text
        print("PRINTING MARKET DATA")
        print("PRICE " + str(self.price))
        print("CAP " + str(self.cap))
        print("24H VOLUME " + str(self.volume_24h))
        print("24H CHANGE " + str(self.change_24h))
        self.browser.close()
        return
bot = Bot("BTC")
bot.get_crypto_data()

Now you have the data as a variable that you can pass to the entire bot class. Let's pass this to the write_file function. Then time stamp the system and save it to a local file on Alibaba's cloud.

def write_file(self):
    save_file = open(self.crypto + "_market.txt","a")
    time_stamp = str(datetime.now())
    save_file.write("\n" + time_stamp + "\n")
    save_file.write("\nPRICE: " + str(self.price))
    save_file.write("\nCAP: " + str(self.cap))
    save_file.write("\n24H VOLUME: " + str(self.volume_24h))
    save_file.write("\n24H CHANGE: " + str(self.change_24h))

In the above code, the file in which the crypto variable is concatenated to market.txt is opened as the file name to be written. a "opens the file in appendix mode, so don't overwrite the previous data. Then create a time_stamp variable to call the date and time as a string. Then create a string of market data. Since the part where "n" is concatenated is a line break, everything does not fit on one line.

Of course, you can scrape any cipher this way by changing the arguments when initializing the bot class. Let's scrape the market data of Etherium.

btc_bot = Bot("BTC")
eth_bot = Bot("ETH")
btc_bot.get_crypto_data()
btc_bot.write_file()
eth_bot.get_crypto_data()
eth_bot.write_file()

Now you have an automated way to get the dataset and return it as a variable. You can write this to a local file. This data can be plugged into, for example, Alibaba machine learning services for data analysis or relational databases for tracking.

Below is the final code.

from selenium import webdriver
from datetime import datetime
import time

class Bot():

    def __init__(self,crypto):
        self.browser = webdriver.Firefox()
        ### CRYPTO TO SEARCH FOR
        self.crypto = crypto
        self.price = None
        self.cap = None
        self.volume_24h = None
        self.change_24h = None
    def get_crypto_data(self):
        self.browser.get("https://www.coinwatch.com/")
        ### FIND SEARCH FORM AND IMPUT KEY WORD
        print("FINDING SEARCH FORM")
        elem = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/header/section[3]/div/div/div/div[1]/div/input")
        elem.clear()
        elem.click()
        elem.send_keys(self.crypto)
        elem = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/header/section[3]/div/div/div/div[1]/div/div/button")
        time.sleep(1)
        elem.click()
        ### GET MARKET DATA
        print("GETTING MARKET DATA")
        time.sleep(5)
        self.price = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/main/div/div/article/div[1]/div/div[3]/div/div/span").text
        self.cap = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/section/div[1]/div/div[2]/div/div[1]/span[2]").text
        self.volume_24h = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/section/div[1]/div/div[2]/div/div[2]/span[2]").text
        self.change_24h = self.browser.find_element_by_xpath("//html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/section/div[1]/div/div[2]/div/div[8]/span[2]").text
        print("PRINTING MARKET DATA")
        print("PRICE " + str(self.price))
        print("CAP " + str(self.cap))
        print("24H VOLUME " + str(self.volume_24h))
        print("24H CHANGE " + str(self.change_24h))
        self.browser.close()
        return

    def write_file(self):

        time_stamp = str(datetime.now())

        save_file = open(self.crypto + "_market.txt","a")
        save_file.write(self.crypto)
        save_file.write("\n" + time_stamp + "\n")
        save_file.write("\nPRICE: " + str(self.price))
        save_file.write("\nCAP: " + str(self.cap))
        save_file.write("\n24H VOLUME: " + str(self.volume_24h))
        save_file.write("\n24H CHANGE: " + str(self.change_24h) + "\n")

btc_bot = Bot("BTC")
eth_bot = Bot("ETH")

btc_bot.get_crypto_data()
btc_bot.write_file()

eth_bot.get_crypto_data()
eth_bot.write_file()

Recommended Posts

Create a data collection bot in Python using Selenium
Create a GIF file using Pillow in Python
Create a MIDI file in Python using pretty_midi
Create a function in Python
Create a dictionary in Python
Create a python GUI using tkinter
Create a DI Container in Python
Create a binary file in Python
Create a Kubernetes Operator in Python
Create a random string in Python
Create a LINE Bot in Django
Generate a first class collection in Python
Create a simple GUI app in Python
Create a JSON object mapper in Python
[Python] Create a Batch environment using AWS-CDK
[Python] [LINE Bot] Create a parrot return LINE Bot
Get Youtube data in Python using Youtube Data API
Scraping a website using JavaScript in Python
Develop slack bot in python using chat.postMessage
[GPS] Create a kml file in Python
Draw a tree in Python 3 using graphviz
Create a record with attachments in KINTONE using the Python requests module
[Python] [Word] [python-docx] Try to create a template of a word sentence in Python using python-docx
Create a Vim + Python test environment in 1 minute
Create a LINE BOT with Minette for Python
I want to create a window in Python
Create a standard normal distribution graph in Python
How to create a JSON file in Python
Create a virtual environment with conda in Python
Create a binary data parser using Kaitai Struct
Create a web map using Python and GDAL
View drug reviews using a list in Python
Data analysis in Python: A note about line_profiler
Steps to create a Twitter bot with python
Create a simple momentum investment model in Python
Create a new page in confluence with Python
Create a datetime object from a string in Python (Python 3.3)
Create a package containing global commands in Python
Create a Mac app using py2app and Python3! !!
Create a loop antenna pattern in Python in KiCad
A well-prepared record of data analysis in Python
[Docker] Create a jupyterLab (python) environment in 3 minutes!
Create a Python module
Data analysis using Python 0
Create SpatiaLite in Python
Data cleaning using Python
Create a Python environment
Create a slack bot
Create an elliptical scatter plot in Python without using a multivariate normal distribution
Create your own Big Data in Python for validation
Receive dictionary data from a Python program in AppleScript
Collectively register data in Firestore using csv file in Python
[CRUD] [Django] Create a CRUD site using the Python framework Django ~ 1 ~
[LINE Messaging API] Create a rich menu in Python
A collection of code often used in personal Python
[Python] Create a ValueObject with a complete constructor using dataclasses
Create a plugin to run Python Doctest in Vim (2)
Get LEAD data using Marketo's REST API in Python
Create a plugin to run Python Doctest in Vim (1)
In Python, create a decorator that dynamically accepts arguments Create a decorator
Until you insert data into a spreadsheet in Python