Beginners use Python for web scraping (4)-1

The goal this time is to put the scraping program from the previous article on the cloud and have it run automatically. As a first step, this article puts a test program on the cloud and gets it running normally, and stops there.

Roadmap for learning web scraping in Python

(1) Succeed in scraping the target data locally, for the time being.
(2) Link the local scraping results to Google Sheets.
(3) Run the scraping automatically with cron locally.
(4) Try free automatic execution on a cloud server (Google Compute Engine).
(4)-1 Put a test program on the cloud and run it normally on Cloud Shell. ← You are here
(4)-2 Add the scraping program to the repository and run it normally on Cloud Shell.
(4)-3 Create a Compute Engine VM instance and have it run the scraping automatically.
(5) Try free, serverless automatic execution on the cloud (probably Cloud Functions + Cloud Scheduler).

Steps to upload resources to GCP

(1) Create a git repository on GCP using git (GitHub account required)
(2) Create a clone locally
(3) Add the program you want to upload to GCP to the local repository and commit
(4) Push to master on GCP

(1) Create a git repository on GCP using git

If you do not have the Google Cloud SDK installed, install it. Make sure the gcloud command is configured for the desired project. (For a new project, set the project with the gcloud init command.)

zsh


16:03:04 [~] % gcloud config list
[core]
account = [email protected]
disable_usage_reporting = False
project = my-hoge-app

Your active configuration is: [default]
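
If you want to make the same check from a script, the [core] section printed by gcloud config list can be read as INI text. The following is only a minimal sketch and not part of the original steps: the function name read_core_project and the placeholder address user@example.com are assumptions made for illustration.

```python
import configparser

# Sample text in the same shape as the `gcloud config list` output above.
# The account value is a placeholder, not a real address.
SAMPLE_OUTPUT = """\
[core]
account = user@example.com
disable_usage_reporting = False
project = my-hoge-app
"""


def read_core_project(config_text):
    """Return the project named in the [core] section, or None if it is absent."""
    # interpolation=None keeps any '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get("core", "project", fallback=None)


if __name__ == "__main__":
    print(read_core_project(SAMPLE_OUTPUT))
```

Comparing the returned value with the project you intend to use is a cheap guard against creating the repository in the wrong project.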

Create a new repository in Cloud Source Repositories.

zsh


16:41:59 [~] % 
16:42:00 [~] % gcloud source repos create gce-cron-test
Created [gce-cron-test].
WARNING: You may be billed for this repository. See https://cloud.google.com/source-repositories/docs/pricing for details.

An empty repository is created in the target project.

(2) Create a clone locally

Clone the repository you created in Cloud Source Repositories locally.

zsh


16:44:10 [~] % 
16:44:10 [~] % gcloud source repos clone gce-cron-test
Cloning into '/Users/hoge/gce-cron-test'...
warning: You appear to have cloned an empty repository.
Project [my-hoge-app] repository [gce-cron-test] was cloned to [/Users/hoge/gce-cron-test].

The listing below shows the state after the .py file has been placed in the cloned local repository. (The .git directory shows that it is a git repository.)

zsh


16:46:15 [~] % 
16:46:15 [~] % cd gce-cron-test
16:46:44 [~/gce-cron-test] % ls -la
total 8
drwxr-xr-x   4 hoge  staff   128  9 23 16:45 .
drwxr-xr-x+ 45 hoge  staff  1440  9 23 16:45 ..
drwxr-xr-x   9 hoge  staff   288  9 23 16:45 .git
-rw-r--r--   1 hoge  staff   146  9 21 15:29 cron-test.py

(3) Add the program you want to upload to GCP to the local repository and commit

Add the file to the index with the git add command, then commit it to your local repository with the git commit command.

zsh


16:47:21 [~/gce-cron-test] % 
16:47:21 [~/gce-cron-test] % git add .
16:48:03 [~/gce-cron-test] % 
16:48:04 [~/gce-cron-test] % git commit -m "Add cron-test to Cloud Source Repositories"
[master (root-commit) 938ea70] Add cron-test to Cloud Source Repositories
 1 file changed, 5 insertions(+)
 create mode 100644 cron-test.py

(4) Push to master on GCP

Push the commit to master on the remote (Cloud Source Repositories).

zsh


16:50:15 [~/gce-cron-test] % 
16:50:15 [~/gce-cron-test] % git push origin master
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 349 bytes | 116.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://source.developers.google.com/p/my-hoge-app/r/gce-cron-test
 * [new branch]      master -> master

You can confirm that the push to master succeeded, along with the commit message.
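
Steps (1) to (4) can also be collected into a small script. The sketch below is an assumption about how one might wrap them, not something the original procedure uses: the helper names build_commands and run_all are hypothetical, and the commands themselves are the ones shown in the transcripts above. It defaults to a dry run that only prints each command.

```python
import subprocess


def build_commands(repo_name, commit_message, branch="master"):
    """Return the commands for steps (1) to (4) as (working_dir, argv) pairs."""
    return [
        # (1) Create the repository in Cloud Source Repositories.
        (None, ["gcloud", "source", "repos", "create", repo_name]),
        # (2) Clone it into ./<repo_name>.
        (None, ["gcloud", "source", "repos", "clone", repo_name]),
        # Copy your program into the clone before this point (not automated here).
        # (3) Stage and commit.
        (repo_name, ["git", "add", "."]),
        (repo_name, ["git", "commit", "-m", commit_message]),
        # (4) Push to the remote branch.
        (repo_name, ["git", "push", "origin", branch]),
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it as well only when dry_run is False."""
    for cwd, argv in commands:
        print(f"[{cwd or '.'}] {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all(build_commands("gce-cron-test",
                           "Add cron-test to Cloud Source Repositories"))
```

Keeping the command list separate from the code that runs it makes it easy to review what would be executed before passing dry_run=False.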

Checking operation on Cloud Shell

Let's test it on Cloud Shell on GCP.

Select the desired project and launch Cloud Shell.

The terminal will start.

Clone the git repository from master, just as you did locally.

bash


cloudshell:09/25/20 02:59:00 ~ $ gcloud source repos clone gce-cron-test
Cloning into '/home/hoge/gce-cron-test'...
remote: Total 3 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (3/3), done.
Project [my-xxx-app] repository [gce-cron-test] was cloned to [/home/hoge/gce-cron-test].

The repository has been cloned.

bash


cloudshell:09/25/20 03:01:49 ~ $ cd gce-cron-test
cloudshell:09/25/20 03:02:09 ~/gce-cron-test $ ls -la
total 20
drwxr-xr-x  3 hoge hoge 4096 Sep 23 10:59 .
drwxr-xr-x 13 hoge rvm  4096 Sep 23 11:18 ..
-rw-r--r--  1 hoge hoge  146 Sep 23 09:03 cron-test.py
drwxr-xr-x  8 hoge hoge 4096 Sep 23 09:03 .git

Check the Python path and version. In this environment, Python 3.8.5 is pre-installed via pyenv.

bash


cloudshell:09/25/20 03:02:21 ~/gce-cron-test $ which python
/home/hoge/.pyenv/shims/python
cloudshell:09/25/20 03:02:42 ~/gce-cron-test $ python -V
Python 3.8.5
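
The same information is available from inside Python, which is handy when a script needs to log which interpreter actually ran it. This is a minimal sketch; the function name interpreter_info is an assumption for illustration. Note that the reported path may differ from the which python result, since a pyenv shim forwards to the real interpreter.

```python
import platform
import sys


def interpreter_info():
    """Return (path of the running interpreter, version string such as '3.8.5')."""
    return sys.executable, platform.python_version()


if __name__ == "__main__":
    path, version = interpreter_info()
    print(path)
    print("Python", version)
```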

As shown below, it works normally on Cloud Shell.

bash


cloudshell:09/25/20 03:02:50 ~/gce-cron-test $ python cron-test.py
2020/09/25 03:03:11 cron works!
cloudshell:09/25/20 03:03:12 ~/gce-cron-test $
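
The source of cron-test.py is not shown in this article. As an assumption, a test program that prints a line in the same shape as the output above could look like the following sketch; the function name build_message is hypothetical.

```python
from datetime import datetime


def build_message(now):
    """Format a timestamped line in the same shape as the output above."""
    return now.strftime("%Y/%m/%d %H:%M:%S") + " cron works!"


if __name__ == "__main__":
    # Print the current time so each run is distinguishable in a log.
    print(build_message(datetime.now()))
```

Printing a timestamp on every run makes it easy to tell later, from the log alone, whether the scheduled job fired at the expected times.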

However, crontab did not work. Cloud Shell seems to be an environment that only accepts interactive commands. Next time, I will add the scraping program to the repository and run it normally on Cloud Shell.

Bonus: About Cloud Shell

Cloud Shell is a development environment that runs on Google Cloud: a kind of virtual machine with a 5 GB disk, which also provides a Theia-based code editor.

You can also edit hidden files with the editor.

bash


$ cloudshell edit $HOME/.bashrc

You can also download files.

bash


$ cloudshell download $HOME/.bashrc

[CloudShell] https://cloud.google.com/shell/?hl=ja
