Download a file in Python

The exercises in Language Processing 100 Knocks 2020 (Rev 2) often start by downloading a file and then processing it.

There are several ways to do the download, so I have collected them here.

The file used is jawiki-country.json.gz from Chapter 3: Regular Expressions.

Everything here was run on Google Colaboratory.

requests

requests.py


import requests

url='https://nlp100.github.io/data/jawiki-country.json.gz'
filename='jawiki-country.json.gz'

# Fetch the whole response body into memory as bytes
urlData = requests.get(url).content

# 'wb' opens the file for binary writing, which is what bytes need
with open(filename, mode='wb') as f:
  f.write(urlData)

This is how I usually do it. requests can be used in much the same way as wget.

The file is written with the standard open and write calls.

Large files can apparently be handled as well, using the approach from "Download using Python, Requests".
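Code like the listing above writes whatever bytes the server returns, so an error page would be saved under the .gz name just the same. A minimal sketch of a sanity check follows, relying on the fact that gzip files begin with the two bytes 0x1f 0x8b. The name looks_like_gzip is hypothetical and not part of the original post.

```python
GZIP_MAGIC = b'\x1f\x8b'  # the first two bytes of every gzip file


def looks_like_gzip(path):
    """Return True if the file at path starts with the gzip magic bytes.

    Hypothetical helper for illustration; it only inspects the header and
    does not verify that the rest of the file is intact.
    """
    with open(path, 'rb') as f:
        return f.read(2) == GZIP_MAGIC
```

After the download, calling looks_like_gzip(filename) gives a quick signal that the saved file is at least a gzip archive.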

requests2.py


import requests
import os

url='https://nlp100.github.io/data/jawiki-country.json.gz'
filename=os.path.basename(url)

# stream=True defers the body download so it can be read chunk by chunk
r = requests.get(url, stream=True)
with open(filename, 'wb') as f:
  for chunk in r.iter_content(chunk_size=1024):
    if chunk:  # skip empty chunks
      f.write(chunk)
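The chunked loop can be factored into a small helper that accepts any iterable of byte chunks. This is only a sketch: write_chunks is a hypothetical name, and returning the byte count is my own addition so the result can be checked.

```python
def write_chunks(chunks, filename):
    """Write an iterable of bytes objects to filename.

    Returns the total number of bytes written. Empty chunks are skipped,
    mirroring the `if chunk:` test in the loop above.
    """
    total = 0
    with open(filename, 'wb') as f:
        for chunk in chunks:
            if chunk:
                total += f.write(chunk)
    return total
```

With requests, this would be called as write_chunks(r.iter_content(chunk_size=1024), filename), where r is the streamed response from the listing above.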

The next method saves straight to a file.

urllib.request

urllib_request.py


import urllib.request

url='https://nlp100.github.io/data/jawiki-country.json.gz'
save_name='jawiki-country.json.gz'

urllib.request.urlretrieve(url, save_name)

I found this one while looking things up for this post, in "Download files from the web with python". It is a handy function that even saves the file for you.
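The Python documentation lists urlretrieve under its legacy interface and notes that it might become deprecated at some point. A sketch of an equivalent using urlopen and shutil.copyfileobj is shown below. The function name download is hypothetical, and this is one possible substitute rather than the method used in this post.

```python
import shutil
import urllib.request


def download(url, save_name):
    """Copy the response body for url into save_name and return save_name.

    shutil.copyfileobj copies in blocks, so the whole body is never held
    in memory at once.
    """
    with urllib.request.urlopen(url) as response, open(save_name, 'wb') as f:
        shutil.copyfileobj(response, f)
    return save_name
```

It would be called the same way as urlretrieve above: download(url, save_name).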

pandas.read_X

read_X.py


import pandas as pd

url='https://nlp100.github.io/data/jawiki-country.json.gz'

df=pd.read_json(url, lines=True)

pandas was a big help in Chapter 2, and it works here too.

The pandas input/output readers such as read_json can take a URL directly, and the compression is inferred automatically.

The result is a DataFrame, so whether this suits you depends on the use case, but if you are going to process the data as a DataFrame anyway, this is all you need.
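When a DataFrame is not what you want, a file that has already been downloaded can be read with the standard library instead. The sketch below assumes the file holds one JSON object per line, which is what lines=True in the pandas call implies. iter_json_lines is a hypothetical name.

```python
import gzip
import json


def iter_json_lines(path):
    """Yield one parsed JSON object per non-empty line of a gzip file.

    Assumes UTF-8 text with one JSON document per line.
    """
    # 'rt' makes gzip.open decompress and decode to text in one step
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, records = list(iter_json_lines('jawiki-country.json.gz')) would give a plain list of dictionaries once the file has been saved by one of the earlier methods.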

Summary

On the command line, wget lets you download without thinking about it, but when you do it programmatically you have to decide how.

I think any of these methods is fine.
