NLP 100 Knock 2020 (Rev 2) often has you download a file and then process it. There are several ways to do the download, so I collect them here. The file used throughout is jawiki-country.json.gz from Chapter 3: Regular Expressions.
Everything was run on Google Colaboratory.
requests
requests.py
import requests

url='https://nlp100.github.io/data/jawiki-country.json.gz'
filename='jawiki-country.json.gz'
urlData = requests.get(url).content
with open(filename, mode='wb') as f:  # open in 'wb' mode so the raw bytes can be written
    f.write(urlData)
This is how I usually do it. requests plays the same role here as wget, and writing the response body to a file is the standard pattern. Large files may be better handled as described in "Download using Python, Requests":
requests2.py
import requests
import os

url='https://nlp100.github.io/data/jawiki-country.json.gz'
filename=os.path.basename(url)
r = requests.get(url, stream=True)
with open(filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
            f.flush()
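As an alternative sketch of the same streaming idea, the raw response can be copied straight into the file with shutil.copyfileobj instead of a manual chunk loop. The tiny local HTTP server below is only a stand-in so the snippet runs without fetching the real URL; swap in the real one in practice.

```python
import gzip
import http.server
import os
import shutil
import tempfile
import threading
from functools import partial

import requests

# Stand-in setup: serve a small gzip file from a temporary directory
# instead of downloading the real jawiki-country.json.gz.
tmpdir = tempfile.mkdtemp()
with gzip.open(os.path.join(tmpdir, 'sample.json.gz'), 'wt', encoding='utf-8') as f:
    f.write('{"title": "test"}\n')

handler = partial(http.server.SimpleHTTPRequestHandler, directory=tmpdir)
server = http.server.ThreadingHTTPServer(('127.0.0.1', 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f'http://127.0.0.1:{server.server_address[1]}/sample.json.gz'
out = os.path.join(tmpdir, 'downloaded.json.gz')

# stream=True keeps the body unread; copyfileobj then moves it to disk
# in fixed-size chunks, so memory use stays bounded for large files.
r = requests.get(url, stream=True)
r.raise_for_status()
with open(out, 'wb') as f:
    shutil.copyfileobj(r.raw, f)
server.shutdown()

print(os.path.getsize(out) > 0)  # True
```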
This version writes each chunk to the file directly as it arrives.
urllib.request
urllib_request.py
import urllib.request
url='https://nlp100.github.io/data/jawiki-country.json.gz'
save_name='jawiki-country.json.gz'
urllib.request.urlretrieve(url, save_name)
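Whichever method downloads the file, it can then be read line by line with the gzip module, without unpacking it to disk first. The snippet below uses a small locally created stand-in file; the 'title'/'text' keys are assumed to mirror the layout of the real jawiki-country.json.gz.

```python
import gzip
import json
import os
import tempfile

# Stand-in for jawiki-country.json.gz: one JSON object per line.
path = os.path.join(tempfile.mkdtemp(), 'jawiki-country.json.gz')
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write(json.dumps({'title': 'イギリス', 'text': '...'}, ensure_ascii=False) + '\n')

# 'rt' mode decompresses transparently and yields text lines.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    articles = [json.loads(line) for line in f]

print(articles[0]['title'])  # イギリス
```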
I found this one while looking around for this post, in "Download files from the web with python". urlretrieve is handy because it fetches and saves the file in a single call.
pandas.read_X
read_X.py
import pandas as pd
url='https://nlp100.github.io/data/jawiki-country.json.gz'
df=pd.read_json(url, lines=True)
pandas already did great work in Chapter 2. Each of its input/output functions can read directly from a URL, and the compression format is detected automatically. Since the result comes back as a DataFrame, this is the way to go when you want to continue processing the data in that form.
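As a sketch of that auto-detection, read_json accepts a JSON Lines file (or URL) and infers gzip compression from the .gz suffix. The file here is a locally created stand-in, not the real data.

```python
import gzip
import os
import tempfile

import pandas as pd

# Stand-in JSON Lines file, gzip-compressed like jawiki-country.json.gz.
path = os.path.join(tempfile.mkdtemp(), 'countries.json.gz')
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write('{"title": "A", "text": "a"}\n{"title": "B", "text": "b"}\n')

# lines=True parses one JSON object per line; compression='infer'
# (the default) detects gzip from the filename.
df = pd.read_json(path, lines=True)
print(df.shape)  # (2, 2)
```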
With a command-line tool like wget there is nothing to think about, but doing it programmatically means choosing among options like these. Any of the methods above works, so pick whichever you prefer.