[GO] UnicodeEncodeError: 'cp932' during Python scraping

Overview

I want to extract only the store names from the Go To Eat store list and write them to a CSV file.

I am using BeautifulSoup, requests, Python 3, and Windows 10.

Error details and reason

With the following code, I specified the HTML tag and extracted the store names as a list, still wrapped in their tags:


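    import requests
    from bs4 import BeautifulSoup

    PageNumber = 1  # example value; the page of the store list to fetch
    urlName = "https://premium-gift.jp/eatosaka/use_store?events=page&id={}&store=&addr=&industry=".format(PageNumber)
    dataHTML = requests.get(urlName)
    soup = BeautifulSoup(dataHTML.content, "html.parser")
    elems = soup.select('h3.store-card__title')  # list of <h3> Tag objects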

Next, I removed the extra information and wrote the names to CSV. At first I stripped the tags with string replacement (the commented-out lines below), but I was told that i.text returns only the text content.

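    import csv

    # No encoding= argument, so the OS default encoding is used (cp932 here)
    with open(r'C:\Users\daisuke\Desktop\python\eat.csv', 'w') as f:
        writer = csv.writer(f)
        for i in elems:
            # Earlier attempt: strip the tags by string replacement
            # i = str(i)
            # i = i.replace('<h3 class="store-card__title">', '')
            # i = i.replace('</h3>', '')
            # i = i.replace('  ', '  ')
            # i = i.replace(' ', ' ')
            print(i.text)  # i.text gives only the text inside the tag

            try:
                writer.writerow([i.text])
            except UnicodeEncodeError:
                writer.writerow(['error'])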

The following error occurs:

Live spiny lobster dish Chunagon Osaka Station 3 Building
Traceback (most recent call last):
  File "C:\Users\daisuke\Desktop\python\go_to_eat.py", line 24, in <module>
    writer.writerow(i)
UnicodeEncodeError: 'cp932' codec can't encode character '\xa0' in position 20: illegal multibyte sequence

According to Reference 1 and Reference 2:

  1. Scraped pages come in various character encodings, but during scraping they are decoded into Unicode strings; when those strings are written to a file, they are automatically encoded again with some character encoding.
  2. If no encoding is specified, that encoding is OS-dependent, and on Japanese Windows it is CP932 (Microsoft's variant of Shift_JIS).
  3. CP932 is a Japanese character encoding and cannot represent \xa0 (the no-break space, U+00A0).
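The points above can be checked directly in Python. This is a minimal sketch, assuming a hypothetical string `name` that contains a no-break space; it prints the default encoding that `open()` would use on the current machine and shows that CP932 rejects U+00A0 while UTF-8 accepts it.

```python
import locale

# Encoding open() uses when no encoding= argument is given (OS/locale dependent;
# typically 'cp932' on Japanese Windows, 'UTF-8' on most Linux/macOS setups).
print(locale.getpreferredencoding(False))

name = "Osaka\xa0Station"  # hypothetical string containing a no-break space (U+00A0)

try:
    name.encode("cp932")
except UnicodeEncodeError as e:
    # CP932 has no mapping for U+00A0, so encoding fails with the same error
    print("cp932 failed:", e)

print(name.encode("utf-8"))  # UTF-8 can encode any Unicode character
```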

Solution

As a stopgap, I replaced the no-break space with an ordinary half-width space, as shown below. This only treats the symptom, though, so it is not a real fix.


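        for i in elems:
            i = str(i)  # convert the Tag to its HTML string
            # Strip the surrounding <h3> tags
            i = i.replace('<h3 class="store-card__title">', '')
            i = i.replace('</h3>', '')
            i = i.replace('  ', '  ')
            i = i.replace('\xa0', ' ')  # no-break space -> half-width space (cp932 cannot encode \xa0)
            print(i)

            try:
                writer.writerow([i])
            except UnicodeEncodeError:
                writer.writerow(['error'])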
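The stopgap can be tried without a real file or network access. This is a sketch under assumptions: the store names are hypothetical, and an in-memory `io.TextIOWrapper` with `encoding="cp932"` stands in for a file opened with the Windows default encoding. Replacing `\xa0` before writing lets every row encode.

```python
import csv
import io

# Hypothetical store names; in the scraper these would come from the <h3> elements.
names = ["Osaka\xa0Station Store", "Vineyard"]

# In-memory stand-in for a file opened with the Windows default encoding (cp932).
buffer = io.BytesIO()
f = io.TextIOWrapper(buffer, encoding="cp932", newline="")
writer = csv.writer(f)

for name in names:
    # Stopgap: swap the no-break space for a normal space so cp932 can encode it
    writer.writerow([name.replace("\xa0", " ")])

f.flush()  # push the encoded text into the underlying BytesIO
data = buffer.getvalue().decode("cp932")
print(data)
```

Any other character outside CP932 would still fail, which is why this remains a symptomatic fix.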

A better fix is to write the file in an encoding that can represent the character. Passing the encoding keyword argument to open(), as shown below, sets the encoding used for the automatic conversion directly, so choose one that can represent every Unicode character, such as UTF-8.

The characters may look garbled when the CSV file is opened, but they display correctly once the viewer's character encoding is switched to UTF-8.


    with open(r'C:\Users\daisuke\Desktop\python\eat.csv', 'w', encoding='utf-8') as f:
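A self-contained sketch of the UTF-8 approach, assuming hypothetical store names and a temporary file in place of the Desktop path. The no-break space is written and read back unchanged, with no replacement needed. `newline=""` is the setting the csv module documentation recommends for files it reads and writes. The `"utf-8-sig"` variant mentioned in the comment writes a BOM, which helps Excel recognize the file as UTF-8.

```python
import csv
import os
import tempfile

# Hypothetical store names; the first contains a no-break space (U+00A0).
names = ["Osaka\xa0Station Store", "Vineyard"]
path = os.path.join(tempfile.mkdtemp(), "eat.csv")  # stand-in for the Desktop path

# encoding="utf-8" can represent any Unicode character, so no UnicodeEncodeError.
# encoding="utf-8-sig" would also work and adds a BOM so Excel detects UTF-8.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    for name in names:
        writer.writerow([name])

# Read the file back to confirm the names survived intact
with open(path, encoding="utf-8", newline="") as f:
    restored = [row[0] for row in csv.reader(f)]

print(restored)
```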

However, when I read the CSV back, an empty row appeared after every record, as shown below. I didn't know why at first, but a knowledgeable reader explained it in the comments and it is now solved. Thank you!

['Wolfgang Steakhouse by Wolfgang Steakhouse Osaka']
[]
['Vineyard']
[]
['Sumikoku Rotating Chicken Cuisine LUCUA']
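The commenter's exact advice isn't recorded here, but the usual cause of this symptom is opening the file without `newline=""`. `csv.writer` ends each row with `"\r\n"`, and on Windows text mode turns the `"\n"` into `"\r\n"` again, so each row ends in `"\r\r\n"`. The sketch below is a reproduction under assumptions: a temporary file, sample rows taken from the output above, and `newline="\r\n"` used to force that Windows-style translation on any OS.

```python
import csv
import os
import tempfile

rows = [["Vineyard"], ["Sumikoku Rotating Chicken Cuisine LUCUA"]]
path = os.path.join(tempfile.mkdtemp(), "eat.csv")

def write_and_read(newline):
    """Write rows with the given newline setting, then read them back with csv.reader."""
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        csv.writer(f).writerows(rows)
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))

# newline="\r\n" mimics Windows text mode: every "\n" becomes "\r\n",
# so each row ends in "\r\r\n" and the reader sees an extra empty row.
print(write_and_read("\r\n"))

# newline="" disables the translation, as the csv module documentation recommends.
print(write_and_read(""))
```

On Windows the fix is therefore open(..., 'w', encoding='utf-8', newline='').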
