Convert Excel data to JSON with python

Overview

Convert excel data to JSON with python.

The excel data to be converted is from Standard Tables of Food Composition in Japan. I would like to use the following fruit excel file. (Click to download the file.) https://www.mext.go.jp/component/a_menu/science/detail/__icsFiles/afieldfile/2016/01/15/1365344_1-0207r.xlsx

I will try two methods, ʻexcel2json and ʻopenpyxl. (ʻExcel2json` is not working well.)

Method 1 Use excel2json

excel2json is a module that converts excel data to JSON with python, but its functions are very limited. There is no choice but to read excel and create a json file.

I will install it for the time being. Note that if you do not include the -3 at the end, another module will be installed.

$ pip install excel2json-3

Let's use it once it is installed. This is the only code

excel2json.py


from excel2json import convert_from_file

convert_from_file("PATH of excel file you want to convert")

When executed, a json file will be generated in the same directory as the excel file. The name of the json file is the name of the excel sheet (fixed).

When I put the downloaded excel file as it is and execute it, the json file could not be created successfully, so delete the extra part on the table and try again.

The modified excel file looks like this ↓ image.png

When I open the created json file and see ...

{
    "\u98df\u54c1\u7fa4": "07",
    "\u98df\u54c1\u756a\u53f7": "07001",
    "\u7d22\u5f15\u756a\u53f7": 751.0,
    ...
}

It has become unicode like this. When I tried to fix it with extensions called "encode decode" of vscode, it became like this.

{
    "Food group": "07",
    "Food number": "07001",
    "Index number": 751.0,
    ...
}

If the amount of code is small and the data is already in place, it can be done very easily, but it is a little inconvenient because you cannot specify the data to be acquired. Regarding the character code, I could not find any method other than the above, so I will try other methods for the time being.

Method 2 Use openpyxl

openpyxl is a module for operating excel in python. First, install openpyxl

$ pip install openpyxl

I will try using it once it is installed.

Please note that the excell file is the same as the downloaded one. This time, we will get 7 items of food number, food name, dietary fiber, potassium, iron, vitamin B1 and vitamin C.

Click here for code
import openpyxl
import json

load_book = openpyxl.load_workbook('excel file PATH')
sheet = load_book['07 Fruits']
json_path = 'json file PATH'

fruits_list = [{
    "food_id": 0,
    "name": ""
}]
for i in range(9, 183):
    food_id = sheet.cell(row = i, column = 2).value
    name = sheet.cell(row = i, column = 4).value
    dietary_fiber = sheet.cell(row = i, column = 21).value
    potassium = sheet.cell(row = i, column = 24).value
    iron = sheet.cell(row = i, column = 28).value
    vitamin_b1 = sheet.cell(row = i, column = 48).value
    vitamin_c = sheet.cell(row = i, column = 56).value

    food_id = int(food_id)

    if dietary_fiber == 'Tr':
        dietary_fiber = 0
    if potassium == 'Tr':
        potassium = 0
    if iron == 'Tr':
        iron = 0
    if vitamin_b1 == 'Tr':
        vitamin_b1 = 0
    if vitamin_c == 'Tr':
        vitamin_c = 0

    if name.split(" ")[0][0] == '(' or name.split(" ")[0][0] == '(':
        name = name.split(" ")[1]
    else:
        name = name.split(" ")[0]
    
    if fruits_list[-1]['name'] != name:
        fruits_list.append({
            "food_id": food_id,
            "name": name,
            "dietary_fiber": dietary_fiber,
            "potassium": potassium,
            "iron": iron,
            "vitamin_b1": vitamin_b1,
            "vitamin_c": vitamin_c,
        })
    
fruits_list.pop(0)

data_dict = {
    "data": "fruits",
    "fruits": fruits_list
}

with open(json_path, mode = 'w', encoding = 'utf-8') as f:
    f.write(json.dumps(data_dict, ensure_ascii = False, indent = 4))

I will explain the part related to the article.

import openpyxl
import json

load_book = openpyxl.load_workbook('excel file PATH')
sheet = load_book['sheet name']
json_path = 'json file PATH'

Since I want to operate openpyxl and json, I also import the json module. Read the excel file with ʻopenpyxl.load_workbook ()and get the sheet withload_book ['sheet name']`.

food_id = sheet.cell(row = 1, column = 2).value

Get the data by specifying the cell of excel by row and column. With this code, the data of 1 row and 2 columns is acquired.

fruits_list.append({
    "food_id": food_id,
    "name": name,
    "dietary_fiber": dietary_fiber,
    "potassium": potassium,
    "iron": iron,
    "vitamin_b1": vitamin_b1,
    "vitamin_c": vitamin_c,
})

Collect the acquired data in a dictionary and add it to the array.

data_dict = {
    "data": "fruits",
    "fruits": fruits_list
}

with open(json_path, mode = 'w', encoding = 'utf-8') as f:
    f.write(json.dumps(data_dict, ensure_ascii = False, indent = 4))

Finally, make the previous data into a dictionary and write it to the json file. mode ='a' in ʻopen ()specifies that the file should be opened in write mode. By the way, ifmode is r, it will be in read mode, and if it is ʻa, it will be in append mode.

Summary

Recommended Posts

Convert Excel data to JSON with python
Convert json to excel
Convert FX 1-minute data to 5-minute data with Python
Read json data with python
How to convert JSON file to CSV file with Python Pandas
Convert list to DataFrame with python
Excel with Python
Convert memo at once with Python 2to3
Convert Hiragana to Romaji with Python (Beta)
Convert array (struct) to json with golang
Convert HEIC files to PNG files with Python
Convert Chinese numerals to Arabic numerals with Python
Sample to convert image to Wavelet with Python
[Python] Convert CSV file uploaded to S3 to JSON file with AWS Lambda
Data analysis with python 2
[Python] Use JSON with Python
Convert 202003 to 2020-03 with pandas
Handle Excel with python
Data analysis with Python
Operate Excel with Python (1)
Operate Excel with Python (2)
Convert data with shape (number of data, 1) to (number of data,) with numpy.
Convert PDF to image (JPEG / PNG) with Python
Convert "number" of excel date to python datetime
Just add the python array to the json data
Convert PDFs to images in bulk with Python
[Python] How to read excel file with pandas
Convert svg file to png / ico with Python
Convert Windows epoch values to date with python
I tried to get CloudWatch data with Python
[Introduction to Python] How to handle JSON format data
Write CSV data to AWS-S3 with AWS-Lambda + Python
Batch conversion of Excel files to JSON [Python]
Convert json format data to txt (using yolo)
Convert strings to character-by-character list format with python
Sample data created with python
Connect to BigQuery with Python
How to convert / restore a string with [] in python
I convert AWS JSON data to CSV like this
Graph Excel data with matplotlib (1)
[python] Convert date to string
Convert numpy int64 to python int
[Python] Convert list to Pandas [Pandas]
How to scrape image data from flickr with python
POST json with Python3 script
Get Youtube data with python
Convert csv, tsv data to matrix with python --using MovieLens as an example
Post to slack with Python 3
Reading Note: An Introduction to Data Analysis with Python
Convert the image in .zip to PDF with Python
Graph Excel data with matplotlib (2)
Convert Scratch project to Python
[Python] Convert Shift_JIS to UTF-8
PyInstaller memorandum Convert Python [.py] to [.exe] with 2 lines
Convert / return class object to JSON format in Python
Switch python to 2.7 with alternatives
Write to csv with Python
I tried to analyze J League data with Python
How to convert Json file to CSV format or EXCEL format
Format json with Vim (with python)
Convert python 3.x code to python 2.x