[PYTHON] Convert Pascal VOC format xml file to COCO format json file

Most of the code is adapted from Convert-Pascal-VOC-to-COCO (GitHub).

For personal reasons, I had to use a COCO format dataset to train an object detection model. However, the training data I had created with my annotation tool so far was in Pascal VOC format XML files, which cannot be used as-is. There does not seem to be much information on converting a Pascal VOC format dataset to a COCO format dataset.

Maybe I just haven't found it.

Python code

The script converts all XML files under the ./annotations/ directory and writes the result to a single JSON file (train.json). It requires the xmltodict package.

The category names here are racket, player, and so on; please rewrite them to match your own data.
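
Instead of typing the category list by hand, the names can also be collected from the XML files themselves. The following is only a minimal sketch, assuming the usual VOC layout (annotation/object/name). The helper name `build_categories` is hypothetical and not part of the original script, and it uses the standard library's `xml.etree.ElementTree` rather than xmltodict.

```python
import xml.etree.ElementTree as ET


def build_categories(xml_paths):
    """Collect distinct <object><name> values and assign COCO ids from 1.

    Hypothetical helper (not in the original script); assumes each file
    follows the usual VOC layout annotation/object/name.
    """
    names = set()
    for xml_path in xml_paths:
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            name = obj.findtext("name")
            if name:
                names.add(name.strip())
    # Sort the names so the ids stay the same between runs
    return [
        {"supercategory": "none", "id": i, "name": name}
        for i, name in enumerate(sorted(names), start=1)
    ]
```

The returned list has the same shape as the hand-written one in the script, so it could be assigned to `attrDict["categories"]` in its place, for example with `build_categories(glob.glob("./annotations/*.xml"))`.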

XML2JSON.py


import os
import glob
import json

import xmltodict  # third-party: pip install xmltodict

def XML2JSON(xmlFiles):
    attrDict = dict()
    # Category names should be unique. "player" is listed twice here (ids 2 and 7),
    # so rename one of them when adapting the list to your data.
    attrDict["categories"] = [{"supercategory": "none", "id": 1, "name": "racket"},
                              {"supercategory": "none", "id": 2, "name": "player"},
                              {"supercategory": "none", "id": 3, "name": "tennisball"},
                              {"supercategory": "none", "id": 4, "name": "umpire"},
                              {"supercategory": "none", "id": 5, "name": "ballperson"},
                              {"supercategory": "none", "id": 6, "name": "camera"},
                              {"supercategory": "none", "id": 7, "name": "player"},
                              {"supercategory": "none", "id": 8, "name": "tv"},
                              {"supercategory": "none", "id": 9, "name": "smartphone"}
                              ]
    images = list()
    annotations = list()
    image_id = 0
    annotation_id = 1  # annotation ids must be unique across the whole dataset, not per image
    for xml_file in xmlFiles:
        image_id = image_id + 1
        # force_list must be a tuple so that a single <object> is still parsed as a list
        with open(xml_file) as xml_in:
            doc = xmltodict.parse(xml_in.read(), force_list=('object',))
        image = dict()
        image['file_name'] = str(doc['annotation']['filename'])
        image['height'] = int(doc['annotation']['size']['height'])
        image['width'] = int(doc['annotation']['size']['width'])
        image['id'] = image_id
        print("File Name: {} and image_id {}".format(xml_file, image_id))
        images.append(image)
        if 'object' in doc['annotation']:
            for obj in doc['annotation']['object']:
                for value in attrDict["categories"]:
                    if str(obj['name']) == value["name"]:
                        annotation = dict()
                        annotation["iscrowd"] = 0
                        annotation["image_id"] = image_id
                        # The -1 assumes 1-based VOC pixel coordinates;
                        # COCO bbox is [x, y, width, height]
                        x1 = int(obj["bndbox"]["xmin"]) - 1
                        y1 = int(obj["bndbox"]["ymin"]) - 1
                        w = int(obj["bndbox"]["xmax"]) - x1
                        h = int(obj["bndbox"]["ymax"]) - y1
                        annotation["bbox"] = [x1, y1, w, h]
                        annotation["area"] = float(w * h)
                        annotation["category_id"] = value["id"]
                        annotation["ignore"] = 0
                        annotation["id"] = annotation_id
                        # Rectangle polygon with the same corners as the bounding box
                        annotation["segmentation"] = [[x1, y1, x1, (y1 + h), (x1 + w), (y1 + h), (x1 + w), y1]]
                        annotation_id += 1
                        annotations.append(annotation)
                        break  # one annotation per object, even if a name is listed twice
        else:
            print("File: {} doesn't have any object".format(xml_file))


    attrDict["images"] = images    
    attrDict["annotations"] = annotations
    attrDict["type"] = "instances"

    jsonString = json.dumps(attrDict)
    with open("train.json", "w") as f:
        f.write(jsonString)


path="./annotations/"
trainXMLFiles=glob.glob(os.path.join(path, '*.xml'))
XML2JSON(trainXMLFiles)
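
The bounding-box arithmetic in the script is easy to check in isolation. The sketch below pulls it into two hypothetical helpers, `voc_to_coco_bbox` and `bbox_to_polygon` (neither is in the original code), assuming, as the script does, that VOC coordinates are 1-based and that xmax and ymax are inclusive. The sample coordinates are made up for illustration.

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert VOC corner coordinates to a COCO [x, y, width, height] box.

    Assumes 1-based, inclusive VOC coordinates, as the script above does.
    """
    x = xmin - 1   # shift to a 0-based origin
    y = ymin - 1
    w = xmax - x   # equals xmax - xmin + 1
    h = ymax - y   # equals ymax - ymin + 1
    return [x, y, w, h]


def bbox_to_polygon(bbox):
    """Build the rectangle polygon the script stores under "segmentation"."""
    x, y, w, h = bbox
    return [[x, y, x, y + h, x + w, y + h, x + w, y]]


bbox = voc_to_coco_bbox(48, 240, 195, 371)
print(bbox)                   # [47, 239, 148, 132]
print(bbox[2] * bbox[3])      # 19536, the value stored as "area"
print(bbox_to_polygon(bbox))  # [[47, 239, 47, 371, 195, 371, 195, 239]]
```

If your annotation tool already writes 0-based coordinates, the -1 offsets would shift every box by one pixel, so it is worth comparing a converted box against the image before training.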
