[PYTHON] Convert xml format data to txt format data (yolov3)

Preface

When doing object detection with yolo on tensorflow, the annotation data often comes in xml format, which yolo cannot read directly. So let's write the converter ourselves.

Code I actually wrote and used

import xml.etree.ElementTree as ET
import os
import glob

cate_list = ["Car","Pedestrian","Truck","Signal","Signs","Bicycle","Motorbike","Bus","SVehicle","Train"]

with open("voc_classes.txt","w") as f:
    f.write('\n'.join(cate_list))
    
def class_encord(class_name):
    cate_id = {"Car":0,"Pedestrian":1,"Truck":2,"Signal":3,"Signs":4,"Bicycle":5,"Motorbike":6,"Bus":7,"SVehicle":8,"Train":9}
    return cate_id[class_name]

def convert(data_file,list_file):
    # Append " x1,y1,x2,y2,class_id" to list_file for each <item> in the XML file.
    tree = ET.parse(data_file)
    root = tree.getroot()
    for obj in root.iter("item"):
        # The category text is padded with newlines and spaces; strip them.
        cate = obj.find("category").text.strip()
        category_id = class_encord(cate)
        xmlbox = obj.find("box2d")
        # Box order x1,y1,x2,y2 (x_min,y_min,x_max,y_max), assuming a keras-yolo3-style list
        # and that (x1,y1) is the top-left corner; reorder if your trainer expects otherwise.
        data = [int(float(xmlbox.find(tag).text)) for tag in ("x1","y1","x2","y2")]
        list_file.write(" " + ",".join([str(a) for a in data]) + "," + str(category_id))

data_file_list = glob.glob("Annotations/*.xml")

with open("2007_train.txt","w") as list_file:
    for data_file in data_file_list:
        # Swap the extension with splitext; rstrip(".xml") strips characters, not the suffix.
        jpg_file = "train_" + os.path.splitext(data_file)[0] + ".jpg "
        list_file.write(jpg_file)
        convert(data_file,list_file)
        list_file.write("\n")

It isn't generalized, so it's hard to read (;^ω^). Below I go through it piece by piece and point out the parts you would change to use it on your own data.

Code flow

yolo needs a txt file that lists the object class names, so write that out first.

cate_list = ["Car","Pedestrian","Truck","Signal","Signs","Bicycle","Motorbike","Bus","SVehicle","Train"]

with open("voc_classes.txt","w") as f:
    f.write('\n'.join(cate_list))
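For reference, a training script would presumably read this file back one name per line, so that a name's line position doubles as its class ID. A minimal sketch of that round trip follows; `read_classes` is a hypothetical helper made up for this example, and a temporary directory is used so nothing in the working directory is touched.

```python
import os
import tempfile

cate_list = ["Car", "Pedestrian", "Truck", "Signal", "Signs",
             "Bicycle", "Motorbike", "Bus", "SVehicle", "Train"]

def read_classes(path):
    # Hypothetical helper: return the class names, one per non-empty line.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "voc_classes.txt")
    # Same write as in the article, just into a temporary location.
    with open(path, "w") as f:
        f.write("\n".join(cate_list))
    classes = read_classes(path)

# The list index is the class ID.
print(classes.index("Truck"))
```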

A function that maps an object's class name to its numeric id.

def class_encord(class_name):
    cate_id = {"Car":0,"Pedestrian":1,"Truck":2,"Signal":3,"Signs":4,"Bicycle":5,"Motorbike":6,"Bus":7,"SVehicle":8,"Train":9}
    return cate_id[class_name]
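The dictionary repeats the contents of cate_list, so the two can drift apart when a class is added or reordered. One option, shown here as a sketch rather than the code I used, is to derive the mapping from the list with enumerate. The names `CATE_ID` and `class_encode_from_list` are made up for this example.

```python
cate_list = ["Car", "Pedestrian", "Truck", "Signal", "Signs",
             "Bicycle", "Motorbike", "Bus", "SVehicle", "Train"]

# Build {"Car": 0, "Pedestrian": 1, ...} from the list order.
CATE_ID = {name: idx for idx, name in enumerate(cate_list)}

def class_encode_from_list(class_name):
    # Raises KeyError for a name that is not in cate_list.
    return CATE_ID[class_name]

print(class_encode_from_list("Bus"))  # 7
```

With this, adding a class only requires editing cate_list.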

A function that converts the xml data into the txt file. All it does is parse the file with xml.etree.ElementTree, pull out each object's category and box coordinates, and write them out.

def convert(data_file,list_file):
    # Append " x1,y1,x2,y2,class_id" to list_file for each <item> in the XML file.
    tree = ET.parse(data_file)
    root = tree.getroot()
    for obj in root.iter("item"):
        # The category text is padded with newlines and spaces; strip them.
        cate = obj.find("category").text.strip()
        category_id = class_encord(cate)
        xmlbox = obj.find("box2d")
        # Box order x1,y1,x2,y2 (x_min,y_min,x_max,y_max), assuming a keras-yolo3-style list
        # and that (x1,y1) is the top-left corner; reorder if your trainer expects otherwise.
        data = [int(float(xmlbox.find(tag).text)) for tag in ("x1","y1","x2","y2")]
        list_file.write(" " + ",".join([str(a) for a in data]) + "," + str(category_id))
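To see what the function reads, here is a sketch that runs the same ElementTree calls on a small in-memory sample. The XML layout (a root containing item elements, each with a category and a box2d holding x1, y1, x2, y2) is inferred from the tag names in the code. The root tag, the sample values and the helper name `parse_items` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample; real annotation files may contain more fields.
SAMPLE_XML = """<annotation>
  <item>
    <category>
      Car
    </category>
    <box2d>
      <x1>10.0</x1>
      <y1>20.5</y1>
      <x2>110.0</x2>
      <y2>220.9</y2>
    </box2d>
  </item>
</annotation>"""

def parse_items(xml_text):
    # Return a list of (category, box) pairs, with box as a dict of ints.
    root = ET.fromstring(xml_text)
    items = []
    for obj in root.iter("item"):
        # strip() removes the newlines and spaces around the category name.
        cate = obj.find("category").text.strip()
        xmlbox = obj.find("box2d")
        # int(float(...)) truncates the fractional part of each coordinate.
        box = {tag: int(float(xmlbox.find(tag).text))
               for tag in ("x1", "y1", "x2", "y2")}
        items.append((cate, box))
    return items

for cate, box in parse_items(SAMPLE_XML):
    print(cate, box)
```

Returning the coordinates by name keeps the parsing separate from the question of which order the output line should use.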

All that's left is to run it over every annotation file.

data_file_list = glob.glob("Annotations/*.xml")

with open("2007_train.txt","w") as list_file:
    for data_file in data_file_list:
        # Swap the extension with splitext; rstrip(".xml") strips characters, not the suffix.
        jpg_file = "train_" + os.path.splitext(data_file)[0] + ".jpg "
        list_file.write(jpg_file)
        convert(data_file,list_file)
        list_file.write("\n")
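One detail to watch when turning the annotation path into an image path: str.rstrip treats its argument as a set of characters to remove, not as a suffix, so a file name ending in x, m, l or a dot loses more than its extension. os.path.splitext removes only the extension. The file name below is invented to show the difference.

```python
import os

# Hypothetical annotation path whose stem ends in characters from ".xml".
data_file = "Annotations/film.xml"

# rstrip removes every trailing '.', 'x', 'm' or 'l' character.
stripped = data_file.rstrip(".xml")

# splitext splits off just the final extension.
stem = os.path.splitext(data_file)[0]

print(stripped)  # Annotations/fi
print(stem)      # Annotations/film
```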

Object detection is fun (^▽^), but organizing the data is hard.
