Try to get the contents of Word with Golang

Introduction

I usually develop small in-house tools in Golang. The other day, a junior colleague came to me with this problem: "Every week, I have to paste documents written in Word into Excel and put them together ..."

"On top of that, .docx and .doc files are mixed together ..."

Oh...

When I looked into it, I found a nice package, so I tried it out.

code.sajari.com/docconv

A program that simply reads a Word file and outputs it to the console

package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"

	"code.sajari.com/docconv"
)

// WordContent holds the text retrieved from a Word file.
type WordContent struct {
	body string
}

// String is called when the value is printed with fmt.Println().
func (wc *WordContent) String() string {
	return strings.TrimSpace(wc.body)
}

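// FileRead reads the file at the path given by filename and returns its contents.
func FileRead(filename string) (*WordContent, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, fmt.Errorf("failed to open file: %w", err)
	}
	defer f.Close()

	var content string
	// .docx and .doc are different formats, so pick the converter by extension.
	switch strings.ToLower(filepath.Ext(filename)) {
	case ".docx":
		content, _, err = docconv.ConvertDocx(f)
	case ".doc":
		content, _, err = docconv.ConvertDoc(f)
	default:
		// Report unknown extensions instead of returning (nil, nil).
		return nil, fmt.Errorf("unsupported file extension: %q", filepath.Ext(filename))
	}
	if err != nil {
		return nil, fmt.Errorf("failed to convert %s: %w", filename, err)
	}
	return &WordContent{body: content}, nil
}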

func main() {
	filename1 := "samples/sample.docx"
	wc, err := FileRead(filename1)
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(wc)

	fmt.Println("----------------")

	filename2 := "samples/sample.doc"
	wc, err = FileRead(filename2)
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(wc)
}

The key point is that the function used to read the file depends on the extension: docconv.ConvertDocx for .docx and docconv.ConvertDoc for .doc.

I hope this helps.
