Extract lines that match conditions from a text file with Python

Overview

I wrote a Python tool that extracts lines from text files using multiple search conditions, applied as a prefix match, suffix match, partial match, or exact match. I originally wrote a tool that both extracted and removed lines containing specific wording from a text, but I thought extraction on its own would also be useful, so I rebuilt it in a way that lets that part be changed separately.
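
To make the four match types concrete, here is a minimal sketch that uses plain string methods rather than the tool's own code. The function name `line_matches` and the mode names are hypothetical, and the check is applied to the whole line with the trailing newline stripped, which is an assumption about how the match is meant to work.

```python
def line_matches(line, words, mode):
    """Return True if the line satisfies the match mode for any of the words.

    Hypothetical helper for illustration; mode is one of
    'prefix', 'suffix', 'partial', 'exact'.
    """
    text = line.rstrip('\n')
    checks = {
        'prefix': text.startswith,
        'suffix': text.endswith,
        'partial': lambda w: w in text,
        'exact': lambda w: w == text,
    }
    check = checks[mode]
    # Multiple conditions are combined with OR: one matching word is enough
    return any(check(w) for w in words)
```

With this helper, `line_matches('green apple\n', ['apple', 'kiwi'], 'suffix')` is true, while the same line fails the `'prefix'` and `'exact'` checks.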

Requirements

An exe is also provided this time, so if you only want to run the tool, you don't need Python.

Publication place

Processing content

Source description

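The following method builds the search condition. It reads the search words from an Excel file, optionally sorts them by length, and joins them with `|` into a single regular expression whose shape depends on the configured match type.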

    def createReg(self):
        searchItems=pd.read_excel('resources/Search data.xlsx')
        sortTypeCode=iniFile.get('info','sortType')

        searchItemArray=np.asarray(searchItems['Search word'])
        sortType=SORT_ENUM(sortTypeCode)
        if sortType==SORT_ENUM.SORT_LENGTH_ASC or sortType==SORT_ENUM.SORT_LENGTH_DESC:
            searchItemIndex=[]
            for item in searchItemArray:
                searchItemIndex.append(len(item))
            searchSeries=pd.Series(searchItemIndex)
            # Pair each search word with its length (column 0) and sort on it
            searchItemDataFrame=pd.concat([searchItems['Search word'],searchSeries],axis=1)
            if sortType==SORT_ENUM.SORT_LENGTH_ASC:
                sortItems=searchItemDataFrame.sort_values(0,ascending=True)
            else:
                sortItems=searchItemDataFrame.sort_values(0,ascending=False)
            searchItemArray=np.asarray(sortItems['Search word'])
        regTypeCode=iniFile.get('info','regType')
        regType=REG_ENUM(regTypeCode)
        regStr=''
        for item in searchItemArray:
            if regStr!='':
                regStr=regStr+'|'
            sItem=item
            if REG_ENUM.REG_TYPE_CONTAIN==regType:
                sItem='.*'+item+'.*'
            elif REG_ENUM.REG_TYPE_FRONT==regType:
                sItem=item+'.*'
            elif REG_ENUM.REG_TYPE_BACKWARD==regType:
                sItem='.*'+item
            elif REG_ENUM.REG_TYPE_EXACT_MATCH==regType:
                sItem=item
            regStr=regStr+sItem
        return re.compile(regStr)
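
The method above joins the search words as they are, so a word containing regex metacharacters such as `.` or `(` changes the meaning of the pattern. Also, because the compiled pattern is later used with `search`, the prefix, suffix, and exact variants are not tied to the start or end of the line. One possible variant, which is not the author's implementation, escapes each word with `re.escape` and adds anchors. The function name `build_pattern` and the string mode names are hypothetical stand-ins for `REG_ENUM`.

```python
import re

def build_pattern(words, mode, longest_first=True):
    """Compile one regex that matches any of the words in the given mode.

    Hypothetical helper: mode is 'partial', 'prefix', 'suffix' or 'exact'.
    """
    if longest_first:
        # Try longer alternatives first, e.g. 'apple pie' before 'apple'
        words = sorted(words, key=len, reverse=True)
    # Escape each word so that characters like '.' or '(' match literally
    body = '|'.join(re.escape(w) for w in words)
    templates = {
        'partial': '(?:{})',
        'prefix': '^(?:{})',
        'suffix': '(?:{})$',
        'exact': '^(?:{})$',
    }
    return re.compile(templates[mode].format(body))
```

For example, `build_pattern(['apple', 'kiwi'], 'prefix').search(line)` is truthy only for lines that start with one of the two words. The length ordering affects which alternative is reported as the matched text, not whether a line matches.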

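The following method applies the regular expression built above to every CSV file under `data/`, line by line, and writes the matching lines of each file to `extract_<file name>.txt` in the output folder.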

    def extract(self):
        reg=self.createReg()
        paths=glob.glob('data/*.csv')
        
        fileDict={}

        for pathName in paths:
            extractList=[]
            with open(pathName,encoding=iniFile.get('info','encoding')) as f:
                # Read line by line and keep every line the pattern matches
                for targetStr in f:
                    extractStr=reg.search(targetStr)
                    if extractStr:
                        extractList.append(targetStr)
            fileDict[os.path.basename(pathName)]=extractList
        outputPath=iniFile.get('info','outputPath')
        for key,data in fileDict.items():
            outputFile=os.path.join(outputPath,'extract_'+key+'.txt')
            with open(outputFile,encoding='utf-8',mode='w') as f:
                for d in data:
                    f.write(d)
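
The same extraction can be written as a small standalone function, sketched here with `pathlib`. This is not the author's code: the function name `extract_matching_lines` and its parameters are hypothetical, and the encoding and output location that the original reads from `iniFile` are passed in as arguments instead.

```python
import re
from pathlib import Path

def extract_matching_lines(pattern, src_dir, out_dir, encoding='utf-8'):
    """Copy the lines matching pattern from each CSV in src_dir to out_dir.

    Hypothetical helper; writes one 'extract_<name>.txt' per input file
    and returns the list of output paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).glob('*.csv')):
        with open(src, encoding=encoding) as f:
            # Keep each line (newline included) that the pattern matches
            hits = [line for line in f if pattern.search(line)]
        dst = out_dir / f'extract_{src.name}.txt'
        with open(dst, 'w', encoding='utf-8') as f:
            f.writelines(hits)
        written.append(dst)
    return written
```

Calling `extract_matching_lines(re.compile('apple'), 'data', 'output')` would then produce one output file per CSV in `data`, each holding only the lines that contain `apple`.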

How to use
