Tips: [Python] Extract only lines that do not contain a specific string

Purpose

Extract only lines that do not contain a specific string.

Point

In such cases, Python's in operator is convenient.

Whether string A contains string B can be determined with the following syntax.

[String B] in [String A]

The return value is a Boolean (True or False).
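As a quick illustration of the syntax above, here is a minimal sketch. The sample string is made up for this example and is not part of the script shown later.

```python
# Substring checks with the `in` operator (the sample string is made up).
line = "There is an apple on the desk."

print("apple" in line)       # True
print("banana" in line)      # False
print("Apple" in line)       # False: the check is case-sensitive
print("banana" not in line)  # True: `not in` gives the negated result
print("" in line)            # True: the empty string is contained in every string
```

The last case matters when the strings come from a file: a blank line in a word list would match every input line.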

Illustration

Use the in operator to extract only the lines that do not contain a particular string.

For example, suppose you want to get only the lines that do not contain specific words from the following text.

example.txt


There is an apple on the desk.
There is an apple on the desk.
There is an apple on the table.
There is a banana on the desk.
There is a banana on the desk.
There is a banana on the table.

The list of words you want to exclude is as follows.

filter.txt


Apple
apple
banana

A script for this case is as follows.

fitrHavingLine.py


#!/usr/local/bin/python3
# -*- coding: utf-8 -*-

"""
Output the lines that do not contain any string listed in the reference file.
"""
__author__  = "Kazuki Nakamae <[email protected]>"
__version__ = "0.00"
__date__    = "2 Jun 2017"

import sys

def fitrHavingLine(infn, reffn, outfn):
    """
    @function   fitrHavingLine();
    Output the lines that do not contain any string listed in the reference file.
    @param  {string} infn :Input file
    @param  {string} reffn :Reference file
    @param  {string} outfn :Output file
    """

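    # Read the reference strings once. Blank lines are skipped, because an
    # empty string is "in" every line and would filter out everything.
    with open(reffn, 'r') as ref:
        reflines = [line for line in ref if line.strip()]

    # Open the output once in write mode so that re-running the script
    # does not append duplicate lines to an existing file.
    with open(infn, 'r') as inf, open(outfn, 'w') as outf:
        for infline in inf:
            isNothing = True
            # If a string from the reference file appears in the line, set the flag to False.
            for refline in reflines:
                if refline.strip() in infline:
                    isNothing = False
                    break
            if isNothing:
                outf.write(infline)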


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)

    if argc != 4:   # Argument check
        print("USAGE : python3 fitrHavingLine.py <INPUT FILE> <REFERENCE FILE> <OUTPUT FILE>")
        sys.exit(1)

    fitrHavingLine(argvs[1], argvs[2], argvs[3])

Enter the following in bash:

python3 fitrHavingLine.py example.txt filter.txt out.txt

The output is as follows.

out.txt


There is an apple on the desk.
There is a banana on the desk.
There is a banana on the table.

Conclusion

if refline.strip() in infline:
    isNothing = False
    break

Conversely, by changing the part above, you can process only the lines that contain a specific string, or only one particular line.
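As a rough sketch of that idea, the function below can return either the lines that contain none of the words or the lines that contain at least one of them. The name `filter_lines`, the `keep_matching` flag, and the sample data are all hypothetical and are not part of the script above.

```python
def filter_lines(lines, words, keep_matching=False):
    """Return the lines that contain none of `words`, or, when
    keep_matching is True, the lines that contain at least one of them."""
    result = []
    for line in lines:
        # True if any of the words appears somewhere in the line
        matched = any(word in line for word in words)
        if matched == keep_matching:
            result.append(line)
    return result


# Made-up sample data for illustration
lines = [
    "There is an apple on the desk.",
    "There is a banana on the table.",
    "There is a pen on the desk.",
]
words = ["apple", "banana"]

print(filter_lines(lines, words))
# ['There is a pen on the desk.']
print(filter_lines(lines, words, keep_matching=True))
# ['There is an apple on the desk.', 'There is a banana on the table.']
```

Flipping a single flag keeps the matching logic in one place, so the "contains" and "does not contain" cases cannot drift apart.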

It may be useful when handling data with various items such as csv files.
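For CSV data, the same check can be applied to a single column instead of the whole line. The following is only a sketch under assumptions: the column names, the sample rows, and the helper `rows_without` are invented for illustration, and the standard csv module is used for parsing.

```python
import csv
import io

# Made-up CSV data standing in for a real file.
sample = io.StringIO(
    "item,place\n"
    "apple,desk\n"
    "banana,table\n"
    "pen,desk\n"
)


def rows_without(reader, column, words):
    """Keep the rows whose `column` value contains none of `words`."""
    return [row for row in reader
            if not any(word in row[column] for word in words)]


kept = rows_without(csv.DictReader(sample), "item", ["apple", "banana"])
print([row["item"] for row in kept])
# ['pen']
```

Checking one column avoids accidental matches in unrelated fields, which a whole-line test cannot rule out.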

That's all. Thank you very much.
