Extract only lines that do not contain a specific string.
For this task, Python's in operator is convenient.
Whether or not a string A contains a string B can be determined by the following syntax.
[String B] in [String A]
The result is a Boolean value (True or False).
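As a quick illustration of the syntax above, here is a minimal sketch. The variable name line and the sample strings are only assumptions chosen for the demonstration.

```python
# Substring test: "[String B] in [String A]" evaluates to True or False.
line = "There is an apple on the desk."

print("apple" in line)       # True: "apple" occurs in the line
print("banana" in line)      # False: "banana" does not occur
print("Apple" in line)       # False: the comparison is case-sensitive
print("apple" not in line)   # False: "not in" negates the test
```

Note that the test is case-sensitive, which is why a word list may need both capitalized and lowercase spellings.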
Use in to extract only the lines that do not contain a particular string.
For example, suppose you want to get only the lines that do not contain any of a set of specific words from the following text.
example.txt
There is an apple on the desk.
There is an apple on the desk.
There is an apple on the table.
There is a banana on the desk.
There is a banana on the desk.
There is a banana on the table.
The list of words you want to exclude is as follows.
filter.txt
Apple
apple
banana
A script for this case is as follows.
fitrHavingLine.py
#!/usr/local/bin/python3
# -*- coding: utf-8 -*-
"""
Output the lines that do not contain any string listed in the reference file.
"""
__author__ = "Kazuki Nakamae <[email protected]>"
__version__ = "0.00"
__date__ = "2 Jun 2017"
import sys

def fitrHavingLine(infn,reffn,outfn):
    """
    @function   fitrHavingLine();
    Output the lines that do not contain any string listed in the reference file.
    @param  {string} infn :Input file
    @param  {string} reffn :Reference file
    @param  {string} outfn :Output file
    """
    inf = open(infn, 'r')
    for infline in inf:
        isNothing = True
        ref = open(reffn, 'r')
        #If a string from the reference file occurs in the line, set the flag to False.
        for refline in ref:
            if refline.strip() in infline:
                isNothing=False
                break
        ref.close()
        #Append the line to the output file only when no reference string matched.
        if isNothing:
            outf = open(outfn, 'a')
            outf.write(infline)
            outf.close()
    inf.close()

if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    if (argc != 4): #Argument check
        print("USAGE : python3 fitrHavingLine.py <INPUT FILE> <REFERENCE FILE> <OUTPUT FILE>")
        quit()
    fitrHavingLine(argvs[1],argvs[2],argvs[3])
    quit()
Enter the following in bash:
python3 fitrHavingLine.py example.txt filter.txt out.txt
The output is as follows.
out.txt
There is an apple on the desk
There is a banana on the desk
There is a banana on the table.
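The same filtering can also be sketched so that the reference file is read only once instead of once per input line. This is a minimal sketch, not the script above: the names load_patterns, filter_lines, and filter_file are hypothetical, it assumes the reference file fits in memory and that the files are UTF-8, it skips blank reference lines, and it overwrites the output file ('w') rather than appending to it.

```python
def load_patterns(reffn):
    """Read the reference file once and return its non-blank lines, stripped."""
    with open(reffn, "r", encoding="utf-8") as ref:
        return [s.strip() for s in ref if s.strip()]


def filter_lines(lines, patterns):
    """Yield only the lines that contain none of the patterns."""
    for line in lines:
        if not any(p in line for p in patterns):
            yield line


def filter_file(infn, reffn, outfn):
    """Write the lines of infn that contain no pattern from reffn to outfn."""
    patterns = load_patterns(reffn)
    with open(infn, "r", encoding="utf-8") as inf, \
         open(outfn, "w", encoding="utf-8") as outf:  # assumption: overwrite, not append
        outf.writelines(filter_lines(inf, patterns))
```

Skipping blank reference lines matters because an empty string is contained in every string, so a blank line in the word list would otherwise exclude every input line.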
Conversely, if you change the following part of the script, you can instead process only the lines that contain a specific string, or only specific lines.
            if refline.strip() in infline:
                isNothing=False
                break
It may be useful when handling data with various items such as csv files.
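As a rough sketch of that inverted use on CSV data, the example below keeps only the rows whose value in one column contains any of the given words. The function name keep_rows, the column names item and place, and the sample data are all hypothetical and inlined so that the sketch runs on its own.

```python
import csv
import io


def keep_rows(rows, column, words):
    """Yield only the rows whose `column` value contains any of `words`."""
    for row in rows:
        if any(w in row[column] for w in words):
            yield row


# Hypothetical sample data with a header row.
sample = io.StringIO("item,place\napple,desk\nbanana,table\norange,desk\n")

kept = list(keep_rows(csv.DictReader(sample), "item", ["apple", "banana"]))
print([row["item"] for row in kept])  # ['apple', 'banana']
```

Testing a single column avoids accidental matches in other fields, which a whole-line substring test cannot rule out.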
That's all. Thank you very much.