[LINUX] What to do if grep's -f option doesn't find matches properly

Introduction

If you use grep -f and **only lines that end with the search string** are matched, the line feed code may be the cause. In short: if you correct the line feed code of the search string file with sed or `echo`, the search works.

(* This is on Windows. It may behave differently on Mac.)


・ grep -f option

grep -f [search string file] [search target file]

・ Use a while statement

while read line
do
  grep "$line" [search target file]
done < [search string file]

(* With this approach, the last line of **[search string file]** must end with a line break; otherwise the last line alone is not read. Also note that a line matching more than one search string is output once per match.)
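The note about the missing final line break can be checked with a small sketch. The file names under the temporary directory and the `|| [ -n "$line" ]` guard are illustrative assumptions, not part of the original article; the guard is a common shell idiom for still processing a last line that has no trailing newline.

```shell
# Hypothetical sample files in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'AAA\nBBB' > "$tmpdir/patterns.txt"          # note: no final newline
printf 'xxAAAxx\nxxBBBxx\nxxCCCxx\n' > "$tmpdir/target.txt"

count=0
# read returns non-zero on the unterminated last line but still fills $line,
# so the extra test lets the loop body run for it as well.
while IFS= read -r line || [ -n "$line" ]
do
  count=$((count + 1))
  grep -e "$line" "$tmpdir/target.txt"
done < "$tmpdir/patterns.txt"

echo "patterns read: $count"
```

Without the guard, the loop would stop before handling "BBB" in this example.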


What's happening

As shown below, when searching for each line of a file, it can happen (or rather, it did happen) that only lines with the search string at the very end are matched.

Search string file (file.txt)


AAA
BBB

Search target file (test.txt)


AAAxxxxxxxx
xxxxxAAAxxx
xxxxxxxxAAA
BBBx BBB xx
xxxxxxxxBBB
xxAAAxxBBBx
xxxCCCxxxxx

Only the lines that end with a search string are matched


$ grep -f file.txt test.txt
xxxxxxxxAAA
xxxxxxxxBBB

$ while read line
> do
> grep "$line" test.txt
> done < file.txt
xxxxxxxxAAA
xxxxxxxxBBB
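The situation above can be reproduced on a Unix-like shell with a minimal sketch like the following. It assumes both files were saved with CRLF line endings, builds them with `printf`, and uses a temporary directory; those details are assumptions for illustration.

```shell
# Recreate both files with CRLF (\r\n) endings, as a Windows editor would save them.
tmpdir=$(mktemp -d)
printf 'AAA\r\nBBB\r\n' > "$tmpdir/file.txt"
printf '%s\r\n' AAAxxxxxxxx xxxxxAAAxxx xxxxxxxxAAA 'BBBx BBB xx' \
  xxxxxxxxBBB xxAAAxxBBBx xxxCCCxxxxx > "$tmpdir/test.txt"

# Each pattern is effectively "AAA\r" or "BBB\r", so only lines where the
# search string sits right before the line's own \r can match.
matched=$(grep -c -f "$tmpdir/file.txt" "$tmpdir/test.txt")
echo "matched lines: $matched"
```

With these inputs only the two lines ending in AAA and BBB should be counted, which mirrors the output shown above.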

Example of workaround

1. sed command

Any sed command will do: in this environment, simply recreating **[search string file]** through sed makes the search work.

$ sed 's/^//' file.txt > file2.txt
$ grep -f file2.txt test.txt
AAAxxxxxxxx
xxxxxAAAxxx
xxxxxxxxAAA
BBBx BBB xx
xxxxxxxxBBB
xxAAAxxBBBx
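On systems where sed does not translate line endings by itself (typically Linux), `sed 's/^//'` is not expected to remove the carriage return, so stripping it explicitly is safer. The sketch below uses `tr -d '\r'` instead of the sed command above; it is a swapped-in alternative, and the file names are the same illustrative ones recreated in a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf 'AAA\r\nBBB\r\n' > "$tmpdir/file.txt"
printf '%s\r\n' AAAxxxxxxxx xxxxxAAAxxx xxxxxxxxAAA 'BBBx BBB xx' \
  xxxxxxxxBBB xxAAAxxBBBx xxxCCCxxxxx > "$tmpdir/test.txt"

# Delete every carriage return so the patterns become plain "AAA" and "BBB".
tr -d '\r' < "$tmpdir/file.txt" > "$tmpdir/file2.txt"

fixed=$(grep -c -f "$tmpdir/file2.txt" "$tmpdir/test.txt")
echo "matched lines after conversion: $fixed"
```

Only the search string file needs converting here; the CRLF endings of test.txt do not prevent "AAA" or "BBB" from matching in the middle of a line.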

2. echo command

With the while statement, passing $line through `echo` once more also works.

$ while read line
> do
> grep `echo $line` test.txt
> done < file.txt
AAAxxxxxxxx
xxxxxAAAxxx
xxxxxxxxAAA
xxAAAxxBBBx  ##
BBBx BBB xx
xxxxxxxxBBB
xxAAAxxBBBx  ##Appears many times if multiple conditions are met
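Whether `echo` inside a command substitution drops the carriage return appears to depend on the shell environment, so a variant that removes it explicitly may be more portable. The following is a sketch under that assumption; the `cr` variable and the sample files are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'AAA\r\nBBB\r\n' > "$tmpdir/file.txt"
printf 'AAAxxxxxxxx\nxxxxxBBBxxx\nxxxCCCxxxxx\n' > "$tmpdir/test.txt"

cr=$(printf '\r')   # a single carriage return character
total=0
while IFS= read -r line || [ -n "$line" ]
do
  line=${line%"$cr"}                              # drop a trailing \r, if any
  n=$(grep -c -e "$line" "$tmpdir/test.txt")      # matches for this pattern
  total=$((total + n))
done < "$tmpdir/file.txt"

echo "total matches: $total"
```

The `${line%"$cr"}` expansion removes at most one trailing carriage return and leaves lines that already use LF untouched.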

Other remedies

Examples of other ways to convert the line feed code:
・ Conversion of line feed code

Commentary

The cause of this behavior is that the line feed code differs between Windows and Unix. If a file created on Windows is used as the search string file, the **\r** at the end of each line becomes part of the pattern (grep searches for "**search string + \r**"), so a line is matched only when the search string sits at its very end, immediately before that line's own \r. (In fact, if you remove the line break from the second line of **file.txt**, "BBB" is searched for normally. Conversely, if you run **test.txt** through sed so that its line feed code changes from CRLF to LF, nothing is output.)

OS            Line feed code   How it looks in "od -c"
Unix          LF               \n
Mac (OS X)    LF               \n
Mac (OS 9)    CR               \r
Windows       CR+LF            \r\n

Source: Check line feed code

When the second line (BBB) of file.txt has no line break


$ grep -f file.txt test.txt
xxxxxxxxAAA
BBBx BBB xx
xxxxxxxxBBB
xxAAAxxBBBx

When sed is run on the search target file (test.txt) (nothing is output)


$ sed 's/^//' test.txt > test2.txt
$ grep -f file.txt test2.txt


In this environment, passing the file through sed or `echo` converts the line feed code from **CRLF (\r\n)** to **LF (\n)**, which is why the search then works.

Before and after the sed command


## ----------------------Before sed(CRLF)
$ file file.txt
file.txt: ASCII text, with CRLF line terminators

$ od -c file.txt
0000000   A   A   A  \r  \n   B   B   B  \r  \n
0000012

## ----------------------After sed(LF)
$ file file2.txt
file2.txt: ASCII text

$ od -c file2.txt
0000000   A   A   A  \n   B   B   B  \n
0000010
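Besides `file` and `od -c`, a quick scripted check can tell whether a search string file still contains carriage returns before it is handed to grep -f. This is a minimal sketch; the variable names and the sample file are assumptions for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'AAA\r\nBBB\r\n' > "$tmpdir/file.txt"

cr=$(printf '\r')
# Count the lines whose last character is a carriage return.
crlf_lines=$(grep -c "$cr\$" "$tmpdir/file.txt")

if [ "$crlf_lines" -gt 0 ]; then
  echo "file.txt has $crlf_lines line(s) ending in CR; convert it before using grep -f"
fi
```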

Before and after the echo command


$ cat hoge.txt
hoge

$ while read line
> do
> echo `echo $line` > hoge2.txt
> done < hoge.txt

## ----------------------before echo(CRLF)
$ od -c hoge.txt
0000000   h   o   g   e  \r  \n
0000006

## ----------------------After echo(LF)
$ od -c hoge2.txt
0000000   h   o   g   e  \n
0000005

Reference:
・ [Sed] Command (Basic) - Edit text file
・ How to replace the changed line feed code with Windows sed and return it from LF to CRLF

On Mac, the older Mac OS used CR, but from Mac OS X onward it is said to use LF, the same as Unix-like OSes.

[Linux] Convert line breaks
