Search all files under a directory for a specific string with Python and output the matching lines

Overview

For every file under [DIR_NAME], the script checks whether it is a text file in one of the character encodings listed in [TARGET_ENCODINGS]. If it is, the script searches it for [SEARCH_WORD] and writes each matching line to the file named by [OUTPUT_NAME].
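
The "is it a text file in one of these encodings" check can be sketched as trial decoding: try each encoding in turn and take the first one that decodes without error. The sketch below is written for Python 3 (an assumption; the script itself targets Python 2.6), and the optional `encodings` parameter is added here only for illustration. It also shows that the order of the list matters: ISO-2022-JP data consists only of 7-bit bytes, so it also decodes as UTF-8 when UTF-8 is tried first.

```python
# A minimal sketch of encoding detection by trial decoding (Python 3).
TARGET_ENCODINGS = ['utf-8', 'shift-jis', 'euc-jp', 'iso2022-jp']


def guess_charset(data, encodings=TARGET_ENCODINGS):
    """Return the first encoding that decodes the bytes, or 'binary'."""
    for enc in encodings:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue  # not this encoding, try the next one
    return 'binary'


if __name__ == '__main__':
    print(guess_charset('日本語'.encode('utf-8')))       # utf-8
    # 7-bit ISO-2022-JP bytes are also valid UTF-8, so the first match wins:
    print(guess_charset('日本語'.encode('iso2022_jp')))  # utf-8
    print(guess_charset('日本語'.encode('iso2022_jp'),
                        ['iso2022-jp', 'utf-8']))        # iso2022-jp
```

If ISO-2022-JP files need to be recognized as such, one option is to put `'iso2022-jp'` ahead of `'utf-8'` in the list; plain ASCII files would then be reported as ISO-2022-JP, which does not affect the search itself.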

Environment

Windows 8 + Python 2.6 series

Code

find_directory.py


#!/usr/bin/python
# -*- coding: utf-8 -*-
# vim: fileencoding=utf-8

import os, sys, codecs

DIR_NAME = 'C:\\html\\HOGE\\'
OUTPUT_NAME = 'result_find_file_list.csv'

SEARCH_WORD = '<font'

# Tried in this order; the first encoding that decodes without error wins.
TARGET_ENCODINGS = [
	'utf-8',
	'shift-jis',
	'euc-jp',
	'iso2022-jp'
]

FLAG_STDOUT = True
#FLAG_STDOUT = False

write = sys.stdout.write

def guess_charset(data):
	# Return the first encoding in TARGET_ENCODINGS that decodes data
	# without error, or 'binary' if none of them does.
	for enc in TARGET_ENCODINGS:
		try:
			data.decode(enc)
			return enc
		except UnicodeDecodeError:
			pass
	return 'binary'

out = codecs.open(OUTPUT_NAME, 'w', 'shift-jis')
# Use the same CRLF line ending as the data rows.
out.write('path,line_number,search,target_line\r\n')

for dirpath, dirs, files in os.walk(DIR_NAME):
	for fn in files:
		path = os.path.join(dirpath, fn)
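		try:
			# Read raw bytes so that detection sees the file contents unmodified.
			fobj = open(path, 'rb')
			data = fobj.read()
			fobj.close()
		except IOError:
			# Skip files that cannot be opened or read.
			continue
		enc = guess_charset(data)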
		if enc == 'binary':
			continue
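		count = 0
		try:
			# Re-read the file with the detected encoding, one line at a time.
			for l in codecs.open(path, 'r', enc):
				count = count + 1
				if SEARCH_WORD in l:
					try:
						# Build one quoted CSV row: double quotes in the line become
						# single quotes and line breaks are stripped.
						target = l.replace('"', "'").replace('\r', '').replace('\n', '')
						output = '"' + path + '","' + str(count) + '","' + SEARCH_WORD + '","' + target + '"\r\n'
					except UnicodeError:
						# path is a byte string; a non-ASCII path cannot be joined
						# with the decoded line, so the row is skipped.
						continue
					if FLAG_STDOUT:
						write(output)
					out.write(output)
		except (IOError, UnicodeError):
			# Give up on this file if it cannot be read, decoded, or written out.
			continue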

# Close the result file so that everything is flushed to disk.
out.close()

Supplement

As usual, the exception handling is sloppy. There is room for refactoring, but I want to use this for real work tomorrow, so I am posting it as is.
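
One possible direction for that refactoring, as a rough sketch rather than a drop-in replacement: split the script into small functions and let the `csv` module handle the quoting. The sketch assumes Python 3 (the script above targets Python 2.6), the function names `decode_text`, `find_lines`, and `write_report` are hypothetical, and the report is written as UTF-8 instead of Shift-JIS so that every decoded character can be written.

```python
import csv
import os

# Same encodings as the script, tried in the same order.
ENCODINGS = ['utf-8', 'shift-jis', 'euc-jp', 'iso2022-jp']


def decode_text(data, encodings=ENCODINGS):
    """Return the bytes decoded with the first encoding that works, or None."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return None


def find_lines(root, word, encodings=ENCODINGS):
    """Yield (path, line_number, line) for each line under root containing word."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'rb') as fobj:
                    text = decode_text(fobj.read(), encodings)
            except OSError:
                continue  # unreadable file: skip it
            if text is None:
                continue  # no encoding matched: treat as binary
            for number, line in enumerate(text.splitlines(), 1):
                if word in line:
                    yield path, number, line


def write_report(rows, output_name, word):
    """Write the matches as CSV; the csv module quotes and escapes each field."""
    with open(output_name, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        writer.writerow(['path', 'line_number', 'search', 'target_line'])
        for path, number, line in rows:
            writer.writerow([path, number, word, line])
```

A call such as `write_report(find_lines(DIR_NAME, SEARCH_WORD), OUTPUT_NAME, SEARCH_WORD)` would then reproduce the overall flow. Because `csv.writer` escapes embedded double quotes by doubling them, the matched line no longer needs to have its quotes replaced by hand.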
