1. Introduction
2. subprocess module
3. Create a Commit object
4. Parse the commit log
Have you ever wanted to write a Python module that works with git?
I have, and it is an ongoing project of mine.
So this time I will write about **parsing the Git commit log with Python**.
To build a module on top of git, you first need to **get the output of git commands**.
So let's write a `git` function that captures the output of a git command using the `subprocess` module.
git.py
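from subprocess import Popen, PIPE
class GitError(Exception):
    """Raised when the git command writes anything to standard error"""
    pass

def git(cmd, *args):
    """Run a git subcommand and return its standard output as text"""
    proc = Popen(
        ("git", cmd) + args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    # communicate() returns bytes on Python 3; decode so callers get text
    out, err = out.decode("utf-8", "replace"), err.decode("utf-8", "replace")
    if len(err) == 0:
        # Drop the trailing newline from git's output
        return out.rstrip("\n")
    else:
        raise GitError(err)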
To briefly explain the function (line numbers in parentheses): it imports the `Popen` class and the `PIPE` constant from the `subprocess` module (1), then starts the git process with `Popen` (8-9).
The **communicate** method of `Popen` waits for the process to finish and **returns the contents of standard output and standard error as a tuple**, which are stored in the `out` and `err` variables (10). After that, the function checks whether anything was written to standard error. If so, it raises `GitError`. If not, it returns the standard output without its trailing newline.
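As a point of comparison, here is a minimal sketch of the same idea that decides success from the process exit code rather than from whether anything was written to standard error. The names `run_command` and `CommandError` are made up for this example, and it runs the Python interpreter itself instead of git so that it works on any machine.

```python
import subprocess
import sys


class CommandError(Exception):
    """Hypothetical error type for this sketch (plays the role of GitError)."""


def run_command(*argv):
    """Run a command and return its standard output as text.

    Assumption of this sketch: a non-zero exit code marks failure,
    regardless of what was written to standard error.
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True)  # decode bytes to str
    out, err = proc.communicate()  # waits for exit, returns (stdout, stderr)
    if proc.returncode != 0:
        raise CommandError(err.strip())
    return out.rstrip("\n")


# A command that succeeds even though it also writes to standard error.
ok = run_command(
    sys.executable, "-c",
    "import sys; print('hello'); sys.stderr.write('just a warning')")

# A command that fails with exit code 3.
try:
    run_command(sys.executable, "-c", "import sys; sys.exit(3)")
    failed = False
except CommandError:
    failed = True
```

Whether an exit-code check or a standard-error check is the better fit depends on which git commands you call, since some commands report progress on standard error even when they succeed.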
That's roughly how it works. Next, let's get the commit log using this git function.
Before that, there is one thing to do: create a **Commit class** that stores the parsed commit log.
commit.py
from user import User  # assumes user.py sits next to this file

class Commit(object):
    """Holds the commit hash, the author, the date and time, the commit comment, and the merge data if the commit is a merge."""
    def __init__(self, commithash, author, date, comment, merge_data=None):
        self.commithash = commithash
        self.author = self._get_user_obj(author)
        self.date = date
        self.comment = comment
        self.merge_data = merge_data

    def _get_user_obj(self, author_string):
        # Expected form: "Name <email>". The name may contain spaces,
        # so split only at the last space, just before the address.
        if " <" in author_string:
            name, email = author_string.rsplit(" ", 1)
            return User(name, email[1:-1])  # strip the surrounding < >
        else:
            return User(author_string)

    def __repr__(self):
        return "CommitObject(%s)" % self.commithash
The class inherits from the built-in `object` class.
The data a Commit object needs to hold is:
- the commit hash
- the author
- the date and time
- the commit comment
- the merge data (for merge commits)

So the `__init__` method simply takes each of these as an argument and stores it on `self`.
The author information passed to the Commit object is a string of the form `Author <[email protected]>`. The job of the `_get_user_obj` method is to split this string into the `Author` and `Email` parts. The class that holds these two pieces of information is the **User** class.
user.py
class User(object):
    """Holds the name and email address of the committer"""
    def __init__(self, name, email=None):
        self.name = name
        self.email = email

    def __repr__(self):
        return "UserObject(name='%s', email='%s')" % (
            self.name, self.email)

    def __str__(self):
        return self.__repr__()
This class is simple. It just holds the author's name and email address.
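The splitting of the author string described above can also be delegated to the standard library. The following is only a sketch: `split_author` is a hypothetical helper, the addresses are invented examples, and it relies on `email.utils.parseaddr`, which understands the `Name <address>` form.

```python
from email.utils import parseaddr


def split_author(author_string):
    """Return (name, email) for a string such as 'Name <address>'.

    email is None when the string has no <address> part.
    """
    if "<" not in author_string:
        # parseaddr would treat a bare name as an address, so handle it here
        return author_string.strip(), None
    name, email = parseaddr(author_string)
    return name, email


full = split_author("Alice Smith <alice@example.com>")
bare = split_author("alice")
```

The resulting pair could then be passed straight to `User(name, email)`.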
Now for parsing the commit log. Let's start with the code.
parser.py
import re

from dateutil import parser as dateparser

# Assumes git.py and commit.py from above sit next to this file
from git import git
from commit import Commit

def get_commits():
    """Get the commit log from the shell and return it as a list of Commit objects"""
    commits = []
    commits_string = git("log")
    # Split the log into one string per commit. Every entry starts with a
    # "commit <hash>" line in column 0 (message lines are indented), so
    # blank lines inside a commit message do not break the split.
    commits_string_list = re.split(r"\n(?=commit [0-9a-f]+)", commits_string)
    # parse
    for commitline in commits_string_list:
        commitdata = parse_commit(commitline)
        commits.append(
            Commit(**commitdata))
    return commits

def parse_commit(commitstring):
    """Parse one commit log entry and return the commit hash, author, date and time, commit comment, and the merge source and destination for merge commits"""
    # Drop blank lines so that header and message lines are contiguous
    commitlines = [l for l in commitstring.split("\n") if l.strip()]
    # line 1: commit hash
    commithash = commitlines[0].replace("commit ", "")
    # line 2: merge data (merge commits only) or author
    merge_source = merge_dest = None
    author = None
    is_merge = commitlines[1].startswith("Merge:")
    if is_merge:
        # parse merge data
        merge_source = commitlines[1].split(" ")[1]
        merge_dest = commitlines[1].split(" ")[2]
    else:
        # parse author
        author = commitlines[1].replace("Author: ", "")
    # line 3: author (merge commits) or date
    if commitlines[2].startswith("Author:"):
        author = commitlines[2].replace("Author: ", "")
    else:
        date = dateparser.parse(commitlines[2].replace("Date:", "").strip())
    # line 4: date (merge commits) or start of the comment
    if commitlines[3].startswith("Date:"):
        date = dateparser.parse(commitlines[3].replace("Date:", "").strip())
    else:
        comment = " ".join([i.strip() for i in commitlines[3:]])
    # line 5 onward: comment of a merge commit
    if is_merge:
        comment = " ".join([i.strip() for i in commitlines[4:]])
    return {"commithash": commithash,
            "merge_data": (merge_source, merge_dest) if merge_source is not None else None,
            "author": author,
            "date": date,
            "comment": comment}
commit c92b57f9418eddf75cd96e268bac616aaecd95c4
Merge: 46a71a7 137a6ca
Author: alice <[email protected]>
Date:   Sat Nov 17 01:00:14 2012 +0900

    Merge branch 'rewrite'

    Conflicts:
        commithash.py
A commit log entry looks like the one above (this one is a merge commit). The parser evaluates the entry line by line and extracts the commit hash, the merge data, the author, the date and time, and the commit comment. Explaining it line by line would be tedious, so I will skip that. You should be able to follow it by reading the code, since it does nothing particularly advanced.
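To make the line-by-line evaluation concrete, here is a small self-contained sketch that walks the header lines of the sample entry and picks a field based on each line's prefix. It is not the parser above: `parse_entry` is a hypothetical name, the e-mail address is a placeholder, and it uses `datetime.strptime` instead of dateutil so that only the standard library is needed (this assumes git's default date format and an English locale).

```python
from datetime import datetime

# Sample entry in the default `git log` layout.
# The e-mail address is a made-up placeholder for this sketch.
SAMPLE = """\
commit c92b57f9418eddf75cd96e268bac616aaecd95c4
Merge: 46a71a7 137a6ca
Author: alice <alice@example.com>
Date:   Sat Nov 17 01:00:14 2012 +0900

    Merge branch 'rewrite'

    Conflicts:
        commithash.py
"""


def parse_entry(text):
    """Parse one log entry into a dict, dispatching on each line's prefix."""
    # The header block ends at the first blank line; the rest is the message
    header, _, body = text.partition("\n\n")
    entry = {"merge_data": None}
    for line in header.split("\n"):
        key, _, value = line.partition(" ")
        value = value.strip()
        if key == "commit":
            entry["commithash"] = value
        elif key == "Merge:":
            entry["merge_data"] = tuple(value.split())
        elif key == "Author:":
            entry["author"] = value
        elif key == "Date:":
            entry["date"] = datetime.strptime(value, "%a %b %d %H:%M:%S %Y %z")
    # Join the non-empty message lines into a single-line comment
    entry["comment"] = " ".join(
        part.strip() for part in body.split("\n") if part.strip())
    return entry


entry = parse_entry(SAMPLE)
```

For the sample, `entry["merge_data"]` holds the two abbreviated parent hashes from the `Merge:` line, and `entry["comment"]` is the message collapsed onto one line.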
How you use this commit data is up to you!
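As one possible use, the sketch below counts commits per author and lists the merge commits. To keep it runnable on its own it uses a stand-in `FakeCommit` tuple with invented data rather than the Commit class. With real Commit objects, the author name would presumably come from `commit.author.name`.

```python
from collections import Counter, namedtuple

# Stand-in for the Commit class so the sketch runs on its own.
# Only the attributes used below are modelled, and the data is invented.
FakeCommit = namedtuple(
    "FakeCommit", ["commithash", "author_name", "merge_data"])

commits = [
    FakeCommit("a1", "alice", None),
    FakeCommit("b2", "bob", None),
    FakeCommit("c3", "alice", ("a1", "b2")),
]

# Number of commits made by each author
commits_per_author = Counter(c.author_name for c in commits)

# Hashes of the commits that carry merge data
merge_commits = [c.commithash for c in commits if c.merge_data is not None]
```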