Let's parse the git commit log in Python!

Table of contents

1. Introduction
2. subprocess module
3. Creating a Commit object
4. Parse commit log

Introduction

Have you ever wanted to write a Python module that works with git? I have, and it is an ongoing project.

So this time, the theme is parsing the Git commit log with Python.

subprocess module

To build a module that works with git, you first need to get the output of git commands. So let's write a git function that captures the output of a git command using the `subprocess` module.

git.py


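from subprocess import Popen, PIPE

class GitError(Exception):
    """Raised when a git command fails"""
    pass

def git(cmd, *args):
    """Run a git command and return its standard output as a string"""
    # universal_newlines=True decodes the output to str (not bytes),
    # so callers can split it on "\n"
    proc = Popen(
        ("git", cmd) + args, stdout=PIPE, stderr=PIPE, universal_newlines=True)
    out, err = proc.communicate()
    # git may print warnings to stderr even when it succeeds,
    # so use the exit status to decide whether the command failed
    if proc.returncode == 0:
        return out.rstrip("\n")
    else:
        raise GitError(err)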

To briefly explain the function: it imports the Popen class and the PIPE constant from the subprocess module and starts the git process with Popen, connecting its streams to pipes. The communicate method of the Popen object waits for the process to finish and returns a tuple of (standard output, standard error), which are stored in the out and err variables. Finally, the function checks whether git reported an error: if it did, it raises GitError; otherwise it returns the standard output without the trailing newline.
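On Python 3.7 or later, the same "run a command, capture stdout and stderr, fail loudly" pattern can be sketched with `subprocess.run`. The helper name `run` and the `CommandError` class below are hypothetical stand-ins for `git` and `GitError`; to keep the example runnable without a git repository, it runs the current Python interpreter instead of git.

```python
import subprocess
import sys


class CommandError(Exception):
    """Hypothetical error type for this sketch (plays the role of GitError)."""


def run(*argv):
    """Run argv and return its stdout as str, raising CommandError on failure."""
    # capture_output pipes stdout and stderr; text=True decodes them to str
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    if proc.returncode != 0:
        raise CommandError(proc.stderr)
    # drop the trailing newline, like git() does
    return proc.stdout.rstrip("\n")


# The running interpreter stands in for git here
print(run(sys.executable, "-c", "print('hello')"))  # hello
try:
    # sys.exit('boom') writes "boom" to stderr and exits with status 1
    run(sys.executable, "-c", "import sys; sys.exit('boom')")
except CommandError as e:
    print("failed:", str(e).strip())  # failed: boom
```

With git, the call would look like `run("git", "log")`. Passing `check=True` to `subprocess.run` is another option; it raises `subprocess.CalledProcessError` instead of a custom exception.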

It's roughly like this. Next, let's get the commit log using this git function.

Creating a Commit object

Before that, there is one thing to prepare: a Commit class that stores the parsed commit logs.

commit.py


from user import User

class Commit(object):
    """A class that holds the hash value of the commit, the creator, the date and time, the commit comment, and the merged data if merged."""
    def __init__(self, commithash, author, date, comment, merge_data=None):
        self.commithash = commithash
        self.author = self._get_user_obj(author)
        self.date = date
        self.comment = comment
        self.merge_data = merge_data

    def _get_user_obj(self, author_string):
        """Split an "Author <Email>" string into a User object"""
        if " <" in author_string and author_string.endswith(">"):
            # rpartition splits at the last " <", so names containing spaces stay intact
            name, _, email = author_string.rpartition(" <")
            return User(name, email[:-1])
        else:
            return User(author_string)

    def __repr__(self):
        return "CommitObject(%s)" % self.commithash

The class inherits from the built-in `object` class.

The Commit object holds the following data:

  1. Commit hash value
  2. Author
  3. Commit date / time
  4. Commit comment
  5. Merge data (None if the commit is not a merge)

The __init__ method simply takes each of these as an argument and stores it as an attribute on self.

In the _get_user_obj method, the author information passed to the Commit object is a string of the form `Author <Email>`. The method's job is to split this string into the name and the email address. The class that holds these two pieces of information is the User class.
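As a variation on that splitting step, here is a minimal regex-based sketch. The helper `split_author` and the pattern `AUTHOR_RE` are hypothetical names for illustration; they assume the `Name <email>` layout that `git log` prints on its `Author:` line, where the name may contain spaces.

```python
import re

# "Name <email>": the lazy name group stops right before the whitespace and "<"
AUTHOR_RE = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^>]*)>\s*$")


def split_author(author_string):
    """Hypothetical helper: return (name, email); email is None without <...>."""
    m = AUTHOR_RE.match(author_string)
    if m is None:
        return author_string.strip(), None
    return m.group("name"), m.group("email")


print(split_author("Alice Smith <alice@example.com>"))
# ('Alice Smith', 'alice@example.com')
print(split_author("alice"))
# ('alice', None)
```

The standard library's `email.utils.parseaddr` handles the same shape as well: `parseaddr("Alice Smith <alice@example.com>")` returns `('Alice Smith', 'alice@example.com')`.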

user.py



class User(object):
    """Keep the name and email address of the committer"""
    def __init__(self, name, email=None):
        self.name = name
        self.email = email

    def __repr__(self):
        return "UserObject(name='%s', email='%s')" % (
                                    self.name, self.email)
    def __str__(self):
        return self.__repr__()

This class is simple: it just holds the author's name and email address.
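For comparison, the two holder classes can also be written with the standard-library `dataclasses` module (Python 3.7+), which generates `__init__`, `__repr__` and `__eq__` automatically. This is a sketch of an alternative, not the article's code. Note that this `Commit` expects a ready-made `User` object rather than an `Author <Email>` string.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class User:
    """Author name and optional email address (immutable, compared by value)."""
    name: str
    email: Optional[str] = None


@dataclass
class Commit:
    """The same five pieces of data as the Commit class above."""
    commithash: str
    author: User
    date: datetime
    comment: str
    merge_data: Optional[Tuple[str, str]] = None  # None for non-merge commits


print(User("alice", "alice@example.com"))
# User(name='alice', email='alice@example.com')
```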

Parse commit log

Let's start with the code. Date strings are parsed with the third-party python-dateutil package.

parser.py



from dateutil import parser as dateparser

from git import git
from commit import Commit

def get_commits():
    """Get the commit log from the shell, make it a Commit object and return it as a list"""
    commits_string = git("log")

    # analyzing: each commit starts with a "commit <hash>" line at column 0
    # (comment lines are indented), so group lines by that marker. Splitting on
    # blank lines instead would break on comments that contain blank lines.
    commit_blocks = []
    for line in commits_string.split("\n"):
        if line.startswith("commit "):
            commit_blocks.append([])
        if commit_blocks:
            commit_blocks[-1].append(line)

    # parse
    commits = []
    for block in commit_blocks:
        commitdata = parse_commit("\n".join(block))
        commits.append(
            Commit(**commitdata))

    return commits


def parse_commit(commitstring):
    """Parses the commit log obtained from the shell and returns the commit hash value, committer, date / time, commit comment and source and destination if merged"""

    commitlines = commitstring.split("\n")

    # 1st row: "commit <hash>", possibly followed by ref names such as "(HEAD -> master)"
    commithash = commitlines[0].split()[1]

    # header rows ("Merge:", "Author:", "Date:") run until the first blank line
    merge_data = None
    author = date = None
    comment_lines = []
    in_header = True
    for line in commitlines[1:]:
        if in_header:
            if line == "":
                in_header = False
            elif line.startswith("Merge:"):
                # parse merge data: "Merge: 46a71a7 137a6ca" -> ("46a71a7", "137a6ca")
                merge_data = tuple(line.split()[1:3])
            elif line.startswith("Author:"):
                # parse author
                author = line[len("Author:"):].strip()
            elif line.startswith("Date:"):
                # parse date
                date = dateparser.parse(line[len("Date:"):].strip())
        elif line.strip():
            # comment rows are indented by 4 spaces; join them into one line
            comment_lines.append(line.strip())

    return {"commithash": commithash,
            "merge_data": merge_data,
            "author"    : author,
            "date"      : date,
            "comment"   : " ".join(comment_lines)}

commit c92b57f9418eddf75cd96e268bac616aaecd95c4
Merge: 46a71a7 137a6ca
Author: alice <alice@example.com>
Date:   Sat Nov 17 01:00:14 2012 +0900

    Merge branch 'rewrite'

    Conflicts:
        commithash.py

A typical commit log entry looks like the one above (this one is a merge commit). The parser above reads such an entry line by line and extracts the commit hash, the merge data, the author, the date / time, and the commit comment. Explaining it line by line would be tedious, so I'll skip that; the code does nothing advanced, so reading it should be enough.
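To try the same line-by-line idea on the sample entry without a repository or dateutil, here is a self-contained sketch. `parse_log_text` and `SAMPLE_LOG` are hypothetical names, and the date is parsed with `datetime.strptime`, assuming git's default date layout (`Sat Nov 17 01:00:14 2012 +0900`) and an English/C locale for the day and month names.

```python
from datetime import datetime

SAMPLE_LOG = """\
commit c92b57f9418eddf75cd96e268bac616aaecd95c4
Merge: 46a71a7 137a6ca
Author: alice <alice@example.com>
Date:   Sat Nov 17 01:00:14 2012 +0900

    Merge branch 'rewrite'

    Conflicts:
        commithash.py
"""


def parse_log_text(text):
    """Split default-format `git log` output into one dict per commit."""
    commits = []
    in_header = False
    for line in text.splitlines():
        if line.startswith("commit "):
            # a "commit <hash>" line at column 0 starts a new commit
            commits.append({"commithash": line.split()[1], "merge_data": None,
                            "author": None, "date": None, "comment": []})
            in_header = True
        elif not commits:
            continue  # ignore anything before the first commit
        elif in_header and line == "":
            in_header = False  # the blank line after Date: ends the header
        elif in_header:
            key, _, value = line.partition(":")
            value = value.strip()
            if key == "Merge":
                commits[-1]["merge_data"] = tuple(value.split())
            elif key == "Author":
                commits[-1]["author"] = value
            elif key == "Date":
                commits[-1]["date"] = datetime.strptime(value, "%a %b %d %H:%M:%S %Y %z")
        elif line.strip():
            commits[-1]["comment"].append(line.strip())
    for c in commits:
        c["comment"] = " ".join(c["comment"])
    return commits


print(parse_log_text(SAMPLE_LOG)[0]["comment"])
# Merge branch 'rewrite' Conflicts: commithash.py
```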

How you use this commit data is up to you!
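If you would rather not depend on the layout of the default `git log` output at all, a different technique is to ask git for a machine-friendly format. The sketch below assumes git's documented pretty-format placeholders (`%H` hash, `%P` parent hashes, `%an`/`%ae` author name/email, `%aI` strict ISO 8601 author date, `%B` raw body, `%x1f`/`%x1e` for the ASCII unit/record separator bytes) and Python 3.7+ for `datetime.fromisoformat` with a `+09:00` offset. `LOG_FORMAT`, `parse_formatted_log` and `formatted_log` are hypothetical names.

```python
import subprocess
from datetime import datetime

# One record per commit, fields separated by \x1f, records terminated by \x1e
FIELDS = ["commithash", "parents", "author_name", "author_email", "date", "comment"]
LOG_FORMAT = "%H%x1f%P%x1f%an%x1f%ae%x1f%aI%x1f%B%x1e"


def parse_formatted_log(text):
    """Parse output produced by `git log --format=LOG_FORMAT`."""
    commits = []
    for record in text.split("\x1e"):
        record = record.strip("\n")  # git adds a newline after each record
        if not record:
            continue
        c = dict(zip(FIELDS, record.split("\x1f")))
        c["parents"] = c["parents"].split()  # two or more entries for a merge
        c["date"] = datetime.fromisoformat(c["date"])
        # join the non-blank comment lines, like the parser above does
        c["comment"] = " ".join(l.strip() for l in c["comment"].splitlines() if l.strip())
        commits.append(c)
    return commits


def formatted_log():
    """Run git in the current directory and parse its formatted log."""
    out = subprocess.run(["git", "log", "--format=" + LOG_FORMAT],
                         capture_output=True, text=True, check=True).stdout
    return parse_formatted_log(out)
```

Because the separators are control characters that practically never appear in names or commit messages, this avoids guessing which lines are headers and which are comments.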
