Parsing Git commit logs in Python

This article describes how to parse Git commit logs in Python using GitPython.

Prerequisites:
- Git is installed.
- Python 2.7 is used. GitPython does not appear to support Python 3.3 at this time (Python 3 support is listed as a goal).

Source code: https://github.com/gitpython-developers/GitPython

Documentation: http://pythonhosted.org/GitPython/0.3.2/

Installation

Install GitPython with easy_install (as root):

# easy_install GitPython

Samples

Enumerate a list of commits for a given repository

The following script prints the hash ID of each commit in the specified repository, along with the name of the class that holds the commit information.

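# -*- coding: utf-8 -*-
from git import Repo

# Path to a local clone of the repository to inspect
repo = Repo("/share/testgit/searchTwitter")
# Walk up to 100 commits reachable from master, newest first
for item in repo.iter_commits('master', max_count=100):
  print(item.hexsha)     # SHA-1 hash of the commit
  print(item.__class__)  # class that holds the commit information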

Pass the path of your local repository to Repo. Unlike centralized version control systems such as Subversion, Git keeps all the information needed for version control locally, in the repository's .git directory.

When you run this script, you can see that each commit is stored in a [git.objects.commit.Commit](http://pythonhosted.org/GitPython/0.3.2/reference.html#module-git.objects.commit) object.

Main properties of Commit

| Name | Description |
|---|---|
| author | The person who originally wrote the work |
| authored_date | Date and time the work was authored, as a Unix timestamp (seconds) |
| author_tz_offset | Author's timezone offset, in seconds west of UTC |
| committer | The person who applied the work |
| committed_date | Date and time of the commit, as a Unix timestamp (seconds) |
| committer_tz_offset | Committer's timezone offset, in seconds west of UTC |
| message | Commit message |
| summary | First line of the commit message |
| stats | Statistics computed from the diff. stats.files holds per-file information for the updated files, and stats.total the totals. |
| parents | The parent commits (git.objects.commit.Commit objects). The initial commit has no parents. Following the parents lets you reconstruct the commit order. |
| tree | The root tree of the commit, a git.objects.tree.Tree object that holds the blobs |
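Because each commit records its parents, commits can be put into an order where every parent comes before its children. The sketch below is a topological sort (Kahn's algorithm); it assumes a plain dict mapping each commit's hexsha to a list of its parents' hexshas, which you could build as `{c.hexsha: [p.hexsha for p in c.parents] for c in repo.iter_commits('master', max_count=100)}`. The function name `commit_order` and the sample history are hypothetical.

```python
from collections import deque


def commit_order(parents_of):
    """Return commit ids ordered so that every parent precedes its children.

    parents_of: dict mapping a commit id to a list of its parent ids.
    Parents that are not keys (e.g. outside a max_count window) are ignored.
    """
    # Count how many known parents each commit is still waiting on,
    # and record the reverse edges (parent -> children).
    pending = {}
    children = dict((sha, []) for sha in parents_of)
    for sha, parents in parents_of.items():
        known = [p for p in parents if p in parents_of]
        pending[sha] = len(known)
        for p in known:
            children[p].append(sha)

    # Start from commits with no known parents (such as the initial commit).
    # sorted() is only there to make the output deterministic.
    queue = deque(sorted(sha for sha, n in pending.items() if n == 0))
    order = []
    while queue:
        sha = queue.popleft()
        order.append(sha)
        for child in sorted(children[sha]):
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    return order


# Hypothetical history: b and c branch off a, and d merges b and c.
history = {'d': ['b', 'c'], 'b': ['a'], 'c': ['a'], 'a': []}
print(commit_order(history))  # ['a', 'b', 'c', 'd']
```

A merge commit such as d is emitted only after all of its parents, which a simple walk along the first parent would not guarantee.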

Main structure of Tree

A tree represents a directory structure and is defined in the Tree class.

tree.blobs holds the blobs (files) directly in a tree, and tree.trees holds its subtrees (subdirectories), so you can extract every blob associated with a commit by walking the trees recursively as follows.

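def show_tree(tree, indent):
  """Output Tree information, then recurse into subtrees and blobs."""
  print('%shexsha :%s' % (indent, tree.hexsha))
  print('%spath :%s' % (indent, tree.path))
  print('%sabspath :%s' % (indent, tree.abspath))
  print('%smode :%s' % (indent, tree.mode))
  # tree.trees holds the subdirectories: walk them recursively
  for t in tree.trees:
    show_tree(t, indent + '  ')

  # tree.blobs holds the files directly in this tree
  # (show_blob is defined in the final sample below)
  print('%s[blobs]' % indent)
  for b in tree.blobs:
    show_blob(b, indent + '  ')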
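Recursion is not the only way to walk a tree. As an alternative sketch, the generator below uses an explicit stack and yields the path of every blob. It assumes only the attributes show_tree already relies on (.trees, .blobs, and .path on each blob); the name `iter_blob_paths` is hypothetical, and the namedtuple stand-ins only exist so the example runs without a repository.

```python
from collections import namedtuple


def iter_blob_paths(root):
    """Yield the path of every blob under root, using a stack instead of recursion.

    root needs .trees (subtrees) and .blobs (files); each blob needs .path.
    """
    stack = [root]
    while stack:
        tree = stack.pop()
        for blob in tree.blobs:
            yield blob.path
        # Defer subtrees to later iterations rather than recursing into them
        stack.extend(tree.trees)


# Hypothetical stand-ins shaped like GitPython's Tree and Blob
FakeTree = namedtuple('FakeTree', 'trees blobs')
FakeBlob = namedtuple('FakeBlob', 'path')

root = FakeTree(trees=[FakeTree(trees=[], blobs=[FakeBlob('src/main.py')])],
                blobs=[FakeBlob('README')])
print(sorted(iter_blob_paths(root)))  # ['README', 'src/main.py']
```

With GitPython, `sorted(iter_blob_paths(item.tree))` should list the files in a commit's snapshot.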

A blob represents the contents of a file and is identified by its SHA-1 hash, which Git computes from the file's size and contents.
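To make this concrete, the sketch below computes a blob's hash the way Git does: SHA-1 over the header `blob <size>`, a NUL byte, and then the raw content. The function name `git_blob_hash` is hypothetical; the second value is the `test content` example used in the Pro Git book.

```python
import hashlib


def git_blob_hash(data):
    """Return the SHA-1 Git assigns to a blob containing these bytes.

    The hashed input is b'blob <size in bytes>', a NUL byte, then the content.
    """
    header = b'blob ' + str(len(data)).encode('ascii') + b'\0'
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_hash(b''))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_hash(b'test content\n'))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

For the same file content, the result should match `git hash-object` and the hexsha that GitPython reports for the blob.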

Note that a commit's tree references every file in the snapshot, not only the files that changed. Unchanged files keep the same hash as in the previous commit, while changed files get a new hash.
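Because unchanged files keep their hash, comparing two snapshots by blob hash is enough to find what changed between commits. The sketch below models each snapshot as a dict from path to content so it runs without a repository; the helper names and file contents are hypothetical. With GitPython, the path-to-hash dicts could instead be built from blob.path and blob.hexsha for each commit's tree.

```python
import hashlib


def git_blob_hash(data):
    """SHA-1 of a Git blob: b'blob <size>', a NUL byte, then the content."""
    return hashlib.sha1(b'blob ' + str(len(data)).encode('ascii') + b'\0' + data).hexdigest()


def snapshot(files):
    """Map each path to its blob hash, like the blobs in one commit's tree."""
    return dict((path, git_blob_hash(data)) for path, data in files.items())


def changed_paths(old, new):
    """Return paths whose blob hash differs: added, removed, or modified files."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


# Hypothetical contents of two consecutive commits
before = snapshot({'README': b'hello\n', 'main.py': b'print(1)\n'})
after = snapshot({'README': b'hello\n', 'main.py': b'print(2)\n', 'util.py': b''})

print(before['README'] == after['README'])  # True: the unchanged file keeps its hash
print(changed_paths(before, after))         # ['main.py', 'util.py']
```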

This shows that Git stores a snapshot of the whole directory tree for each commit, rather than per-file differences.

Final sample

The following sample lists the commits in the repository and prints the blobs of each commit.

# -*- coding: utf-8 -*-
from __future__ import print_function  # print() behaves the same on Python 2.7
import time

from git import Repo


def show_blob(b, indent):
  """Output Blob information."""
  print('%s---------------' % indent)
  print('%shexsha:%s' % (indent, b.hexsha))
  print('%smime_type:%s' % (indent, b.mime_type))
  print('%spath:%s' % (indent, b.path))
  print('%sabspath:%s' % (indent, b.abspath))


def show_tree(tree, indent):
  """Output Tree information, then recurse into subtrees and blobs."""
  print('%shexsha :%s' % (indent, tree.hexsha))
  print('%spath :%s' % (indent, tree.path))
  print('%sabspath :%s' % (indent, tree.abspath))
  print('%smode :%s' % (indent, tree.mode))
  # Subdirectories first, recursively
  for t in tree.trees:
    show_tree(t, indent + '  ')

  # Then the files directly in this tree
  print('%s[blobs]' % indent)
  for b in tree.blobs:
    show_blob(b, indent + '  ')


def show_commitlog(item):
  """Output Commit information."""
  print('hexsha %s' % item.hexsha)
  print(item.author)
  print(item.author_tz_offset)
  # committed_date is a Unix timestamp; it is formatted here in UTC
  print(time.strftime('%a, %d %b %Y %H:%M', time.gmtime(item.committed_date)))
  print(item.committer)
  print(item.committer_tz_offset)
  print(item.encoding)
  print(item.message)
  print(item.name_rev)
  print(item.summary)
  print('[stats]')
  print(item.stats.total)
  print(item.stats.files)
  print('[parents]')
  for i in item.parents:
    print('  %s' % i.hexsha)

  print('[Tree]')
  show_tree(item.tree, '  ')


repo = Repo('/share/testgit/searchTwitter')
for item in repo.iter_commits('master', max_count=100):
  print('================================')
  show_commitlog(item)
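The sample prints committed_date in UTC and the timezone offset as a separate number. To show the time as the committer saw it, the offset has to be applied. Per the GitPython documentation, the *_tz_offset values are seconds west of UTC, so UTC+09:00 is -32400. The helper names below are hypothetical and the timestamp is a made-up example.

```python
from datetime import datetime, timedelta


def local_commit_time(timestamp, tz_offset):
    """Return the commit time as a naive datetime in the committer's own timezone.

    timestamp: seconds since the Unix epoch (e.g. item.committed_date).
    tz_offset: seconds WEST of UTC (e.g. item.committer_tz_offset).
    """
    utc = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
    return utc - timedelta(seconds=tz_offset)


def format_offset(tz_offset):
    """Format a seconds-west-of-UTC offset in git log style, e.g. +0900."""
    east = -tz_offset
    sign = '+' if east >= 0 else '-'
    hours, minutes = divmod(abs(east) // 60, 60)
    return '%s%02d%02d' % (sign, hours, minutes)


# Hypothetical commit made at 2014-01-01 00:00 UTC by a committer in UTC+09:00
ts, offset = 1388534400, -32400
print(local_commit_time(ts, offset).strftime('%a, %d %b %Y %H:%M') + ' ' + format_offset(offset))
# Wed, 01 Jan 2014 09:00 +0900
```

Inside show_commitlog, this would be called as `local_commit_time(item.committed_date, item.committer_tz_offset)`.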

Summary

This article showed that GitPython makes it easy to analyze the commit log of a locally cloned Git repository.

This can be used to build commit statistics for a repository and apply them to project management.
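As one example of such statistics, the sketch below counts commits and changed lines per author. It assumes only the Commit attributes listed earlier: author (whose .name is the author's name) and stats.total, a dict with 'insertions' and 'deletions' keys. The function name and the stand-in commits are hypothetical; note that computing stats runs a diff per commit, so it can be slow on long histories.

```python
from collections import Counter, defaultdict, namedtuple


def summarize_commits(commits):
    """Aggregate commit counts and inserted/deleted lines per author name.

    commits: iterable of objects shaped like git.objects.commit.Commit.
    """
    summary = defaultdict(Counter)
    for c in commits:
        entry = summary[c.author.name]
        entry['commits'] += 1
        entry['insertions'] += c.stats.total['insertions']
        entry['deletions'] += c.stats.total['deletions']
    return dict(summary)


# Against a real repository (requires GitPython):
#   from git import Repo
#   repo = Repo('/share/testgit/searchTwitter')
#   result = summarize_commits(repo.iter_commits('master', max_count=100))

# Offline check with hypothetical stand-ins shaped like Commit
FakeAuthor = namedtuple('FakeAuthor', 'name')
FakeStats = namedtuple('FakeStats', 'total')
FakeCommit = namedtuple('FakeCommit', 'author stats')

sample = [
    FakeCommit(FakeAuthor('alice'), FakeStats({'insertions': 10, 'deletions': 2})),
    FakeCommit(FakeAuthor('bob'), FakeStats({'insertions': 1, 'deletions': 5})),
    FakeCommit(FakeAuthor('alice'), FakeStats({'insertions': 3, 'deletions': 0})),
]
result = summarize_commits(sample)
for name in sorted(result):
    c = result[name]
    print('%s: %d commits, +%d/-%d' % (name, c['commits'], c['insertions'], c['deletions']))
# alice: 2 commits, +13/-2
# bob: 1 commits, +1/-5
```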

References

GitPython Documentation: https://pythonhosted.org/GitPython/0.3.2/index.html

Invisible Power: http://keijinsonyaban.blogspot.jp/2011/05/git.html

Pro Git book (Japanese edition): http://git-scm.com/book/ja/
