This article describes how to parse Git commit logs from Python using GitPython.
Prerequisites:
- Git is installed.
- Python 2.7 series is required. GitPython does not appear to support Python 3.3 at this time (support is listed as a goal).
Source code: https://github.com/gitpython-developers/GitPython
Documentation: http://pythonhosted.org/GitPython/0.3.2/
Install GitPython by running the following as root:
```
# easy_install GitPython
```
The following sample script prints the hash ID of each commit in the specified repository, along with the name of the class that holds the commit information.
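```python
# -*- coding: utf-8 -*-
from git import Repo

# Path to a local clone of the repository
repo = Repo("/share/testgit/searchTwitter")
# Walk up to 100 commits reachable from master
for item in repo.iter_commits('master', max_count=100):
    print(item.hexsha)
    print(item.__class__)
```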
Pass the path of your local repository to Repo. Unlike centralized systems such as Subversion, Git keeps all the information needed for version control locally. All of it is stored in the repository's .git folder.
When you run this script, you can see that the commit information is stored in [git.objects.commit.Commit](http://pythonhosted.org/GitPython/0.3.2/reference.html#module-git.objects.commit "git.objects.commit.Commit"). The main attributes of this class are as follows.
Name | Description |
---|---|
author | The person who originally did the work |
authored_date | Date and time the work was authored |
author_tz_offset | The author's timezone offset |
committer | The person who applied the work |
committed_date | Date and time the work was committed |
committer_tz_offset | The committer's timezone offset |
message | Commit message |
summary | First line of the commit message |
stats | Statistics computed from the diff. Information about the updated files is stored in stats.files. |
parents | The parent commits, as git.objects.commit.Commit objects. The first commit has no parent. You can use this to reconstruct the commit order. |
tree | Tree-structured data that stores the blobs. Defined in the Tree class. |
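As an illustration of how the date and offset attributes fit together, here is a minimal sketch that formats a timestamp in the author's local time. The helper name `format_git_time` is hypothetical and not part of GitPython. It assumes that the date is given in seconds since the epoch and that the offset is in seconds west of UTC, which is how the GitPython reference describes `authored_date` and `author_tz_offset`.

```python
from datetime import datetime, timedelta


def format_git_time(epoch_seconds, tz_offset_seconds):
    """Format a Git timestamp in the author's (or committer's) local time.

    Assumption: tz_offset_seconds counts seconds WEST of UTC, so
    Japan (UTC+9) would be -32400.
    """
    utc = datetime(1970, 1, 1) + timedelta(seconds=epoch_seconds)
    local = utc - timedelta(seconds=tz_offset_seconds)
    # Flip the sign to get the usual "east of UTC" notation (+0900 etc.)
    east = -tz_offset_seconds
    sign = '+' if east >= 0 else '-'
    hours, remainder = divmod(abs(east), 3600)
    minutes = remainder // 60
    stamp = local.strftime('%Y-%m-%d %H:%M:%S')
    return '%s %s%02d%02d' % (stamp, sign, hours, minutes)
```

With a commit object `item` from `repo.iter_commits()`, this could be called as `format_git_time(item.authored_date, item.author_tz_offset)`.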
A tree is data that represents a directory structure. It is defined in the Tree class.
tree.blobs contains the blobs that belong directly to a tree, and tree.trees contains its subtrees, so you can extract every blob associated with a commit by walking the tree recursively as follows.
```python
def show_tree(tree, indent):
    """
    Output Tree information
    """
    print('%shexsha :%s' % (indent, tree.hexsha))
    print('%spath :%s' % (indent, tree.path))
    print('%sabspath :%s' % (indent, tree.abspath))
    print('%smode :%s' % (indent, tree.mode))
    # Recurse into subdirectories first
    for t in tree.trees:
        show_tree(t, indent + ' ')
    # Then list the files directly under this tree
    # (show_blob is defined in the full sample later in this article)
    print('%s[blobs]' % indent)
    for b in tree.blobs:
        show_blob(b, indent + ' ')
```
Blobs represent the actual contents of a file and are named after their SHA-1 hash, which is calculated from their size and contents.
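To make the hash calculation concrete, the following sketch reproduces a blob ID with the standard library alone. Git hashes a short header (`blob`, the size in bytes, and a NUL byte) followed by the file contents. The function name `git_blob_sha1` is made up for this illustration.

```python
import hashlib


def git_blob_sha1(data):
    """Return the SHA-1 ID that Git assigns to a blob holding `data` (bytes)."""
    # Header format: "blob <size in bytes>\0"
    header = ('blob %d' % len(data)).encode('ascii') + b'\0'
    return hashlib.sha1(header + data).hexdigest()
```

For example, `git_blob_sha1(b'')` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the ID Git uses for any empty file. The result for a file's contents should match the `hexsha` that GitPython reports for the corresponding blob.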
Also, the tree of a commit references the blobs of all files, not just those that changed in that commit. Files that have not changed are stored under the same hash value as in the previous commit, and files that have changed are stored under a different hash value.
You can see that Git "preserves directory snapshots".
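Because every commit carries a full snapshot, two commits can be compared by looking only at blob hashes. The following is a rough sketch of that idea, not how Git itself computes diffs. The function names `collect_blobs` and `changed_paths` are hypothetical. The first relies only on the `trees`, `blobs`, `path`, and `hexsha` attributes used elsewhere in this article.

```python
def collect_blobs(tree, found=None):
    """Recursively map the path of every blob under `tree` to its hexsha."""
    if found is None:
        found = {}
    for blob in tree.blobs:
        found[blob.path] = blob.hexsha
    for subtree in tree.trees:
        collect_blobs(subtree, found)
    return found


def changed_paths(old_blobs, new_blobs):
    """Compare two {path: hexsha} snapshots.

    Returns three sorted lists: (added, modified, removed).
    """
    added = sorted(p for p in new_blobs if p not in old_blobs)
    removed = sorted(p for p in old_blobs if p not in new_blobs)
    modified = sorted(p for p in new_blobs
                      if p in old_blobs and old_blobs[p] != new_blobs[p])
    return added, modified, removed
```

For a commit `item` with at least one parent, something like `changed_paths(collect_blobs(item.parents[0].tree), collect_blobs(item.tree))` would list the files whose hash differs between the two snapshots. Unchanged files drop out because their hashes are identical.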
The following sample iterates over the commits of the repository and prints the blobs of each commit.
```python
# -*- coding: utf-8 -*-
from git import Repo
import time


def show_blob(b, indent):
    """
    Output Blob information
    """
    print('%s---------------' % (indent))
    print('%shexsha:%s' % (indent, b.hexsha))
    print('%smime_type:%s' % (indent, b.mime_type))
    print('%spath:%s' % (indent, b.path))
    print('%sabspath:%s' % (indent, b.abspath))


def show_tree(tree, indent):
    """
    Output Tree information
    """
    print('%shexsha :%s' % (indent, tree.hexsha))
    print('%spath :%s' % (indent, tree.path))
    print('%sabspath :%s' % (indent, tree.abspath))
    print('%smode :%s' % (indent, tree.mode))
    # Recurse into subdirectories, then list this tree's own files
    for t in tree.trees:
        show_tree(t, indent + ' ')
    print('%s[blobs]' % indent)
    for b in tree.blobs:
        show_blob(b, indent + ' ')


def show_commitlog(item):
    """
    Output Commit information
    """
    print("hexsha %s" % item.hexsha)
    print(item.author)
    print(item.author_tz_offset)
    # committed_date is converted with gmtime, so this prints UTC
    print(time.strftime("%a, %d %b %Y %H:%M", time.gmtime(item.committed_date)))
    print(item.committer)
    print(item.committer_tz_offset)
    print(item.encoding)
    print(item.message)
    print(item.name_rev)
    print(item.summary)
    print('[stats]')
    print(item.stats.total)
    print(item.stats.files)
    print('[parents]')
    for i in item.parents:
        print(' %s' % i.hexsha)
    print('[Tree]')
    show_tree(item.tree, ' ')


repo = Repo("/share/testgit/searchTwitter")
for item in repo.iter_commits('master', max_count=100):
    print('================================')
    show_commitlog(item)
```
This article showed that the commit log of a locally cloned Git repository can be analyzed easily with GitPython.
Building on this, you could compute commit statistics for a repository and use them to support project management.
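As one possible starting point for such statistics, the sketch below totals the number of commits and changed lines per author. The function name `summarize_by_author` is hypothetical. It assumes that each commit exposes `author.name` and that `stats.total` is a dictionary with a `'lines'` entry, as in the output of the sample above.

```python
from collections import defaultdict


def summarize_by_author(commits):
    """Return {author name: {'commits': count, 'lines': changed lines}}."""
    summary = defaultdict(lambda: {'commits': 0, 'lines': 0})
    for commit in commits:
        entry = summary[commit.author.name]
        entry['commits'] += 1
        # stats.total['lines'] is assumed to be insertions plus deletions
        entry['lines'] += commit.stats.total['lines']
    return dict(summary)
```

It could be called as `summarize_by_author(repo.iter_commits('master', max_count=100))`. Note that reading `stats` makes GitPython run a diff for every commit, so this may be slow on a large history.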
GitPython Documentation https://pythonhosted.org/GitPython/0.3.2/index.html
Invisible power http://keijinsonyaban.blogspot.jp/2011/05/git.html
Git Book http://git-scm.com/book/ja/