This article describes how to parse Subversion commit logs in Python using pysvn (packaged as python-svn on Debian-based systems).
The examples here were written for Python 2.7; pysvn builds are also provided for Python 3.2 and 3.3.
http://pysvn.tigris.org/
Install the binary using one of the following methods.
__Unix-like__
sudo apt-get install python-svn
__Windows, macOS__
Download from the page below:
http://pysvn.tigris.org/project_downloads.html
The sample below prints the history of a specified Subversion repository.
svnlog.py
import sys
import time

import pysvn


class SvnRepController:
    def __init__(self, url_or_path, user, passwd):
        self.client = pysvn.Client()
        self.url_or_path = url_or_path
        self.user = user
        self.passwd = passwd
        # pysvn calls this function whenever the server asks for credentials.
        self.client.callback_get_login = self.get_login

    def get_login(self, realm, username, may_save):
        # Returns (retcode, username, password, save).
        # retcode=True means credentials are supplied; save=False means do not cache them.
        return True, self.user, self.passwd, False

    def log(self):
        # discover_changed_paths=True fills in changed_paths for each log entry.
        return self.client.log(self.url_or_path, discover_changed_paths=True)


if __name__ == '__main__':
    argvs = sys.argv
    if len(argvs) != 4:
        sys.stderr.write('Usage:\n python %s url_or_path user pass\n' % argvs[0])
        sys.exit(1)
    url_or_path = argvs[1]
    user = argvs[2]
    passwd = argvs[3]
    client = SvnRepController(url_or_path, user, passwd)
    logs = client.log()
    for log in logs:
        print("RevNo:%d" % (log.revision.number))
        print("Author:%s date:%s" % (log.author, time.ctime(log.date)))
        print(log.message)
        for p in log.changed_paths:
            print(" %s" % dict(p))
It can be run as follows.
python svnlog.py http://svn.sourceforge.jp/svnroot/simyukkuri/ "user name" "password"
The procedure is as follows.
(1) Create a pysvn.Client().
(2) Assign the authentication function to client.callback_get_login. pysvn calls this function back when authentication is required, and the credentials are supplied through its return value.
(3) Calling client.log() returns a list of PysvnLog objects.
(4) PysvnLog inherits from PysvnDictBase, which can be used like a dictionary. In other words, you can get the available properties as follows.
logs = client.log()
for log in logs:
    dict(log)  # the entry's fields as a plain dict
The main items obtained here are explained below.
__PysvnLog contents:__
Name | Description |
---|---|
revision.number | Revision number |
author | Commit author |
date | Commit time, expressed as a numeric timestamp (it can be passed to time.ctime()) |
message | Commit message |
changed_paths | List of PysvnLogChangedPath |
__PysvnLogChangedPath contents:__
Name | Description |
---|---|
action | Type of operation, as a single-letter code such as M for a modification; see http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.update.html |
path | The path that was operated on |
copyfrom_path | Copy source path |
copyfrom_revision | Copy source revision |
(5) After that, perform whatever aggregation you need.
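As one possible illustration of the aggregation in step (5), the sketch below counts commits per author. It is only a sketch: the function name commits_per_author and the sample entries are hypothetical, and plain dicts stand in for the PysvnLog objects so that it runs without a repository. It assumes each entry can be read like a dictionary with an "author" key, as described in step (4).

```python
from collections import Counter


def commits_per_author(entries):
    """Count log entries per author.

    `entries` is any iterable of dict-like log entries that have an
    "author" key (an assumption based on the dict-like access above).
    """
    counts = Counter()
    for entry in entries:
        # Entries without an author are grouped under a placeholder name
        # (a choice made for this sketch, not something pysvn prescribes).
        counts[entry.get("author") or "(no author)"] += 1
    return counts


# Hypothetical stand-in data; real entries would come from client.log().
sample_entries = [
    {"author": "alice", "date": 0.0, "message": "initial import"},
    {"author": "bob", "date": 3600.0, "message": "fix typo"},
    {"author": "alice", "date": 7200.0, "message": "add tests"},
]

for author, n in commits_per_author(sample_entries).most_common():
    print("%s: %d" % (author, n))
```

With a real repository, the list returned by client.log() would presumably be passed in place of sample_entries.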
An application example is shown below.
The file update history can also be used to find files that are frequently updated together. The following example uses simple co-occurrence counts to estimate how strongly a change to one file tends to be accompanied by a change to another.
import sys
from collections import defaultdict

import pysvn


class FilePair:
    def __init__(self, path1, path2):
        self.path1 = path1
        self.path2 = path2
        self.count = 0         # commits in which both files were modified
        self.reliability1 = 0  # count / number of commits that modified path1
        self.reliability2 = 0  # count / number of commits that modified path2


class SvnRepController:
    def __init__(self, url_or_path, user, passwd):
        self.client = pysvn.Client()
        self.url_or_path = url_or_path
        self.user = user
        self.passwd = passwd
        self.client.callback_get_login = self.get_login

    def get_login(self, realm, username, may_save):
        return True, self.user, self.passwd, False

    def log(self):
        return self.client.log(self.url_or_path, discover_changed_paths=True)

    def getSimpleLogicalCoupling(self):
        files = defaultdict(int)  # path -> number of commits that modified it
        ret = {}                  # "pathA pathB" -> FilePair
        logs = self.log()
        for log in logs:
            paths = log.changed_paths
            # Visit every path, including the last one, so that each
            # modified file is counted.
            for i in range(len(paths)):
                path1 = paths[i]
                if path1.action != "M":
                    # Ignore everything except modifications
                    continue
                files[path1.path] += 1
                for j in range(i + 1, len(paths)):
                    path2 = paths[j]
                    if path2.action != "M":
                        # Ignore everything except modifications
                        continue
                    if path1.path == path2.path:
                        continue
                    # A pair may already be registered in the opposite order.
                    key = "%s %s" % (path1.path, path2.path)
                    reverse_key = "%s %s" % (path2.path, path1.path)
                    if reverse_key in ret:
                        key = reverse_key
                    elif key not in ret:
                        ret[key] = FilePair(path1.path, path2.path)
                    ret[key].count += 1
        for k, v in ret.items():
            v.reliability1 = float(v.count) / files[v.path1]
            v.reliability2 = float(v.count) / files[v.path2]
        return ret


if __name__ == '__main__':
    argvs = sys.argv
    if len(argvs) != 6:
        sys.stderr.write('Usage:\n python %s url_or_path user pass min_count min_reliability\n' % argvs[0])
        sys.exit(1)
    url_or_path = argvs[1]
    user = argvs[2]
    passwd = argvs[3]
    min_count = int(argvs[4])
    min_reliability = float(argvs[5])
    client = SvnRepController(url_or_path, user, passwd)
    couplings = client.getSimpleLogicalCoupling()
    print('"Path A","Path B","Count","Count/count of A","Count/count of B","reliability"')
    for k, v in sorted(couplings.items(), key=lambda x: x[1].count, reverse=True):
        if (v.reliability1 > min_reliability or v.reliability2 > min_reliability) and v.count > min_count:
            print('"%s","%s","%d","%f","%f","%f"' % (v.path1, v.path2, v.count, v.reliability1, v.reliability2, max(v.reliability1, v.reliability2)))
For example, to log in to the SampleProject repository as user admin with password admin and list the file pairs that were modified together more than 5 times, where a change to one file is accompanied by a change to the other more than 80% of the time, run the following.
python svnSimpleLogicalCoupling.py http://127.0.0.1/svn/SampleProject admin admin 5 0.8 > out.csv
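To make the two ratios in the output concrete, here is a minimal sketch that computes the same kind of count and count-per-file ratios on a few hypothetical commits. It is not the script above: it uses itertools.combinations instead of index loops, plain lists of paths instead of pysvn log entries, and the function name simple_coupling and the sample paths are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def simple_coupling(commits):
    """Return {(path_a, path_b): (count, count/changes_a, count/changes_b)}.

    `commits` is a list of lists, each holding the paths modified
    in one commit.
    """
    file_counts = defaultdict(int)  # path -> commits that modified it
    pair_counts = defaultdict(int)  # (path_a, path_b) -> commits modifying both
    for paths in commits:
        unique = sorted(set(paths))  # sorting gives each pair one canonical order
        for path in unique:
            file_counts[path] += 1
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return {
        (a, b): (n, float(n) / file_counts[a], float(n) / file_counts[b])
        for (a, b), n in pair_counts.items()
    }


# Hypothetical history: four commits touching three files.
commits = [
    ["/trunk/a.py", "/trunk/b.py"],
    ["/trunk/a.py", "/trunk/b.py", "/trunk/c.py"],
    ["/trunk/a.py"],
    ["/trunk/c.py"],
]
result = simple_coupling(commits)
for (a, b), (count, ratio_a, ratio_b) in sorted(result.items()):
    print("%s %s count=%d %.2f %.2f" % (a, b, count, ratio_a, ratio_b))
```

In this sample, b.py was modified twice and both times together with a.py, so its ratio is 1.0, while a.py was modified three times, so its ratio for the same pair is about 0.67. That asymmetry is why both ratios are reported.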
In addition, the following kinds of analysis seem possible.
- Analyze the commit log to see what kinds of fixes were made.
- Extract the change history associated with Redmine or Trac ticket numbers and find the bug occurrence rate for each file.
- Measure the commit frequency of each person.
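As a rough sketch of the ticket-number idea, the code below collects, for each changed path, the ticket numbers mentioned in the messages of commits that touched it. Several things here are assumptions: that tickets are written as "#123" in commit messages (the actual convention depends on how the project uses Redmine or Trac), the function name tickets_per_file, and the sample entries, which are plain dicts standing in for log entries. Turning this into a bug occurrence rate would additionally require knowing which tickets are bugs.

```python
import re
from collections import defaultdict

# Assumed convention: tickets are referenced as "#<number>" in the message.
TICKET_PATTERN = re.compile(r"#(\d+)")


def tickets_per_file(entries):
    """Map each changed path to the set of ticket numbers referenced
    by the commits that touched it."""
    result = defaultdict(set)
    for entry in entries:
        message = entry["message"] or ""
        tickets = set(int(t) for t in TICKET_PATTERN.findall(message))
        if not tickets:
            continue  # commit does not mention any ticket
        for changed in entry["changed_paths"]:
            result[changed["path"]].update(tickets)
    return result


# Hypothetical stand-in data shaped like the log entries described earlier.
sample_entries = [
    {"message": "refs #12 fix crash",
     "changed_paths": [{"path": "/trunk/a.py", "action": "M"}]},
    {"message": "cleanup",
     "changed_paths": [{"path": "/trunk/a.py", "action": "M"}]},
    {"message": "fixes #12 and #15",
     "changed_paths": [{"path": "/trunk/a.py", "action": "M"},
                       {"path": "/trunk/b.py", "action": "M"}]},
]

for path, tickets in sorted(tickets_per_file(sample_entries).items()):
    print("%s: %s" % (path, sorted(tickets)))
```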
As shown above, analyzing the change history can be expected to yield a variety of information.