Parsing Subversion commit logs in Python

This article describes how to parse Subversion commit logs in Python using pysvn.

I'm using Python 2.7 here, but pysvn binaries are also provided for Python 3.3 and 3.2.

http://pysvn.tigris.org/

Installation

Install the binary using one of the following methods.

__Unix-like (Debian/Ubuntu)__

sudo apt-get install python-svn

__Windows, macOS__

Download an installer from the page below.

http://pysvn.tigris.org/project_downloads.html

Sample and explanation of log retrieval

The sample below prints the commit history of a specified Subversion repository.

svnlog.py


import pysvn
import time
import sys
from collections import defaultdict


class SvnRepController:
  def __init__(self,url_or_path,user,passwd):
    self.client = pysvn.Client()
    self.url_or_path = url_or_path
    self.user = user
    self.passwd = passwd
    self.client.callback_get_login = self.get_login

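  def get_login(self, realm, username, may_save):
    # Return (retcode, username, password, save).
    # retcode=True means credentials are supplied; save=False means do not cache them.
    return True, self.user, self.passwd, False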

  def log(self):
    return self.client.log(self.url_or_path, discover_changed_paths=True)

if __name__ == '__main__':
  argvs = sys.argv
  argc = len(argvs)
  if argc != 4:
    sys.stderr.write('Usage:\n python %s url_or_path user pass\n' % argvs[0])
    # Exit with a non-zero status so callers can detect the usage error
    sys.exit(1)
  url_or_path = argvs[1]
  user = argvs[2]
  passwd = argvs[3]
  client = SvnRepController(url_or_path,user,passwd)

  logs = client.log()
  for log in logs:
    print ("RevNo:%d" % (log.revision.number))
    print ("Author:%s date:%s" % (log.author, time.ctime(log.date)))
    print (log.message)
    for p in log.changed_paths:
      print ("  %s" % dict(p))

It can be executed as follows.

python svnlog.py http://svn.sourceforge.jp/svnroot/simyukkuri/ "user name" "password"

The procedure is as follows.

(1) Create a pysvn.Client().

(2) Assign the authentication function to client.callback_get_login. This function is called back whenever authentication is required, and its return value must supply the credentials as the tuple (retcode, username, password, save).

(3) Calling client.log() returns a list of PysvnLog objects.

(4) PysvnLog inherits from PysvnDictBase, which can be handled like a dictionary. In other words, you can get the required properties by doing the following.

  logs = client.log()
  for log in logs:
    dict(log)

The main items obtained here are described below.

__Contents of PysvnLog:__

name             Description
revision.number  Revision number
author           Author
date             Commit time, as a number of seconds since the epoch
message          Commit message
changed_paths    List of PysvnLogChangedPath

__Contents of PysvnLogChangedPath:__

name               Description
action             Type of operation, as a letter code such as "M"
                   (see http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.update.html)
path               Path that was operated on
copyfrom_path      Copy source path
copyfrom_revision  Copy source revision
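
As a minimal sketch of how these fields fit together, the function below turns one log entry into printable lines. It relies only on the field names listed in the tables above. The `Entry` class and the sample values are hypothetical stand-ins so that the snippet runs without a repository, and it assumes that copyfrom_path is None for paths that were not copied.

```python
import time


class Entry(dict):
    """Hypothetical stand-in for pysvn's dict-like result objects."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


def describe(log):
    """Return printable lines for one log entry."""
    # date is a number of seconds since the epoch; format it in UTC
    when = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(log.date))
    lines = ["r%d %s %s" % (log.revision.number, log.author, when)]
    for p in log.changed_paths:
        line = "  %s %s" % (p.action, p.path)
        if p.copyfrom_path is not None:
            # Assumption: copyfrom_* are only filled in for copied paths
            line += " (from %s@%d)" % (p.copyfrom_path, p.copyfrom_revision.number)
        lines.append(line)
    return lines


# Hypothetical sample entry
sample = Entry(
    revision=Entry(number=12),
    author="alice",
    date=0,
    message="Rename util module",
    changed_paths=[
        Entry(action="A", path="/trunk/util2.py",
              copyfrom_path="/trunk/util.py", copyfrom_revision=Entry(number=11)),
        Entry(action="D", path="/trunk/util.py",
              copyfrom_path=None, copyfrom_revision=None),
    ],
)
print("\n".join(describe(sample)))
```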

(5) After that, perform whatever aggregation you need.
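
One simple aggregation is to count how many commits touched each path. The sketch below uses plain (revision, paths) tuples as hypothetical sample data; with pysvn, the same counts could be built from the path values in each entry's changed_paths. The helper name `count_touched_paths` is made up for this illustration.

```python
from collections import Counter


def count_touched_paths(commits):
    """Count, per path, the number of commits that changed it."""
    counts = Counter()
    for _revision, paths in commits:
        # set() guards against a path being listed twice in one commit
        counts.update(set(paths))
    return counts


# Hypothetical sample data: (revision number, changed paths)
commits = [
    (1, ["/trunk/a.py", "/trunk/b.py"]),
    (2, ["/trunk/a.py"]),
    (3, ["/trunk/a.py", "/trunk/c.py"]),
]

counts = count_touched_paths(commits)
path, n = counts.most_common(1)[0]
print("%s %d" % (path, n))  # /trunk/a.py 3
```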

Application

An application example is shown below.

The file update history can also be used to find files that are frequently updated together. The following example uses simple co-occurrence counts to estimate how likely a change to one file is to come with a change to another.

import pysvn
import time
import sys
from collections import defaultdict

class FilePair:
  def __init__(self,path1,path2):
    self.path1 = path1
    self.path2 = path2
    self.count = 0
    self.reliability1 =0
    self.reliability2 =0


class SvnRepController:
  def __init__(self,url_or_path,user,passwd):
    self.client = pysvn.Client()
    self.url_or_path = url_or_path
    self.user = user
    self.passwd = passwd
    self.client.callback_get_login = self.get_login

  def get_login(self, realm, username, may_save):
    return True, self.user,self.passwd, False

  def log(self):
    return self.client.log(self.url_or_path, discover_changed_paths=True)
  
  def getSimpleLogicalCoupling(self):
    files = defaultdict(int)  # path -> number of commits that modified it
    ret = {}                  # "path1 path2" -> FilePair
    logs = self.log()
    for log in logs:
      paths = log.changed_paths
      # Visit every index, including the last one
      for i in range(len(paths)):
        path1 = paths[i]
        if path1.action != "M":
          # Ignore everything except modifications
          continue

        files[path1.path] += 1

        for j in range(i + 1, len(paths)):
          path2 = paths[j]
          if path2.action != "M":
            # Ignore everything except modifications
            continue
          if path1.path == path2.path:
            continue

          # Look the pair up in either order before creating a new entry
          key = "%s %s" % (path1.path, path2.path)
          if key not in ret:
            key = "%s %s" % (path2.path, path1.path)
            if key not in ret:
              ret[key] = FilePair(path1.path, path2.path)
          ret[key].count += 1

    for k, v in ret.items():
      # Share of each file's modifications that also touched the other file
      v.reliability1 = float(v.count) / files[v.path1]
      v.reliability2 = float(v.count) / files[v.path2]

    return ret



if __name__ == '__main__':
  argvs = sys.argv
  argc = len(argvs)
  if argc != 6:
    sys.stderr.write('Usage:\n python %s url_or_path user pass min_count min_reliability\n' % argvs[0])
    sys.exit(1)
  url_or_path = argvs[1]
  user = argvs[2]
  passwd = argvs[3]
  min_count = int(argvs[4])
  min_reliability = float(argvs[5])
  client = SvnRepController(url_or_path, user, passwd)
  pairs = client.getSimpleLogicalCoupling()
  # print() with a single argument works on both Python 2.7 and Python 3
  print('"Path A","Path B","Count","Count/count of A","Count/count of B","reliability"')
  for k, v in sorted(pairs.items(), key=lambda x: x[1].count, reverse=True):
    if (v.reliability1 > min_reliability or v.reliability2 > min_reliability) and v.count > min_count:
      print('"%s","%s","%d","%f","%f","%f"' % (v.path1, v.path2, v.count, v.reliability1, v.reliability2, max(v.reliability1, v.reliability2)))

For example, suppose you log in to the SampleProject repository as user admin with password admin. To list the pairs of files that were modified together more than 5 times, and where changing one file came with a change to the other more than 80% of the time, run the following.

python svnSimpleLogicalCoupling.py http://127.0.0.1/svn/SampleProject admin admin 5 0.8 > out.csv
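
To make the CSV columns concrete, here is a small worked sketch of the same kind of calculation on hypothetical data, without pysvn. Each inner list stands for the paths modified in one commit. `coupling` is an illustrative helper written for this example, not part of the script above.

```python
from collections import defaultdict
from itertools import combinations


def coupling(commits):
    """Return {(a, b): (count, count/commits of a, count/commits of b)}."""
    files = defaultdict(int)   # path -> commits that modified it
    pairs = defaultdict(int)   # (a, b) -> commits that modified both
    for paths in commits:
        unique = sorted(set(paths))
        for p in unique:
            files[p] += 1
        # Sorted input makes each pair appear in one fixed order
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return dict(
        ((a, b), (n, float(n) / files[a], float(n) / files[b]))
        for (a, b), n in pairs.items()
    )


# Hypothetical sample data: paths modified in each commit
commits = [
    ["a.c", "a.h"],
    ["a.c", "a.h"],
    ["a.c"],
    ["a.c", "b.c"],
]
result = coupling(commits)
print(result[("a.c", "a.h")])  # (2, 0.5, 1.0)
```

Here a.h was modified twice, both times together with a.c, so its ratio is 1.0, while a.c was modified four times, so its ratio is 0.5. The larger of the two ratios corresponds to the last CSV column.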

In addition, it seems that the following things can be done.

- Analyze the commit log to see what kinds of corrections were made.
- Extract the change history associated with Redmine and Trac ticket numbers, and find the bug occurrence rate for each file.
- Count how often each person commits.
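
Two of these ideas can be sketched briefly. The snippet assumes that commit messages reference tickets as `#123` (a common convention with Redmine and Trac, but an assumption here) and uses hypothetical (author, message) tuples in place of real log entries. The helper names are made up for this illustration.

```python
import re
from collections import Counter

TICKET = re.compile(r"#(\d+)")  # assumed ticket reference format


def commits_per_author(entries):
    """Count commits for each author."""
    return Counter(author for author, _message in entries)


def tickets_in(message):
    """Return the ticket numbers mentioned in a commit message."""
    return [int(n) for n in TICKET.findall(message)]


# Hypothetical sample data: (author, commit message)
entries = [
    ("alice", "Fix crash on empty input, refs #12"),
    ("bob", "Update docs"),
    ("alice", "Follow-up for #12 and #15"),
]

print(commits_per_author(entries)["alice"])  # 2
print(tickets_in(entries[2][1]))             # [12, 15]
```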

As mentioned above, it can be expected that various information can be obtained by analyzing the change history.
