Introduction

In a machine learning system, you often want to manage the model, intermediate generated files, accuracy metrics, and other results together. That means versioning not only the code, by cutting branches on GitHub, but also the artifacts kept in storage such as S3.

Existing tools

If you want to solve this problem with existing tools, mlflow seems like a good choice.

However, whether you version things with mlflow or build your own management application with Flask or a similar framework, I found that it works well to give git branches and S3 objects the same names.
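
As a minimal sketch of this naming convention, the helper below derives an S3 object key from a git branch or tag name. The function name `s3_key_for`, the `ml-artifacts` prefix, and the `<prefix>/<ref name>/<filename>` layout are hypothetical choices for illustration; the code only builds strings and never talks to S3.

```python
from pathlib import PurePosixPath


def s3_key_for(ref_name: str, filename: str, prefix: str = "ml-artifacts") -> str:
    """Build an S3 object key that mirrors a git branch or tag name.

    Hypothetical layout: <prefix>/<ref_name>/<filename>.
    """
    # Accept full ref names such as "refs/heads/feature/x" or "refs/tags/v1.0"
    for ref_prefix in ("refs/heads/", "refs/tags/"):
        if ref_name.startswith(ref_prefix):
            ref_name = ref_name[len(ref_prefix):]
            break
    # S3 keys are flat, but "/" is the conventional delimiter, so a branch
    # like "feature/x" becomes a nested-looking prefix
    return str(PurePosixPath(prefix, ref_name, filename))


print(s3_key_for("feature/new-model", "model.pkl"))
# ml-artifacts/feature/new-model/model.pkl
print(s3_key_for("refs/tags/v1.0", "predictions.csv"))
# ml-artifacts/v1.0/predictions.csv
```

In practice `ref_name` would come from `get_current_branch()` or from a release tag, so the branch `feature/new-model` and the key prefix `ml-artifacts/feature/new-model/` always line up.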

You may also want to check out a release tag cut in the past and reproduce the predictions made at that time.

So I wrote Python code that gets the current branch name and all remote release tags.

import subprocess
import pandas as pd


def get_current_branch(repository_dir='./') -> str:
    '''Get the current branch name
    Args:
        repository_dir(str): Directory containing the repository
    Returns:
        str: Branch name ('HEAD' when in detached HEAD state, e.g. after checking out a tag)
    '''
    # run() waits for git and collects its output; cwd avoids building a "cd ... &&" shell string
    proc = subprocess.run(['git', 'rev-parse', '--abbrev-ref', 'HEAD'],
                          cwd=repository_dir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    current_branch = proc.stdout.decode('utf-8').strip()
    return current_branch


def get_remote_tags(repository='./') -> pd.DataFrame:
    '''Get remote tags
    Args:
        repository(str): Directory containing the repository, or URL of the repository (Example: https://github.com/mlflow/mlflow )
    Returns:
        pd.DataFrame: columns 'hash' and 'tag_name'
    Note:
        To get branches instead of tags, change '--tags' in cmd to '--heads'.
        Prefer '--heads' over '-h': 'git ls-remote -h' with no other arguments prints help.
    '''
    if repository.startswith('https://github.com/'):
        # Query the remote URL directly; no local clone is required
        cmd, cwd = ['git', 'ls-remote', '--tags', repository], None
    else:
        # Query the default remote of the local repository
        cmd, cwd = ['git', 'ls-remote', '--tags'], repository
    # run() reads the pipes while waiting, so a long tag list cannot fill the pipe buffer and hang
    proc = subprocess.run(cmd, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout_data = proc.stdout
    if stdout_data:
        # Each line is "<hash>\t<ref name>", e.g. "<hash>\trefs/tags/v1.0"
        rows = [line.split('\t') for line in stdout_data.decode('utf-8').splitlines()]
        tag_df = pd.DataFrame(rows, columns=['hash', 'tag_name'])
        return tag_df.dropna(how='any')
    else:
        print('cannot find tags.')
        return pd.DataFrame(columns=['hash', 'tag_name'])
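
The tag names returned by `git ls-remote --tags` are full ref names such as `refs/tags/v1.0`, and for annotated tags git also prints a peeled `refs/tags/v1.0^{}` line whose hash is the tagged commit rather than the tag object. To reproduce a past release it is the commit hash that matters, so a minimal stdlib-only sketch of resolving each tag to its commit might look like the following. The function name `resolve_tag_commits` is hypothetical, and the hashes in the example are placeholders, not real output.

```python
def resolve_tag_commits(ls_remote_output: str) -> dict:
    """Map tag names to commit hashes from `git ls-remote --tags` output.

    Annotated tags appear twice: "refs/tags/X" points at the tag object and
    "refs/tags/X^{}" points at the commit it tags. The peeled entry wins.
    """
    commits = {}
    peeled = set()
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        # Strip the "refs/tags/" prefix to get the plain tag name
        name = ref[len("refs/tags/"):] if ref.startswith("refs/tags/") else ref
        if name.endswith("^{}"):
            # Peeled line: this hash is the commit, so it overrides the tag object
            name = name[:-3]
            commits[name] = sha
            peeled.add(name)
        elif name not in peeled:
            commits[name] = sha
    return commits


sample = (
    "1111111\trefs/tags/v0.1\n"     # lightweight tag: points at the commit directly
    "2222222\trefs/tags/v1.0\n"     # annotated tag object
    "3333333\trefs/tags/v1.0^{}\n"  # commit that v1.0 points to
)
print(resolve_tag_commits(sample))
# {'v0.1': '1111111', 'v1.0': '3333333'}
```

The resulting hash can be passed to `git checkout` to restore the code as it was at that release.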
