In a machine learning system, you often want to manage models, intermediate files, accuracy metrics, and other results together. That means managing not only the code, by cutting branches on GitHub, but also the artifacts kept in storage such as S3.
If you want to solve this problem with existing tools, MLflow seems like the better choice.
However, whether you manage versions with MLflow or build your own management application with Flask or similar, I felt it was a good idea to keep the Git branch name and the S3 object name consistent.
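A minimal sketch of that naming convention, assuming a hypothetical layout where every artifact for a branch or tag lives under `s3://<bucket>/<branch-or-tag>/`. The function name `s3_uri_for` and the bucket name are illustrative only; in practice the `git_ref` argument could come from the `get_current_branch()` function defined below.

```python
def s3_uri_for(bucket: str, git_ref: str, artifact: str) -> str:
    """Build an S3 URI whose key prefix mirrors a Git branch or tag name.

    Hypothetical layout: s3://<bucket>/<git_ref>/<artifact>
    """
    # Branch names such as "feature/new-model" contain slashes. S3 keys allow
    # them, so the branch hierarchy simply becomes a nested key prefix.
    ref = git_ref.strip().strip('/')
    if not ref:
        raise ValueError('git_ref must not be empty')
    # Drop a leading slash so we never produce "//" inside the key.
    key = f'{ref}/{artifact.lstrip("/")}'
    return f's3://{bucket}/{key}'


print(s3_uri_for('my-ml-bucket', 'feature/new-model', 'model.pkl'))
# s3://my-ml-bucket/feature/new-model/model.pkl
```

Because a release tag such as `v1.0.0` gets its own prefix under this scheme, checking out that tag later tells you exactly where its model and metrics were stored.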
I also wanted to be able to check out a release tag cut in the past and reproduce the predictions made at that time.
So I wrote Python code that gets the current branch name and all of the remote release tags.
```python
import subprocess

import pandas as pd


def get_current_branch(repository_dir: str = './') -> str:
    '''Get the current branch name.

    Args:
        repository_dir (str): Directory containing the repository.

    Returns:
        str: The branch name, or "HEAD" if the repository is in a
        detached-HEAD state (for example, after checking out a tag).

    Raises:
        subprocess.CalledProcessError: If repository_dir is not a Git repository.
    '''
    # Run git directly in the target directory instead of "cd ... &&" via a shell.
    proc = subprocess.run(
        ['git', 'rev-parse', '--abbrev-ref', 'HEAD'],
        cwd=repository_dir, capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()


def get_remote_tags(repository: str = './') -> pd.DataFrame:
    '''Get remote tags.

    Args:
        repository (str): Directory containing the repository, or the URL of a
            repository (example: https://github.com/mlflow/mlflow).

    Returns:
        pd.DataFrame: One row per ref, with columns "hash" and "tag_name".

    Note:
        To get branches instead of tags, change '--tags' in cmd to '--heads'.
    '''
    if repository.startswith('https://github.com/'):
        # Query the given URL directly.
        cmd, cwd = ['git', 'ls-remote', '--tags', repository], None
    else:
        # Query the default remote (usually origin) of the local repository.
        cmd, cwd = ['git', 'ls-remote', '--tags'], repository
    # subprocess.run reads the output while waiting for git to exit, which
    # avoids the deadlock that wait() followed by read() on a PIPE can cause.
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    # Each line is "<hash>\t<ref>"; skip blank lines instead of relying on dropna.
    rows = [line.split('\t', 1) for line in proc.stdout.splitlines() if line]
    if rows:
        return pd.DataFrame(rows, columns=['hash', 'tag_name'])
    print('cannot find tags.')
    return pd.DataFrame(columns=['hash', 'tag_name'])
```
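One thing to watch for when reproducing a release: for annotated tags, `git ls-remote --tags` prints two lines, `refs/tags/<name>` (the tag object's hash) and `refs/tags/<name>^{}` (the hash of the commit it points to). A minimal sketch of a hypothetical helper, `parse_ls_remote_tags`, that strips the `refs/tags/` prefix and keeps the commit hash, assuming you pass it the raw text output of that command:

```python
from typing import Dict, List, Tuple


def parse_ls_remote_tags(output: str) -> List[Tuple[str, str]]:
    """Parse `git ls-remote --tags` output into (hash, tag_name) pairs.

    For annotated tags the peeled "^{}" entry wins, so each tag maps to the
    commit that `git checkout <tag>` would land on.
    """
    commits: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_hash, ref = line.split('\t', 1)
        prefix = 'refs/tags/'
        name = ref[len(prefix):] if ref.startswith(prefix) else ref
        if name.endswith('^{}'):
            # Peeled entry: replace the tag-object hash with the commit hash.
            commits[name[:-3]] = commit_hash
        else:
            # Lightweight tag, or the tag object itself; keep unless peeled.
            commits.setdefault(name, commit_hash)
    return [(h, n) for n, h in commits.items()]


sample = 'aaa\trefs/tags/v1.0\nbbb\trefs/tags/v1.0^{}\nccc\trefs/tags/v2.0\n'
print(parse_ls_remote_tags(sample))
# [('bbb', 'v1.0'), ('ccc', 'v2.0')]
```

The result can be loaded into the same shape that `get_remote_tags` returns with `pd.DataFrame(rows, columns=['hash', 'tag_name'])`.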