[PYTHON] Folder structure for analysis

Introduction

Some people run analyses for a bachelor's or master's thesis, and others analyze data on a regular basis. This article is for both. I will show the folder structure that I have refined over more than three years, so please use it as a reference, or give me feedback if you think something should be done differently.

The version from two years ago is here: The strongest folder structure I think in analysis --Qiita

Environment

For reference, I use Windows 10 (64-bit), git bash, and matlab 2015b, but nothing in this article depends on them.

Folder structure

The following is the output of tree.rb. The basic idea is the obvious one: separate folders by role so that files do not pile up in one place.

I think this layout works for both python and matlab (I usually use matlab). I am not sure about R yet, so I will write an article once I understand it.

An entry with an extension is a file, and an entry ending in / is a folder.

tree.rb


$ tree.rb
project-root
|-- README.md
|-- .gitignore
|-- data/
|   |-- data1/ #data save folder 1
|   `-- data2/ #data save folder 2
|-- experiments/
|   |-- category1/
|   `-- category2/
|       |-- experiment1.m
|       `-- experiment2.m
|-- libs/
|   |-- +common/
|   |   |-- FigureGenerator.m
|   |   `-- Utility.m
|   |-- +la/
|   |   |-- hessian.m
|   |   `-- jacobian.m
|   |-- tests/
|   |   `-- runtests.m # Function that runs tests/unittest/*Test.m and returns the test results. Makes test-driven development possible.
|   `-- yaml/
|       `-- YAMLMatlab_0.4.3/ # Very useful external library. Reads the yaml files under res/
|-- output/
|   |-- docs/
|   |   |-- others/
|   |   |-- paper/
|   |   |   |-- yyyy-mm-dd-first-paper/
|   |   |   `-- yyyy-mm-dd-second-paper/
|   |   `-- thesis/
|   |       |-- bachelor/
|   |       `-- master/
|   |-- refs/
|   `-- slides/
|       |-- conference/
|       |   |-- yyyy-mm-dd-conference1/
|       |   |-- yyyy-mm-dd-conference2/
|       |   `-- yyyy-mm-dd-conference3/
|       |-- defense/
|       |   |-- bachelor-midterm/
|       |   |-- bachelor-final/
|       |   |-- master-midterm/
|       |   `-- master-final/
|       |-- discussion/
|       |   |-- yyyy-mm-dd-discussion.pptx
|       |   `-- yyyy-mm-dd-discussion.pptx
|       `-- others/
|-- res/ # yaml files are converted to matlab variables by the YAMLMatlab library under libs/yaml/*. Other programming-related files also go here
|   |-- const.yaml
|   `-- litrconst.yaml
|-- results/ # Create folders under results/ that correspond to the folder / file names under experiments/
|   |-- category1/
|   `-- category2/
|       |-- experiment1/
|       `-- experiment2/
|           |-- data1/
|           `-- data2/
|-- scripts/ #A folder that stores common processes that are called from all script files.
`-- tests/
    |-- implementation/ #Playground folder for implementation
    `-- unittest/ #The folder where the test suite of unittest is stored.
        |-- HelperTest.m # helper class' test class
        `-- LATest.m # linear algebra functions' test class
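
As a rough illustration, the skeleton above could be created with a short script. This is a minimal Python sketch, not something from the original setup: the function name `create_skeleton` and the `FOLDERS` list are hypothetical, and only part of the tree is included for brevity.

```python
from pathlib import Path

# Folders taken from the tree above; deeper sub-folders are trimmed for brevity.
FOLDERS = [
    "data",
    "experiments",
    "libs/tests",
    "output/docs/paper",
    "output/docs/thesis",
    "output/refs",
    "output/slides/conference",
    "output/slides/defense",
    "output/slides/discussion",
    "res",
    "results",
    "scripts",
    "tests/implementation",
    "tests/unittest",
]


def create_skeleton(root):
    """Create the folder skeleton under `root` and return the folder paths."""
    root = Path(root)
    created = []
    for rel in FOLDERS:
        path = root / rel
        # exist_ok=True makes the function safe to re-run on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # Empty placeholder files that the tree lists at the project root.
    for name in ("README.md", ".gitignore"):
        (root / name).touch()
    return created
```

Calling `create_skeleton("project-root")` creates any missing folders and leaves existing ones untouched.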

Folder description

The folders related to programming are experiments, scripts, results, libs, data, res, and tests. Their roles are shown in the table below.

| folder | description |
|---|---|
| experiments | Holds the many verification scripts |
| scripts | Common scripts |
| libs | Holds the many function files |
| data | Keeps the original data used for analysis |
| results | Stores the analysis results. It is good to give it a folder structure that corresponds to experiments |
| res | Things used for programming that are not source code, such as yaml files and other config files |
| tests | Holds the many test classes |
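
The correspondence between experiments and results can be kept automatic instead of being maintained by hand. The following is a minimal Python sketch under that assumption. The helper names `results_dir_for` and `ensure_results_dir` are hypothetical, and the file extension of the script does not matter.

```python
from pathlib import Path


def results_dir_for(script_path, project_root):
    """Map experiments/<category>/<name>.<ext> to results/<category>/<name>."""
    script = Path(script_path)
    root = Path(project_root)
    # Path of the script relative to experiments/, e.g. category2/experiment1.m
    # (raises ValueError if the script is not under experiments/).
    rel = script.relative_to(root / "experiments")
    # Drop the extension so the script name becomes a folder name.
    return root / "results" / rel.with_suffix("")


def ensure_results_dir(script_path, project_root):
    """Create the matching results folder if needed and return it."""
    out = results_dir_for(script_path, project_root)
    out.mkdir(parents=True, exist_ok=True)
    return out
```

With this, `experiments/category2/experiment1.m` writes to `results/category2/experiment1/`, which matches the layout in the tree.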

The output folder holds no programs, but I keep it as a separate folder because it is important for research.

| folder | description |
|---|---|
| output | Research output. It can be roughly divided into slides, docs, and refs. Put presentation slides in slides, and papers, bachelor's theses, and master's theses in docs. |
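
The yyyy-mm-dd- prefix used for the folders under output can be generated rather than typed. This is a small Python sketch; the helper name `dated_folder` and its parameters are hypothetical.

```python
from datetime import date
from pathlib import Path


def dated_folder(parent, label, day=None):
    """Return parent/yyyy-mm-dd-label, using today's date when `day` is omitted."""
    day = day or date.today()
    # isoformat() yields yyyy-mm-dd, so folders sort chronologically by name.
    return Path(parent) / f"{day.isoformat()}-{label}"
```

For example, `dated_folder("output/slides/conference", "conference1")` returns a path for today's date, which can then be created with `mkdir(parents=True)`.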

Please **absolutely** avoid Japanese in folder names. People will assume you cannot program.
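
A rule like this is easy to check mechanically. The following is a minimal Python sketch that lists folders whose names contain non-ASCII characters. The function name `non_ascii_names` is hypothetical, and `str.isascii()` assumes Python 3.7 or later.

```python
from pathlib import Path


def non_ascii_names(root):
    """Return folders under `root` whose own name contains non-ASCII characters."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        # Only folder names are checked, matching the rule stated above.
        if path.is_dir() and not path.name.isascii():
            offenders.append(path)
    return offenders
```

An empty list means every folder name under the project root is plain ASCII.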

Conclusion

The file structure above says it all, so I will not write a detailed explanation. Please read through it, and leave a comment if you have any questions.

2017/11/19 postscript

This article seems to be getting read again recently, so I am adding a note. I think the following Python template generator is also useful, so please give it a try: CookieCutter for DataScience in Python

I haven't tried it yet, so I can't really comment, but it might become something like the rails or django of data science.
