Organizing data split across folders with Python

In 3 lines

Get a list of folder names

Standard version (works with the standard library alone)

import os
dir = [d for d in os.listdir(".") if os.path.isdir(d)]
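Note that os.path.isdir(d) is evaluated relative to the current directory, so the one-liner above only works as written when listing ".". A minimal sketch for listing the subfolders of any directory is shown below; the helper name list_subfolders and its root parameter are assumptions for illustration, not part of the original.

```python
import os

def list_subfolders(root="."):
    """Return the sorted names of the directories directly under root."""
    with os.scandir(root) as entries:
        # entry.is_dir() checks each entry itself, independent of the current directory
        return sorted(entry.name for entry in entries if entry.is_dir())
```

Calling list_subfolders(".") should give the same folders as the list comprehension, in sorted order.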

A neater way with glob (wildcards can be used; these are shell-style patterns, not regular expressions)


import glob
dir = glob.glob(os.path.join("*",""))


Or, more simply:

dir = glob.glob("*/")

Wildcard usage example

Example of searching for folders case01, case02, ...

dir = glob.glob(os.path.join("case*",""))

To get only text files (.txt):

dir = glob.glob("*.txt")
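glob patterns are shell-style wildcards (*, ?, [0-9]) rather than regular expressions. If a real regular expression is needed, one option is to list the candidates with glob and filter them with re. A minimal sketch follows; the helper name find_case_folders and the default pattern case\d+ are assumptions for illustration.

```python
import glob
import os
import re

def find_case_folders(root=".", regex=r"case\d+"):
    """Return the sorted folder names under root that fully match regex."""
    pattern = re.compile(regex)
    # A trailing separator makes glob return directories only;
    # glob.escape protects special characters that root itself may contain.
    candidates = glob.glob(os.path.join(glob.escape(root), "*", ""))
    # "root/case01/" => "case01"
    names = (os.path.basename(os.path.dirname(path)) for path in candidates)
    return sorted(name for name in names if pattern.fullmatch(name))
```

With this, a folder named caseA is skipped while case01 and case02 are kept, which a plain "case*" wildcard cannot express.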

Execute the processing program for each folder

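import os
import shutil
import subprocess

# cp_files and rm_files are lists of file names that must be defined beforehand
for f in dir:
    # copy files from local folder to target folder
    for fi in cp_files:
        shutil.copy(fi, f)  # assumed body; the original line is missing
    # remove files at target folder
    for fi in rm_files:
        if os.path.exists(os.path.join(f, fi)):
            os.remove(os.path.join(f, fi))  # assumed body; the original line is missing
    # run the processing program for folder f here, e.g. with subprocess.run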
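The loop above can be sketched in full as follows. This is only an illustration under assumptions: the helper name process_folder, its parameters, and the use of subprocess.run with cwd= to run the program inside each folder are not from the original, and the command itself is left to the caller.

```python
import os
import shutil
import subprocess

def process_folder(folder, cp_files, rm_files, cmd):
    """Copy input files into folder, remove stale files, then run cmd there.

    Returns the exit code of cmd.
    """
    # copy files from the local folder to the target folder
    for name in cp_files:
        shutil.copy(name, folder)
    # remove old files in the target folder, if present
    for name in rm_files:
        target = os.path.join(folder, name)
        if os.path.exists(target):
            os.remove(target)
    # run the processing program with the target folder as working directory
    result = subprocess.run(cmd, cwd=folder)
    return result.returncode
```

Inside the for f in dir loop this would be called as process_folder(f, cp_files, rm_files, cmd), where cmd is the command list for your own program. A nonzero return value marks a folder whose run failed.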

Process text data organized by folder with pandas

The data is tab-separated (.tsv); from the left, the first column is assumed to be the index and the second column the data.

Wrap the data reading in try/except, because the processing program above may have failed in some folders, and print the folders where reading failed. It is also convenient to derive a label from each folder name so that it can be used as an index later.

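import os
import pandas as pd

dfs = pd.DataFrame()
for f in dir:
    # case01\\ => case01
    index_name = os.path.split(f)[0]
    # Error handle
    try:
        # Data structure {col.0 : index, col.1 : Data}
        df = pd.read_csv(os.path.join(f,"out.tsv"),sep='\t',header=None,index_col=0)
        # assumed step (missing in the source): store the data column under the folder name
        dfs[index_name] = df[1]
    except Exception:
        print("Error in {0}".format(index_name))
# make index
dfs.index = df.index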
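A self-contained sketch of this reading step, under assumptions: each folder holds an out.tsv with two columns as described above, the helper name collect_tsv is hypothetical, and the folders that fail are returned in a list in addition to being printed.

```python
import os
import pandas as pd

def collect_tsv(folders, filename="out.tsv"):
    """Read filename from each folder; return (DataFrame, list of failed folders).

    The DataFrame has one column per folder, indexed by the first .tsv column.
    """
    columns = {}
    failed = []
    for f in folders:
        # "case01/" => "case01"
        name = os.path.basename(os.path.normpath(f))
        try:
            # Data structure {col.0 : index, col.1 : data}
            df = pd.read_csv(os.path.join(f, filename), sep="\t",
                             header=None, index_col=0)
            columns[name] = df[1]
        except (OSError, ValueError, KeyError) as err:
            # missing file, unparsable or empty file, or missing data column
            print("Error in {0}: {1}".format(name, err))
            failed.append(name)
    return pd.DataFrame(columns), failed
```

Building the table once from a dict of columns avoids depending on the last successfully read df for the index.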

Let's check the data by displaying dfs. (A row labeled "0" appears, presumably the name that read_csv gives the index when header=None; it can be ignored because it disappears later.)

Working with the data in pandas

First, it's easier to handle if you swap the rows and columns.

dfsT = dfs.T

Next, handle missing data (NaN).

dfsT = dfsT.dropna()

From here on, process the data as needed.

For example, use boolean (fancy) indexing to select rows by condition. (Here, the rows whose WSA/L2 value is greater than 0.2 are extracted.)

dfsT_select = dfsT[dfsT["WSA/L2"] > 0.2]
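The transpose, dropna, and selection steps can be tried on a small made-up table. The folder names, the second row label L1, and all values below are invented for illustration; only the column name WSA/L2 comes from the text.

```python
import pandas as pd

# Made-up data: one column per folder, one row per quantity
dfs = pd.DataFrame(
    {
        "case01": [0.15, 1.0],
        "case02": [0.30, 2.0],
        "case03": [float("nan"), 3.0],
    },
    index=["WSA/L2", "L1"],
)

dfsT = dfs.T          # rows become folders, columns become quantities
dfsT = dfsT.dropna()  # drop folders with any missing value (here case03)

# Boolean indexing: keep only the rows with WSA/L2 greater than 0.2
dfsT_select = dfsT[dfsT["WSA/L2"] > 0.2]
print(dfsT_select)
```

Note that dropna() with its defaults removes a row if any column is NaN; pass subset=[...] if only some columns should be checked.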

Visualization with matplotlib

import matplotlib.pyplot as plt

# the plotting call is truncated in the source; a line plot over the folder names is assumed
plt.plot(dfsT.index, dfsT["WSA/L2"])

Adjustment of horizontal axis

fig, ax = plt.subplots()
# the plotting call is truncated in the source; a line plot is assumed
ax.plot(dfsT.index, dfsT["WSA/L2"])
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45, fontsize=10);

Use in Excel (output)

Many people ask for the data in Excel format, so export it for them (pandas provides DataFrame.to_excel for this).

If the text format is acceptable, for example:
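A minimal sketch, assuming a table shaped like dfsT above and a hypothetical output file name summary.tsv: DataFrame.to_csv with sep="\t" writes a tab-separated text file. The data here is made up so that the example runs on its own.

```python
import os
import tempfile
import pandas as pd

# Made-up stand-in for dfsT
dfsT = pd.DataFrame({"WSA/L2": [0.15, 0.30]}, index=["case01", "case02"])

# Hypothetical output location; replace with your own path
out_path = os.path.join(tempfile.gettempdir(), "summary.tsv")
dfsT.to_csv(out_path, sep="\t")

# Read the file back to confirm the round trip
check = pd.read_csv(out_path, sep="\t", index_col=0)
print(check)
```

The index (folder names) is written as the first column, so the file can be read back with index_col=0.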


