I want to randomly sample a file in Python

things to do

  1. Create 100 files
  2. Get the file path with the specified extension
  3. Randomly select 10 and copy to another directory

Directory structure

.
├── input
│   └── _sampled
└── src
    └── random_sampling.py

code

random_smpling.py


from pprint import pprint
import os
from pathlib import Path
import random
import shutil


class FileControler(object):
    def make_file(self, output_dir, file_name, content='') -> None:
        """File creation"""
        #Create if there is no output directory
        os.makedirs(output_dir, exist_ok=True)
        #Combine output directory and file name
        file_path = os.path.join(output_dir, file_name)
        #File creation
        with open(file_path, mode='w') as f:
            #Create an empty file by default
            f.write(content)

    def get_files_path(self, input_dir, pattern):
        """Get file path"""
        #Create a path object by specifying a directory
        path_obj = Path(input_dir)
        #Match files in glob format
        files_path = path_obj.glob(pattern)
        #Posix conversion to treat as a character string
        files_path_posix = [file_path.as_posix() for file_path in files_path]
        return files_path_posix

    def random_sampling(self, files_path, sampl_num, output_dir, fix_seed=True) -> None:
        """Random sampling"""
        #If you sample the same file every time, fix the seed value
        if fix_seed is True:
            random.seed(0)
        #Specify the file group path and the number of samples
        files_path_sampled = random.sample(files_path, sampl_num)
        #Create if there is no output directory
        os.makedirs(output_dir, exist_ok=True)
        #copy
        for file_path in files_path_sampled:
            shutil.copy(file_path, output_dir)

Run

Instance creation

file_controler = FileControler()

directory

Create 100 files in ʻinput / Copy the sampled file into ʻinput / _sampled

all_files_dir = '../input'
sampled_dir = '../input/_sampled'

First, make 100 files

50 files each of .py and .txt

for i in range(1, 51):
    file_controler.make_file(all_files_dir, f'file{i}.py')
    file_controler.make_file(all_files_dir, f'file{i}.txt')

Get only the path of the .py file

from pprint import pprint


pattern = '*.py'
files_path = file_controler.get_files_path(all_files_dir, pattern)

pprint(files_path)
# ['../input/file8.py',
#  '../input/file28.py',
#  '../input/file38.py',
#  .
#  .
#  .
#  '../input/file25.py',
#  '../input/file35.py',
#  '../input/file50.py']
#

print(len(files_path))
# 50

Random sampling for 10 files

sample_num = 10
file_controler.random_sampling(files_path, sample_num, sampled_dir)

Verification

Terminal


ls input/_sampled

file10.py file23.py file3.py  file35.py file36.py file37.py file38.py file4.py  file41.py file43.py

Recommended Posts

I want to randomly sample a file in Python
I want to create a window in Python
I want to write to a file with Python
I want to embed a variable in a Python string
I want to easily implement a timeout in python
I want to work with a robot in python.
I want to make input () a nice complement in python
I want to print in a comprehension
I want to build a Python environment
I want to do Dunnett's test in Python
How to create a JSON file in Python
I want to make a game with Python
I want to merge nested dicts in Python
Sample to put Python Kivy in one file
I want to display the progress in Python!
I want to convert a table converted to PDF in Python back to CSV
I want to color a part of an Excel string in Python
I made a program to check the size of a file in Python
I want to do a monkey patch only partially safely in Python
I want to write in Python! (1) Code format check
Parse a JSON string written to a file in Python
I want to generate a UUID quickly (memorandum) ~ Python ~
I want to transition with a button in flask
Even in JavaScript, I want to see Python `range ()`!
I tried to implement a pseudo pachislot in Python
A memorandum to run a python script in a bat file
[Python] I want to make a nested list a tuple
I want to write in Python! (3) Utilize the mock
I want to use the R dataset in python
I want to run a quantum computer with Python
I want to do something in Python when I finish
I want to manipulate strings in Kotlin like Python!
Python program is slow! I want to speed up! In such a case ...
I want to get the file name, line number, and function name in Python 3.4
Create a binary file in Python
I want to debug with Python
I tried to implement a one-dimensional cellular automaton in Python
I want to do something like sort uniq in Python
Change the standard output destination to a file in Python
[Python] I want to get a common set between numpy
I want to start a lot of processes from python
How to import a file anywhere you like in Python
I tried "How to get a method decorated in Python"
I want to send a message from Python to LINE Bot
I tried to make a stopwatch using tkinter in python
I want to be able to run Python in VS Code
I want to use a python data source in Re: Dash to get query results
I want to save a file with "Do not compress images in file" set in OpenPyXL
I want to replace the variables in the python template file and mass-produce it in another file.
I want to write a triple loop and conditional branch in one line in python
I tried to implement PLSA in Python
I want to use a wildcard that I want to shell with Python remove
I tried to implement permutation in Python
I made a script in Python to convert a text file for JSON (for vscode user snippet)
I made a payroll program in Python!
I want to solve APG4b with Python (only 4.01 and 4.04 in Chapter 4)
Convert psd file to png in Python
Sample script to trap signals in Python
I tried to implement PLSA in Python 2
I want to use jar from python
I want to do a full text search with elasticsearch + python