[PYTHON] Using the progress bar in Click (easy to use, better display for tasks that take more than 24 hours, and notes on parallel processing)

## Click is cute

Once you have experienced how convenient the Click library is for building console applications in Python, you will not want to go back to argparse or optparse. You can almost forget what parsing arguments even means, right?

For an introduction to Click, see the official site or ikuyamada's introductory article "Create a Python console application easily with Click".

This article is about Click's progress bar.

Showing the progress of slightly time-consuming work as a bar reduces the stress of waiting for the user.

Wrapping an iterable with click.progressbar() hands you the items one by one and updates the progress bar nicely as it goes.

## Sample

As a simple example, I wrote an app that just counts down aloud using the OS X say command. It supports four languages: English, German, French, and Japanese. (The voice for each language must be installed.) If you are not a Mac user, please edit the say function as appropriate.

#### **`countdown.py`**
```py
#!/usr/bin/env python
#
# countdown
#
import click
import time
import os

_lang = 'en'

def say(num):
    if _lang == 'ja':
        voice = 'Kyoko'
    elif _lang == 'fr':
        voice = 'Virginie'
    elif _lang == 'de':
        voice = 'Anna'
    else:
        voice = 'Agnes'

    os.system('say --voice %s %d' % (voice, num))


@click.command()
@click.argument('count', type=int)
@click.option('--lang', default='en')
def countdown(count, lang):
    global _lang
    _lang = lang

    numbers = range(count, -1, -1)
    with click.progressbar(numbers) as bar:
        for num in bar:
            say(num)
            time.sleep(1)


if __name__ == '__main__':
    countdown()
```

Let's run it.

```
$ python countdown.py 10 --lang=ja
```

As the countdown from 10 starts, the progress bar will be displayed like this.

```
[##################------------------]   50%  00:00:06
```

The remaining time is estimated from the progress up to that point and is displayed at the right end.
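As a rough illustration of how such an estimate can be derived, the sketch below multiplies the average time per finished item by the number of items still to go. The function name `estimate_remaining` is hypothetical, and Click's own calculation may differ in detail, so treat this as a simplified model rather than Click's implementation.

```python
def estimate_remaining(elapsed, pos, length):
    """Estimate the seconds remaining from the progress so far (simplified model)."""
    if pos <= 0:
        # No item has finished yet, so there is nothing to extrapolate from.
        return None
    time_per_item = elapsed / pos
    return time_per_item * (length - pos)


# 5 of 11 items finished after 5.5 seconds: 1.1 s per item, 6 items left.
remaining = estimate_remaining(elapsed=5.5, pos=5, length=11)
print('%.1f seconds remaining' % remaining)
```

Because the estimate extrapolates from the items finished so far, it is only as good as the assumption that the remaining items take about as long as the earlier ones.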

## click.progressbar() options

```
click.progressbar(iterable=None, length=None, label=None, show_eta=True, show_percent=None, show_pos=False, item_show_func=None, fill_char='#', empty_char='-', bar_template='%(label)s [%(bar)s] %(info)s', info_sep=' ', width=36, file=None, color=None)
```

The official documentation (http://click.pocoo.org/5/api/#click.progressbar) explains these, but here is a summary for reference.

- `iterable`: The iterable to iterate over. If None, `length` must be specified.
- `length`: The number of items to iterate over. If an iterable is passed, the progress bar asks it for its length by default (although the length may not be available). If `length` is specified, it is used instead of the iterable's length.
- `label`: The label to display next to the progress bar.
- `show_eta`: Whether to display the estimated remaining time. It is hidden automatically if the length cannot be determined.
- `show_percent`: Whether to show the percentage. (The default is True if the iterable has a length, False otherwise.)
- `show_pos`: Whether to show the absolute position. (The default is False.)
- `item_show_func`: A function called when displaying the item being processed. It returns the string to show in the progress bar for that item. Note that the item may be None.
- `fill_char`: The character used for the filled part of the progress bar.
- `empty_char`: The character used for the unfilled part of the progress bar.
- `bar_template`: The format string used as the bar template. It uses three parameters: label, bar, and info.
- `info_sep`: The separator between multiple information items (eta, etc.).
- `width`: The width of the progress bar in (half-width) characters. 0 means the full width of the terminal.
- `file`: The file to write to. If it is not a terminal, only the label is printed.
- `color`: Whether the terminal supports ANSI colors. (Detected automatically by default.)

## ~~Pitfall: when the remaining time is 24 hours or more, you cannot tell how many days are left~~

- 2015/11/8: This is a problem in the current version (5.1) at the time of writing. I sent a pull request upstream, so it may already be resolved by the time you read this article. → The [pull request was merged](https://github.com/mitsuhiko/click/pull/453#event-460027353) on the night of November 10, 2015, and the current master has resolved this issue. However, some people (version <= 5.1) will not be able to use the master version for a while, so I am leaving the workaround below.

It's a very useful feature, but it has one pitfall.

As a test, let's carry out a countdown from 1 million with the previous sample.

```
$ python countdown.py 1000000
[------------------------------------]    0%  00:20:47
```

Only 20 minutes left after counting down to 999999? Each count includes a 1-second sleep, so the whole thing should take at least 11 days, 13 hours, and 46 minutes.

What is going on?

When the remaining time is 24 hours or more, the display wraps around and starts again from 00:00:00. In other words, only the remainder of the remaining time divided by 24 hours is displayed.
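The wrap-around can be reproduced with a little arithmetic. The sketch below takes a hypothetical remaining time of 1,000,000 seconds and works out what a display limited to hours, minutes, and seconds would show; the variable names are illustrative only.

```python
remaining = 1000000          # seconds; hypothetical remaining time
DAY = 24 * 60 * 60           # seconds per day

# A display limited to HH:MM:SS effectively shows remaining % DAY.
shown = remaining % DAY
hours, rest = divmod(shown, 60 * 60)
minutes, seconds = divmod(rest, 60)
clock_style = '%02d:%02d:%02d' % (hours, minutes, seconds)

# The whole days are silently dropped from the display.
dropped_days = remaining // DAY

print(clock_style)    # 13:46:40
print(dropped_days)   # 11
```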

Let's take a look at the source of click.

In the ProgressBar class (defined in click/_termui_impl.py), the estimated remaining time (eta) is formatted by a method called format_eta().

    @property
    def eta(self):
        if self.length_known and not self.finished:
            return self.time_per_iteration * (self.length - self.pos)
        return 0.0

    def format_eta(self):
        if self.eta_known:
            return time.strftime('%H:%M:%S', time.gmtime(self.eta + 1))
        return ''

This is how it is implemented. Sadly, it only anticipates tasks that finish within 24 hours.

So should we just change '%H:%M:%S' here to '%d %H:%M:%S'? No, that does not work.

```
$ python
Python 2.7.9 (default, Jan  7 2015, 11:50:42) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(0))
'1970-01-01 00:00:00'
>>> time.strftime('%H:%M:%S', time.gmtime(0))
'00:00:00'
>>> time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(60*60*24))
'1970-01-02 00:00:00'
>>> time.strftime('%H:%M:%S', time.gmtime(60*60*24))
'00:00:00'
```

`time.gmtime(t)` returns a `time.struct_time`, a structure of nine items: year, month, day, hour, minute, second, day of the week, day of the year, and a daylight saving time flag.
Since it is based on the Unix epoch (January 1, 1970, 00:00:00), `time.gmtime(0)` is January 1, 1970, 00:00:00. Pass it to `time.strftime()` and extract `%Y-%m-%d`, and you get "1970-01-01". So `%d` is the day of the month, which starts at 01, not the number of elapsed days.
(The implementation of `format_eta()` gets away with `'%H:%M:%S'` only because the Unix epoch starts exactly at 00:00:00.)

So, to handle days as well, it seems better to do the calculation ourselves without relying on `time.gmtime` or `time.strftime`.

For example:

```
def format_eta_impl(self):
    if self.eta_known:
        # eta is a float; truncate to whole seconds so that the integer
        # arithmetic below is exact and "days" is 0 for short tasks.
        t = int(self.eta)
        t, seconds = divmod(t, 60)
        t, minutes = divmod(t, 60)
        days, hours = divmod(t, 24)
        if days > 0:
            return '%dd %02d:%02d:%02d' % (days, hours, minutes, seconds)
        else:
            return '%02d:%02d:%02d' % (hours, minutes, seconds)
    else:
        return ''
```

I would like to replace the `format_eta(self)` method of the ProgressBar class with this, but the ProgressBar class itself is defined in click/_termui_impl.py and is not exposed to library users, so we reach the class through the bar instance instead.

```
    with click.progressbar(numbers) as bar:
        bar.__class__.format_eta = format_eta_impl
        for num in bar:
            say(num)
            time.sleep(1)
```

This seems to be the way to do it.
Note that `bar.format_eta = format_eta_impl` does not work.
(I will skip the explanation of why here!)
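For readers curious about the skipped explanation, the sketch below shows the usual reason. `FakeBar` is a hypothetical stand-in for Click's internal ProgressBar class (it is not Click code). A plain function stored on an instance is not bound, so calling it passes no `self`, whereas a function stored on the class is bound at attribute lookup. `types.MethodType` is one way to bind a function to a single instance.

```python
import types


class FakeBar(object):
    """Hypothetical stand-in for Click's internal ProgressBar class."""
    eta = 90061  # seconds: 1 day plus 3661 seconds

    def format_eta(self):
        return 'original'


def format_eta_impl(self):
    days, rest = divmod(int(self.eta), 24 * 60 * 60)
    return '%dd +%ds' % (days, rest)


# 1) Instance attribute: the function is stored as-is and is NOT bound,
#    so calling it passes no `self` and raises TypeError.
bar = FakeBar()
bar.format_eta = format_eta_impl
try:
    bar.format_eta()
    instance_result = 'worked'
except TypeError:
    instance_result = 'TypeError'

# 2) Binding explicitly with types.MethodType changes only this instance.
bound_bar = FakeBar()
bound_bar.format_eta = types.MethodType(format_eta_impl, bound_bar)
bound_result = bound_bar.format_eta()

# 3) Class attribute: every instance of the class picks up the new method.
patched_bar = FakeBar()
patched_bar.__class__.format_eta = format_eta_impl
class_result = patched_bar.format_eta()

print(instance_result, bound_result, class_result)
```

Patching the class affects every progress bar created in the process. That is acceptable in a small script, but worth keeping in mind in a larger application.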

Let's try it.

```
$ python countdown.py 1000000
[------------------------------------]    0%  18d 23:20:12
```

The countdown from one million has begun.
18 days, 23 hours, 20 minutes, and 12 seconds remaining (lol)


So...
(´-`) .. oO (I should write this up properly and send it upstream to Click)
→ [Sent](https://github.com/mitsuhiko/click/pull/453)
→ [Merged](https://github.com/mitsuhiko/click/pull/453#event-460027353)

## countdown.py (improved version)


#### **`big_countdown.py`**
```py

#!/usr/bin/env python
#
# countdown (for big number)
#
import click
import time
import os

_lang = 'en'

def say(num):
    if _lang == 'ja':
        voice = 'Kyoko'
    elif _lang == 'fr':
        voice = 'Virginie'
    elif _lang == 'de':
        voice = 'Anna'
    else:
        voice = 'Agnes'

    os.system('say --voice %s %d' % (voice, num))


def format_eta_impl(self):
    if self.eta_known:
        # eta is a float; truncate to whole seconds so that the integer
        # arithmetic below is exact and "days" is 0 for short tasks.
        t = int(self.eta)
        t, seconds = divmod(t, 60)
        t, minutes = divmod(t, 60)
        days, hours = divmod(t, 24)
        if days > 0:
            return '%dd %02d:%02d:%02d' % (days, hours, minutes, seconds)
        else:
            return '%02d:%02d:%02d' % (hours, minutes, seconds)
    else:
        return ''


@click.command()
@click.argument('count', type=int)
@click.option('--lang', default='en')
def countdown(count, lang):
    global _lang
    _lang = lang

    numbers = range(count, -1, -1)
    with click.progressbar(numbers) as bar:
        bar.__class__.format_eta = format_eta_impl
        for num in bar:
            say(num)
            time.sleep(1)

if __name__ == '__main__':
    countdown()
```

## Precautions when using click.progressbar for tasks to be processed in parallel

### Symptoms

- The progress bar jumps to about half right at the start, even though nothing has been processed yet
- The task never seems to finish, even though the displayed remaining time is only 1 to 2 seconds

This section is for anyone troubled by these symptoms.

For example, if you use multiprocessing.pool.Pool and wrap the iterable with click.progressbar(), the progress bar advances when an item is handed to the Pool, not when it is actually processed.

This is why the progress bar jumps to about half at the start, even though no processing has been done.

Because the progress bar believes that about half of the work was completed in the first instant, it estimates and displays a remaining time of only 1 or 2 seconds. This is why the task never seems to finish even though the displayed remaining time is about 1 to 2 seconds.
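The effect can be observed without Click. In the sketch below, `counting` is a hypothetical wrapper that records how many items have been pulled from the iterable, playing the role of the progress bar. It uses `multiprocessing.pool.ThreadPool` instead of `Pool` (an assumption made so that the example runs without pickling or process start-up concerns; it offers the same `imap` interface).

```python
import time
from multiprocessing.pool import ThreadPool

consumed = 0  # how many items the pool has pulled from the iterable


def counting(iterable):
    """Yield items while counting them, like a progress bar wrapping an iterable."""
    global consumed
    for item in iterable:
        consumed += 1
        yield item


def slow_task(n):
    time.sleep(0.05)  # pretend that each item takes a while to process
    return n


pool = ThreadPool(2)
results = pool.imap(slow_task, counting(range(20)))
first = next(results)                 # wait until the first result is ready
consumed_at_first_result = consumed   # items already taken from the iterable
rest = list(results)                  # drain the remaining results
pool.close()
pool.join()

# Typically far more than 1: the items were handed to the pool
# long before they were actually processed.
print('consumed when the first result arrived:', consumed_at_first_result)
```

With a process-based Pool the exact number pulled ahead depends on buffering, but the pattern is the same: consumption of the iterable runs ahead of the actual work.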

### Countermeasures

To advance the progress display at the moment items are actually processed, so that the remaining time is estimated as accurately as possible:

- Pass length=(size of the iterable) to click.progressbar() instead of the iterable itself
- Call bar.update(1) each time one item has been processed

### Ctrl-C does not kill the program when using multiprocessing

This is not Click's fault, but it matters if you want an easy-to-use CLI
(when multiprocessing.pool.Pool is involved).
The Pool workers pick up the KeyboardInterrupt, so set an initializer that makes the Pool side ignore it.

### Implementation example

I parallelized the countdown sample.
(It sounds rather creepy.)


#### **`countdown_mp.py`**
```py

import click
import time
import os
import multiprocessing
from multiprocessing.pool import Pool
from functools import partial
import signal

_lang = 'en'

def say(num):
    if _lang == 'ja':
        voice = 'Kyoko'
    elif _lang == 'fr':
        voice = 'Virginie'
    elif _lang == 'de':
        voice = 'Anna'
    else:
        voice = 'Agnes'

    os.system('say --voice %s %d' % (voice, num))
    time.sleep(1)

    return num


def _init_worker():
    signal.signal(signal.SIGINT, signal.SIG_IGN)


@click.command()
@click.argument('count', type=int)
@click.option('--lang', default='en')
@click.argument('pool-size', type=int, default=multiprocessing.cpu_count())
@click.argument('chunk-size', type=int, default=5)
@click.argument('max-tasks-per-child', type=int, default=3)
def countdown(count, lang, pool_size, chunk_size, max_tasks_per_child):
    global _lang
    _lang = lang

    pool = Pool(pool_size, _init_worker, maxtasksperchild=max_tasks_per_child)
    imap_func = partial(pool.imap, chunksize=chunk_size)

    numbers = range(count, -1, -1)

    with click.progressbar(length=len(numbers)) as bar:
        for result in imap_func(say, numbers):
            bar.update(1)

    pool.close()


if __name__ == '__main__':
    countdown()
```

