[PYTHON] Don't print and import logging for logging

Introduction

http://docs.python.jp/3/howto/logging.html

Introductory Python articles generally show you how to call the functions that come with the logging module, such aslogging.debug (), in the introduction to Python's logging features.

The "basic tutorial" of logging provided by the Python head family does not make much difference in this respect. In the basic tutorial of the Python family, the method of using print is also effective as a logging option, and it is said that print and `logging.debug ()` should be used properly according to the task. It is described in the flow.

Display console output commonly used in command line scripts and programs: print ()

Only after such a "basic" explanation can the "advanced" tutorial begin to explain the logging mechanism that comes with the Python language. In "advanced", 4 + 1 types of data structures are introduced, and here you can find out the identity of `logging.debug ()`, which was often used in "basic".

Logging is done by calling a method on an instance of the Logger class (hereafter logger). Each instance has a name and is conceptually arranged in a namespace hierarchy, with dots (periods) as separators. For example, the logger named ‘scan’ is the parent of the loggers ‘scan.text’, ‘scan.html’ and ‘scan.pdf’. The logger name can be anything you want and points to where the logged message occurred. .. The root of the logger hierarchy is called the root logger. That is the logger used by the functions debug (), info (), warning (), error () and critical (). These functions simply call the method of the same name in the root logger. These functions and methods have the same signature. The name of the root logger is displayed as ‘root’ in the log output.

There are actually practical Python applications out there that use print for log output. Probably "quite".

The previous writer also used print to display the log. But now I almost avoid logging with print except in cases like tiny scripting for confirmation with less than 10 lines.

Personally, I don't think any Python beginners should be taught or recommended how to use ** print or logging``` for logging **. At least, I don't think it's the right order to teach the official Python "Basic Tutorial".

This article attempts to explain why you came to think that way.

Precautions when reading the following

The author of this article doesn't have as much Python knowledge as a Python committer (it was at the time of the first edition release and at the time of the revise). Please let us know if there are any mistakes in the content.

Also, the fact that I didn't read the tutorial properly for a while after I knew the logging module, which should be regarded as an individual problem of the author, also inspired me to write this sentence. .. Please forgive me for the fact that some of the content does not seem to be "howling of beginners", but it is a problem on the document side.

If you just want to know best practices, read Python Logging Best Practices.

Why not: print

Let's talk about the implementation of a particular Python app from the perspective of others.

One of the problems with log output with print is that" I don't know if the displayed string is log output or the information that the command line tool really wants to present to the user. " Can be done.

There is no problem using Python's print itself" unless it is for logging purposes ". For example, suppose you implement the Linux tee command yourself. It would not necessarily be a mistake to write the standard output with print while writing the data received on the standard input to a file. This is an example of "the output you really want".

The "really desired output" here is the "something" on the output side when the program was a tool that "receives some input and outputs something".

On the other hand, a log is a "record" in which a program generated in the process operates.

In this article, I'd like to say "for log output", that is, if you want to record "I'm crazy. I'm crazy here, I'll tell you ~", stop the raw print based approach. That is.

If this is a script with 10 to 20 lines, it's still better, and as I wrote at the beginning, I also use print if that's the case.

For example, if you write sample code that knows the behavior when divided by 0,

print('started')
a = 1/0
print('ended')

There will be no problem with writing such as. This print is actually a log, rather than a desired output, a" record "of the program's operation. But it's obvious that it's foolish to use the logging module with code of this size.

On the other hand, relatively complicated library tools, such as double-digit files, may have completed their own print based logging system.

Let's talk about a Python library I came across (at least it was a relatively well-known library when I found it). In that library, I implemented my own based on print that it can be easily realized by combining Logger and Handler of the logging module. There was no mechanism equivalent to the log level. There was a mechanism like a tag to separate the log types, but the tag is only used for if branching whether to output or not, and when it is displayed in print, it will be part of the string. It will be mixed.

There are many problems with such a library-specific logging mechanism. First of all, it is not possible to determine whether the output log content is a "severe log" or "a debug log output". .. It will be necessary to re-sort the raw character string with a regular expression, etc., depending on which category the error belongs to. This gives you no decent control over the log.

Also, for users familiar with Python, it imposes a different logging mechanism than Python's logging, and you need to take extra steps to understand the intent and limitations of that mechanism. If you're feeling short of Python modules, it's a bad idea to just make a degraded version.

Not limited to Python, you should think that ** just separating stdout and stderr does not give a passing score as a modern logging mechanism **.

Rewriting an app that was originally built based on print to use the functions of logger is quite troublesome, or perhaps very troublesome. It may be possible on the receiving side to "salvage" the contents of stderr and send it back to logger, but it's a hassle. Either way, it's honestly barren.

From the user's point of view, the print based approach looks like" I've flushed everything down the sewage so get it if necessary.

If you had a tool published on GitHub and the print debugging world was unfolded when someone was interested, you wouldn't even get a glimpse of the essence of the tool, but the user would scratch their heads. Become. I became.

In the first place, the unique logging mechanism of the Python library was so bad that it shocked me, which led me to write this article.

Why not: logging

Python loggers have a hierarchical and tree structure. In the metaphor of a tree, multiple logs (an instance of the LogRecord class) that arrive from the Happa are received at the point where they hit a branch (Logger) and reach the root logger ("root logger"). You can iterate over one log with a logger. You can also prevent the log from propagating from one branch to the next (closer to the root) branch (propagate attribute). For details, it is recommended to read Advanced Tutorial.

here,

logging.debug()

It means that the information is directly inserted into the root "root logger" even in the log tree structure of Python.

The root logger is also the root of the logging mechanism for all modules called by the Python runtime you launch, and is the "global variable" itself.

If each library calls these functions that come with the logging module (as well aslogging.info ()etc.) directly, it implicitly pollutes the world (namespace) of the other modules. Become.

Specifically, if you set the log level of the root logger to DEBUG to see the log of library A, the DEBUG log to the root logger that other libraries (library B, library C ...) have entered will also be displayed inseparably. .. It is similar to the unnamed log that was poured into stderr with print in that it cannot be separated.

In fact, sources like advanced tutorials and Python Cookbooks (for advanced users and above ??) say that you should consider using logger instead of logging "when building a library".

http://docs.python.jp/3/howto/logging.html#configuring-logging-for-a-library

Also, of course, the basic tutorial also describes that area.

This pattern in mylib.py can be generalized to multiple modules. Note that in this simple usage pattern, you can see the event description by looking at the log file, but not where in the application the message came from. If you want to track the location of your messages, you'll need documentation beyond this tutorial level – see Advanced Logging Tutorial.

It's not that short since I started using Python, but it's only been a few years since I first saw it. Also, it was about the same time that I became able to understand the true meaning. After seeing several stray articles, I was convinced that "Yes ... that's right, the function directly under logging is not good ", and when I read the tutorial again, it was actually written.

Stray article example: Python Logging Best Practice

I personally found it very annoying ** After a brief explanation in the tutorial that you can log with logging.debug () or print", a better best practice is It means that it is written **.

Looking back, I feel that the methods introduced earlier have very little taste in the first place. And I decided to rewrite everything with all my strength.

ʻImport logging` even annoying

I'm embarrassed because I often make mistakes ...

After understanding some of the topics above, and once I came to the common sense that logging.debug () shouldn't be used, I wrote logger = logging.getLogger (__name__) at the beginning of each module. I wrote like , and all the code after that started to use logger.

But in the middle of writing

logger.debug()

When

logging.debug()

I'm writing by mixing. It's flying to a different place!

You might think it's not a big deal, but if you're looking at the logs to track down the cause of a bug, you can just miss the trivial difference between ** logger and logging and where the function will go. I don't know what it is **. The more you create a highly structured log tree structure, the more logging.debug () points to, assuming that you are pointing to a Logger instance that has a log level or log file specification that you want logger. Root loggers tend to be wooden figures that don't output anything. logger.debug () will output a meaningful string to a file and logging.debug () will secretly leave a message in "Danmari" or somewhere else, tracking from the log file. Will be difficult or impossible.

Unless you're debugging interactively with breakpoints, you may not notice the trivial difference between the strings logger and logging until the end. ** Humans are not good at distinguishing similar strings **. Unfortunately, neither is a grammatical error at the time of ʻimport logging`. Do you expect the highlight color of the editor to be different? ??

There is a simple workaround.

#Damaged
import logging
logger = logging.getLogger(__name__)

Means "do not write".

from logging import getLogger
logger = getLogger(__name__)

Is to write.

I was shocked when I came up with this simple workaround. It's embarrassing that I didn't notice it.

I've always wondered why I can't think of it

import logging
logging.debug()

It must have been because I was guided. I blame others.

The first time I thought this replacement problem was "bad" was when I mistakenly implemented the above logging and logger in the process of creating a service that runs for a long time with Django (web framework). did.

When I was chasing the implementation of the function, there was a moment when no log was output for some reason, and I looked up with a face saying "Why isn't it output, hey, why isn't the log output only here ... (゚ Д ゚)" ... It was really a free typo. I can cry. Again, ** humans aren't good at distinguishing similar strings **.

Here, I dare to consider ʻimport logging` itself as a problem. For the functions directly under this module

The two functions are mixed. It's very mixed.

The former should have a very limited and limited impact on the entire program. The latter should not be used unless you are confident in the first place.

In both cases it's too subtle to allow the full ʻimport logging` at the beginning, but the tutorial explains this first. It's as if ** recommending beginners to make mistakes **.

To avoid human error, you can stop using the object name logger and uselog.debug (). However, I don't like it because it's hard to tell whether it's logging or logger.

By the way, the main function receives logger. You can use your own logger by default

Having the function ready to accept the logger, such as via optional arguments, gives you more flexibility when trying to use it as a library. This is a so-called Dependency Injection.

def do_some_great_stuff(arg_a, arg_b, logger=None):
    logger = logger or _your_default_logger
    #Amazing thing

If it is Python3, you can force it to be an optional argument. Rather!

def do_some_great_stuff(arg_a, arg_b, *, logger=None):
    logger = logger or _your_default_logger
    #Amazing thing

In Python3, after the naked *, it becomes a "keyword-only argument". Python2 system does not have this grammar, so it is processed dynamically with ** kwargs. For details, refer to "Item 21: Increase clarity with keyword-specific arguments" and Glossary in "Effective Python".

If you manage the Logger object yourself and do not expose it to the outside, you will not be able to manage the logs in a unified manner by replacing the loggers of that module as needed by others.

Suppose developer A uses the API of external library B while doing a series of processing using the Logger created by getLogger ('service_a.do_something'). If that library gets the Logger via getLogger ('service_b.do_bad_thing') and you can't replace it, then probably the Handler and Format are managed separately. This is very inconvenient for tracking the execution process of a function. From the developer A's point of view, it tends to be "Let me use my Logger here".

There are cases where it is very convenient for an outsider to associate each function in the library with another Logger for another purpose.

Unless you really care about other circumstances (performance etc.), I would like you to have a structure that can separate processing for each logger, and even if you manage it yourself, use the logger tree structure well to name the series. I want you to divide by.

Addendum (2017-07-07): If it is assumed that the function etc. will be used by an outsider as a library, Handler other than NullHandler should not be added to _your_default_logger here. Recommended. By the way, a Logger that does not include any Handler basically gives a warning.

https://docs.python.jp/3/howto/logging.html#configuring-logging-for-a-library

In addition, I want you to do that level of support when you publish it on GitHub.

For the author, I want you to be in a state of using logger if you notice it even at the easy publishing stage of" I wrote it with my own script, I wrote it for the time being! ".

People who read the current tutorial "quickly" will choose print or logging, but you should be able to use Logger with just a slight change in the order of the explanations. At the "level I tried using", I feel that the cost of print and logger does not change much.

Here is an example.

from logging import getLogger, StreamHandler, DEBUG
logger = getLogger(__name__)
handler = StreamHandler()
handler.setLevel(DEBUG)
logger.setLevel(DEBUG)
logger.addHandler(handler)
logger.propagate = False

logger.debug('hello')

It just ends. I think it's a lot of "magic" lines, but it feels like a copy because it's fine.

Personally, I think it's better to raise the criticism that setLevel () appears twice here. Oh, this blame is what I shouted a few days ago. Omitted if it's too throat.

In this case, if you don't want to spit out the log, just use NullHandler instead of StreamHandler, If you want to perform log rotation according to size, there is RotatingFileHandler. If you want to email more than the error message, there is something like SmtpHandler!

The claim here is, "I want you to teach me such a magical level of wording at the beginning." Conversely, you shouldn't even write print orlogging.debug ()in your logging tutorial.

Note: The claim here is not "let's demand heavy weight from beginners". The story is, "Don't include words in the tutorial that can easily enter the dead end." No matter what, I use print in situations where I use it. Because it's simple. You just don't have to boost its usage. At the end, you will find that such a method is convenient in some cases.

Note 2 (2016-11-17 postscript): If the above DI is done, the DEBUG log (may dump password etc.) can be displayed arbitrarily by an outsider, so I am a little worried about security. I wonder if there is any direction. However, I wonder what I'm saying in a language where the instance has virtually no private variables.

Note 3 (2017-07-07 postscript): There is also the idea of using basicConfig () first when writing the setting of the startup part as a script, and in fact it can be written more concisely (see comment) ). However, in my personal opinion, the root logger is not used because it is a stance that it should be avoided at this early stage. In reviewing this time, I found that I could not see the taste unless I did logger.propagate = False, so I added it.

"Problem of teaching order"

He continues to talk about "Don't teach me anything crap like print or logging.debug () first."

Whenever I want to debug something, I just read the docs with the utmost effort to eliminate the bug.

At that time, ** the developer does not want to learn the whole system **.

At the beginning of the document, you should believe that the method that everyone believes is the most concise and perhaps decent is written. At least I don't expect it to say something like "Sorry, it's not the right way to do it!"

On the other hand, the current Python logging documentation gives the impression that important things are written at the end, with the expectation that they will learn the system to the end. Or, if there are multiple methods (though it seems NG for Python at that point), the structure is such that the reader is convinced and leaves when the first one is called (the basis is "advanced"). It is written).

Although the content written later is more appropriate, the method that is not recommended earlier is described, and only the method that is not recommended when a beginner leaves the document in the middle is remembered ...

Shortly after reading the docs that way for the first time, users forget what they did and why they took that logging method, and also called the first 10 lines of the tutorial to logging.debug ( ) Use ...

Or, in the worst case, you will have your own log mechanism built in while misunderstanding that print is okay ...

When it comes to logging, I strongly believe that this method, which seems to have a solution at the beginning of the document, is very bad in this regard. The teaching method is "okashii" in the first place.

In that way, the user misplaces the logging mechanism and publishes the tool, and when the tool is refined to some extent, many others will see it, but ... It may be too late. I don't think it's just that the developer is doing something wrong.

Even more troublesome is that the problem that the developer or user of the tool really wants to solve is usually not directly linked to the logging mechanism. It solves the problem, but because it contains a terribly distorted log mechanism, it says, "I have what I want, but ... something ... it's hard to use !?" If you don't choose the correct answer from the beginning, these problems are not motivating to fix.

Of course, it is possible to make corrections to the published source. Logs usually just call a function that creates and displays a string somewhere in the code, so an outsider says, "This print-based logging mechanism is in a standard Python module. You can do a PR (Pull Request) saying, "You can replace everything with Logger`."

However, it would be barren to publicize the correction of the log output logic at that stage ... No, it would be nice if you just wanted to take the head of the demon.

** I feel that the world is much cleaner if you say "You don't have to know the details of Python logging, sologger = getLogger (__ name__)use it!". It's still easy to make it an argument to a function, and pulling it into a ** kwargs or optional argument wouldn't be a big deal.

Became longer

If you are new to Python logging for the time being, the logger initial settings when writing a single execution script

from logging import getLogger, StreamHandler, DEBUG
logger = getLogger(__name__)
handler = StreamHandler()
handler.setLevel(DEBUG)
logger.setLevel(DEBUG)
logger.addHandler(handler)
logger.propagate = False

logger.debug('hello')

That's all for writing.

Addendum (2017-07-07): An alternative is shown at the bottom of this entry from the background of the comment. Also, in the case of a library, it is pointed out that it is harmful to write this code for the time being, so I added the sentence "For logger initial setting when writing a single execution script". Please see the postscript below for details.

By the way, why was it such a long sentence just to convey that ...

Python logging tutorials have been structured for a long time, so I'm not interested in the story of what happens when I write this now. There is no motivation to hit the head family. However, I think there are some important issues to keep in mind.

If you are interested in Python but don't understand the log well, please leave only getLogger (__name__) in the corner of your head. If it gets stuck in the corner of your head, at least, you'll be less likely to stop reading the tutorial in the middle.

Relation

Update (2016-11-17)

I reread the contents and tried to supplement some parts and correct the part where Japanese was chaotic. I am aware of Python3 and add some concrete code.

However, I also felt that the "momentum" that the author had at that time was important, so the redundant sentences themselves were untouched.

I will write a separate sentence that extracts only the methods that I think are the best practices.

Update (2017-07-05)

The following boiler plates are available for your reference.

https://www.python-boilerplate.com/py3+executable+logging

I noticed this, but in the above example, the root logger is dirty.

Update (2017-07-06, 2017-07-07)

It was pointed out that the snippet in the conclusion section of this entry was not convincing, so I added it.

First of all, I added logger.propegate = False to the code in the text because of the circumstances as described in 2017-07-05.

In addition, it is first necessary to present the premise that "when starting to write a single script", and the alternatives are as follows.

from logging import basicConfig, getLogger, DEBUG

#Write this only in the main file
basicConfig(level=DEBUG)

#Write this to all files
logger = getLogger(__name__)

logger.debug('hello')

If you write a "library", the situation is completely different and "Prepare a logger with only" NullHandler`` set "is correct, so the code presented in the text will be" incorrect ".

I think the difference in stance can be summarized in what you think of this sentence that you have pointed out.

I am like this

I feel that my position deviates from the "standard" ones that are assumed and inferred from the documents. But I don't care about that.

However, if you need a root logger, there are times when you need to actively use it, so if you need to switch, switch your ideas. Apart from that, I'm not saying that "the root logger will die" (I shouldn't say that in the text).

Besides, even if the log is written by print (), after all the curse, I will pick up the necessary signal if it is there. At the end of last year, I saw such an example being rather terrible. I don't know.

Also, I somehow thought that my stance would not hold in the first place depending on the scale. If the log comes in TB class a day, it will be flushed to the root logger from the beginning and analyzed in another layer. I also thought that the scale of the problem I was dealing with was always relatively very small.

If you are interested in the detailed issues, please see the comments. Personally, in the above stance, basicConfig () is calm for those who do not use it (for now), but in the stance of root logger recommendation, basicConfig () is not used The reason is not found at this initial code level.

By the way, in the first place, I didn't have the idea of separating the viewpoints of "library", "script", "logger setting", and "logger use" until I was pointed out. Thank you @nya.

(No, comments may still continue)

Update (2018-08-31)

https://www.python-boilerplate.com/py3+executable+logging

The logging library used at the boilerplate site I introduced has changed from the standard one to the one for the 3rd party called logzero.

I think this is fine, but as a general rule, please be careful about safety etc. when using modules other than the standard library. Details are omitted here as it deviates significantly from the contents of this entry.

Recommended Posts

Don't print and import logging for logging
Double loop in for statement and then print statement behavior
Learn about python's print function and strings for beginners.
Hook to the first import of the module and print the module path
Pitfalls and workarounds for pandas.DataFrame.to_sql
Don't learn Java and C ++! !! Languages for beginners [more carefully selected]
Commentary on unbiasedness and consistency for those who don't like it