This article describes a method for generating a log when a Job created from a Notebook is executed in an analysis project of Cloud Pak for Data (hereinafter CP4D).
As background, as of CP4D v3.0 it is not possible to include arbitrary log messages in the Job execution log.
When you open a job in the analysis project and click the timestamp of an execution result, the execution log is displayed. However, this log records only the messages from the Python environment in which the Job (Notebook) ran; arbitrary log messages cannot be written to it. (As of June 11, 2020, CP4D v3.0 LA)
As a workaround, I came up with a way to write the log to a file from within the Notebook and register that file as a Data Asset of the analysis project.
The Python standard logging module is used to output logs both to the console (the output shown in the Notebook) and to a log file, and the log file is registered in the data assets of the analysis project using project_lib. For details on configuring the logger, please refer to this article.
Write the following at the beginning of your Notebook.
import logging
from pytz import timezone
from datetime import datetime

# Logger settings
logger = logging.getLogger("mylogger")
logger.setLevel(logging.DEBUG)

# Log format settings (report timestamps in Japan time)
def customTime(*args):
    return datetime.now(timezone('Asia/Tokyo')).timetuple()

formatter = logging.Formatter(
    fmt='%(asctime)s.%(msecs)-3d %(levelname)s : %(message)s',
    datefmt="%Y-%m-%d %H:%M:%S"
)
formatter.converter = customTime

# Handler settings for log output to the console (for display in the Notebook; level set to DEBUG)
sh = logging.StreamHandler()
sh.setLevel(logging.DEBUG)
sh.setFormatter(formatter)
logger.addHandler(sh)

# Handler settings for log output to a file (for Job execution; level set to INFO.
# The log file is written to the current directory and registered as a Data Asset later.)
logfilename = "mylog_" + datetime.now(timezone('Asia/Tokyo')).strftime('%Y%m%d%H%M%S') + ".log"
fh = logging.FileHandler(logfilename)
fh.setLevel(logging.INFO)
fh.setFormatter(formatter)
logger.addHandler(fh)

# Library for registering files as Data Assets
import io
from project_lib import Project
project = Project.access()
This is an example of how to use it.
try:
    logger.info('%s', 'Processing started')

    # Write the processing you actually want to do here

    # Output log messages wherever you need them
    logger.debug('%s', 'dummy debug message')
    logger.info('%s', 'dummy info message')

    # Intentionally raise an error (division by zero)
    test = 1/0

except Exception as e:
    logger.exception('%s', str(repr(e)))
    # Register the log file as a Data Asset (when an error occurs)
    with open(logfilename, 'rb') as z:
        data = io.BytesIO(z.read())
        project.save_data(logfilename, data, set_project_asset=True, overwrite=True)

# Register the log file as a Data Asset (on normal completion)
with open(logfilename, 'rb') as z:
    data = io.BytesIO(z.read())
    project.save_data(logfilename, data, set_project_asset=True, overwrite=True)
When this is run in the Notebook, the log appears in the output as shown below, and the log file is created in the data assets.
Run-time output in the Notebook:
2020-06-11 07:43:12.383 INFO : Processing started
2020-06-11 07:43:12.388 INFO : dummy info message
2020-06-11 07:43:12.389 ERROR : ZeroDivisionError('division by zero',)
Traceback (most recent call last):
File "<ipython-input-7-0b7d7ffe66e9>", line 10, in <module>
test = 1/0
ZeroDivisionError: division by zero
Also, if you save a version of the Notebook, create a Job from it, and execute the Job, the log file is generated in the data assets.
The generated log file looks like this. I will download it and check the contents. Since the file handler's level is INFO, the log file does not contain DEBUG messages.
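Incidentally, in the example above the log file is registered twice when an error occurs (once in the except block and once at the end). If registering it once is enough, a minimal variant using finally would look like this (same names as above; the division by zero is just a placeholder for your processing):

try:
    logger.info('%s', 'Processing started')
    test = 1/0  # your actual processing goes here
except Exception as e:
    logger.exception('%s', str(repr(e)))
finally:
    # Register the log file as a Data Asset exactly once, error or not
    with open(logfilename, 'rb') as z:
        data = io.BytesIO(z.read())
        project.save_data(logfilename, data, set_project_asset=True, overwrite=True)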
Filling the data assets with log files is not ideal given what data assets are meant for, so one alternative is to always overwrite a single log file. However, because CP4D runs on OpenShift (Kubernetes), the Job's Python environment is created as a pod at runtime and disappears when the Job finishes. With a single fixed file name, the log file therefore records only the latest Job execution, and the past history is lost each time the asset is overwritten. That is why, in the example above, I kept a history by including a timestamp in the log file name. Adjust this to suit your application; a sketch of the fixed-name variant follows.
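This is a minimal sketch of the single-file approach, assuming you accept losing the history: only the file name changes, and overwrite=True replaces the previous asset on each run.

# Fixed file name: the Data Asset is overwritten on every Job run,
# so it always holds only the latest execution's log
logfilename = "mylog.log"
fh = logging.FileHandler(logfilename)
fh.setLevel(logging.INFO)
fh.setFormatter(formatter)
logger.addHandler(fh)

# ...processing and logging as before...

with open(logfilename, 'rb') as z:
    data = io.BytesIO(z.read())
    project.save_data(logfilename, data, set_project_asset=True, overwrite=True)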
As mentioned above, it is not great that the data assets fill up with logs, but until it becomes possible to write arbitrary messages to the native Job log, there is no choice but to make do with this for a while. Another option is to record the log in a database table; a sketch of that idea follows.
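As a rough illustration of the DB idea (not something the article implements), here is a minimal custom logging.Handler that writes each record as a row in a table. To keep it runnable anywhere, this sketch uses the standard sqlite3 module; in CP4D you would point it at your actual database instead, and the table name job_log is an arbitrary choice of mine.

import logging
import sqlite3
from datetime import datetime
from pytz import timezone

class DBLogHandler(logging.Handler):
    """Write each log record as a row in a database table (sketch)."""
    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS job_log (ts TEXT, level TEXT, message TEXT)"
        )

    def emit(self, record):
        # Timestamp in Japan time, matching the formatter used above
        ts = datetime.now(timezone('Asia/Tokyo')).strftime('%Y-%m-%d %H:%M:%S')
        self.conn.execute(
            "INSERT INTO job_log (ts, level, message) VALUES (?, ?, ?)",
            (ts, record.levelname, record.getMessage())
        )
        self.conn.commit()

# Attach it alongside the console and file handlers
conn = sqlite3.connect("job_log.db")
dbh = DBLogHandler(conn)
dbh.setLevel(logging.INFO)
logger.addHandler(dbh)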