This article describes a method for generating a log when a Job created from a Notebook is executed in an analysis project of Cloud Pak for Data (hereinafter CP4D).
As background, as of CP4D v3.0 it is not possible to include arbitrary log messages in the Job execution log.
When you open a job in the analysis project and click the timestamp of an execution result, the execution log is displayed. However, this log records only the messages from the Python environment in which the Job (Notebook) ran; arbitrary log messages cannot be written to it. (As of June 11, 2020, CP4D v3.0 LA)
As a workaround, I came up with a way to write the log to a file from within the Notebook and register that file as a Data Asset of the analysis project.
The Python standard logging module is used to output logs both to the console (the output shown in the Notebook) and to a log file, and the log file is registered in the data assets of the analysis project using project_lib. For details on configuring the logger, please refer to this article.
Write the following at the beginning of your Notebook.
import logging
from pytz import timezone
from datetime import datetime

# Logger settings
logger = logging.getLogger("mylogger")
logger.setLevel(logging.DEBUG)

# Log format settings (report timestamps in Japan time)
def customTime(*args):
    return datetime.now(timezone('Asia/Tokyo')).timetuple()

formatter = logging.Formatter(
    fmt='%(asctime)s.%(msecs)-3d %(levelname)s : %(message)s',
    datefmt="%Y-%m-%d %H:%M:%S"
)
formatter.converter = customTime

# Handler settings for log output to the console (for display in the Notebook; level set to DEBUG)
sh = logging.StreamHandler()
sh.setLevel(logging.DEBUG)
sh.setFormatter(formatter)
logger.addHandler(sh)

# Handler settings for log output to a file (for Job execution; level set to INFO.
# The log file is written to the current directory and registered as a Data Asset later.)
logfilename = "mylog_" + datetime.now(timezone('Asia/Tokyo')).strftime('%Y%m%d%H%M%S') + ".log"
fh = logging.FileHandler(logfilename)
fh.setLevel(logging.INFO)
fh.setFormatter(formatter)
logger.addHandler(fh)

# Library for registering files as Data Assets
import io
from project_lib import Project
project = Project.access()
This is an example of how to use it.
try:
    logger.info('%s', 'Processing started')

    # Write the processing you actually want to do here

    # Output log messages wherever you need them
    logger.debug('%s', 'dummy debug message')
    logger.info('%s', 'dummy info message')

    # Intentionally raise an error (division by zero)
    test = 1/0

except Exception as e:
    logger.exception('%s', str(repr(e)))
    # Register the log file as a Data Asset (when an error occurs)
    with open(logfilename, 'rb') as z:
        data = io.BytesIO(z.read())
        project.save_data(logfilename, data, set_project_asset=True, overwrite=True)

# Register the log file as a Data Asset (on normal completion)
with open(logfilename, 'rb') as z:
    data = io.BytesIO(z.read())
    project.save_data(logfilename, data, set_project_asset=True, overwrite=True)
When this is run in the Notebook, the log appears in the output as shown below, and the log file is created in the data assets.
Run-time output in the Notebook:
2020-06-11 07:43:12.383 INFO : Processing started
2020-06-11 07:43:12.388 INFO : dummy info message
2020-06-11 07:43:12.389 ERROR : ZeroDivisionError('division by zero',)
Traceback (most recent call last):
File "<ipython-input-7-0b7d7ffe66e9>", line 10, in <module>
test = 1/0
ZeroDivisionError: division by zero
Also, if you save a version of the Notebook, create a Job from it, and execute the Job, the log file is generated in the data assets.
The generated log file looks like this. I will download it and check the contents. Since the file handler's level is INFO, the log file does not contain DEBUG messages.
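Incidentally, in the example above the log file is registered twice when an error occurs (once in the except block and once at the end). If registering it once is enough, a minimal variant using finally would look like this (same names as above; the division by zero is just a placeholder for your processing):

try:
    logger.info('%s', 'Processing started')
    test = 1/0  # your actual processing goes here
except Exception as e:
    logger.exception('%s', str(repr(e)))
finally:
    # Register the log file as a Data Asset exactly once, error or not
    with open(logfilename, 'rb') as z:
        data = io.BytesIO(z.read())
        project.save_data(logfilename, data, set_project_asset=True, overwrite=True)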
Filling the data assets with log files is not ideal given what data assets are meant for, so one alternative is to always overwrite a single log file. However, because CP4D runs on OpenShift (Kubernetes), the Job's Python environment is created as a pod at runtime and disappears when the Job finishes. With a single fixed file name, the log file therefore records only the latest Job execution, and the past history is lost each time the asset is overwritten. That is why, in the example above, I kept a history by including a timestamp in the log file name. Adjust this to suit your application; a sketch of the fixed-name variant follows.
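This is a minimal sketch of the single-file approach, assuming you accept losing the history: only the file name changes, and overwrite=True replaces the previous asset on each run.

# Fixed file name: the Data Asset is overwritten on every Job run,
# so it always holds only the latest execution's log
logfilename = "mylog.log"
fh = logging.FileHandler(logfilename)
fh.setLevel(logging.INFO)
fh.setFormatter(formatter)
logger.addHandler(fh)

# ...processing and logging as before...

with open(logfilename, 'rb') as z:
    data = io.BytesIO(z.read())
    project.save_data(logfilename, data, set_project_asset=True, overwrite=True)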
As mentioned above, it is not great that the data assets fill up with logs, but until it becomes possible to write arbitrary messages to the native Job log, there is no choice but to make do with this for a while. Another option is to record the log in a database table; a sketch of that idea follows.
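As a rough illustration of the DB idea (not something the article implements), here is a minimal custom logging.Handler that writes each record as a row in a table. To keep it runnable anywhere, this sketch uses the standard sqlite3 module; in CP4D you would point it at your actual database instead, and the table name job_log is an arbitrary choice of mine.

import logging
import sqlite3
from datetime import datetime
from pytz import timezone

class DBLogHandler(logging.Handler):
    """Write each log record as a row in a database table (sketch)."""
    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS job_log (ts TEXT, level TEXT, message TEXT)"
        )

    def emit(self, record):
        # Timestamp in Japan time, matching the formatter used above
        ts = datetime.now(timezone('Asia/Tokyo')).strftime('%Y-%m-%d %H:%M:%S')
        self.conn.execute(
            "INSERT INTO job_log (ts, level, message) VALUES (?, ?, ?)",
            (ts, record.levelname, record.getMessage())
        )
        self.conn.commit()

# Attach it alongside the console and file handlers
conn = sqlite3.connect("job_log.db")
dbh = DBLogHandler(conn)
dbh.setLevel(logging.INFO)
logger.addHandler(dbh)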