[Python] Output strings containing line breaks with PyYAML

Overview

- When writing a YAML file from Python, I needed a small trick to output strings that contain line breaks, so I am leaving it here as a memo. (It may just be that my knowledge of YAML was lacking...)

PyYAML

- To handle YAML in Python, I use PyYAML.

What I want to do

- Given string data that contains line breaks, like this:

{
    'aa': 'bbbb\ncccc\ndddd',
    'bb': 'eeee'
}

- I want to output it in block style, broken at each line feed, like this:

aa: |
  bbbb
  cccc
  dddd
bb: eeee

Try it as is

- First, I simply try yaml.dump as is.
  - What I actually needed was to write a file, but here I print the result to keep things easy to follow.

import yaml

def main():

    test_dict = {
        'aa': 'bbbb\ncccc\ndddd',
        'bb': 'eeee'
    }
    print(
        yaml.dump(test_dict,
                  allow_unicode=True,
                  encoding='utf-8',
                  default_flow_style=False).decode()
    )


if __name__ == '__main__':
    main()

- Something is off...

aa: 'bbbb

  cccc

  dddd'
bb: eeee
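The blank lines are not extra data: in single-quoted style, a single line break is folded into a space, so PyYAML writes each real line break as an empty line. A minimal round-trip sketch, assuming yaml.safe_load is used to read the output back, shows that the value itself is intact; only the layout is not what I wanted.

```python
import yaml

test_dict = {
    'aa': 'bbbb\ncccc\ndddd',
    'bb': 'eeee'
}

# Without a custom representer, PyYAML chooses single-quoted style and
# writes each '\n' as an empty line inside the quotes.
dumped = yaml.dump(test_dict, allow_unicode=True, default_flow_style=False)
print(dumped)

# Loading it back restores the original string, line breaks included.
loaded = yaml.safe_load(dumped)
print(loaded == test_dict)  # True
```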

Registering a representer

- Searching online, I found a similar problem discussed, so I adapted that solution to my case.
- Here is the modified code.

import yaml

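def represent_str(dumper, instance):
    # Request literal block style ('|') only for multi-line strings.
    if "\n" in instance:
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance,
                                       style='|')
    else:
        # Other strings keep PyYAML's default style.
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance)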

def main():

    test_dict = {
        'aa': 'bbbb\ncccc\ndddd',
        'bb': 'eeee'
    }
    yaml.add_representer(str, represent_str)
    print(
        yaml.dump(test_dict,
                  allow_unicode=True,
                  encoding='utf-8',
                  default_flow_style=False).decode()
    )


if __name__ == '__main__':
    main()

- add_representer registers a function that controls how str values are output.
- style='|' is specified only when the string contains a line feed.
- The result is as follows:


aa: |-
  bbbb
  cccc
  dddd
bb: eeee

- Looks good!
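The "-" after "|" is YAML's strip chomping indicator: the value has no trailing newline, so the final line break is dropped on load. A plain "|" (clip) keeps exactly one trailing newline, which means the "aa: |" form sketched at the start would actually load as a string ending in '\n'. A small sketch comparing the two with yaml.safe_load:

```python
import yaml

# '|' (clip): the final line break is kept as a single '\n'.
clip = yaml.safe_load("aa: |\n  bbbb\n  cccc\n  dddd\n")

# '|-' (strip): the final line break is removed.
strip = yaml.safe_load("aa: |-\n  bbbb\n  cccc\n  dddd\n")

print(repr(clip['aa']))   # 'bbbb\ncccc\ndddd\n'
print(repr(strip['aa']))  # 'bbbb\ncccc\ndddd'
```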

Cases where it doesn't work...

- While using the implementation above, I hit some cases where the output was not what I expected.
- Investigating, I found that if a space comes right before a line break in the string, it is not output in block style.
- For example, with data like this:

    test_dict = {
        'aa': 'bbbb\ncccc \ndddd',
        'bb': 'eeee'
    }

- It is not broken into lines...

aa: "bbbb\ncccc \ndddd"
bb: eeee

Investigating the cause

- I was curious, so I read the PyYAML source.
- The emitter's analyze_scalar method inspects the string to decide which output styles are allowed. If a line feed comes immediately after a space, the variable space_break becomes True.
- In that case, all of the following flags are set to False:

        if space_break or special_characters:
            allow_flow_plain = allow_block_plain =  \
            allow_single_quoted = allow_block = False

- When the YAML is actually emitted, the choose_scalar_style method decides the output style based on these flags.
- Normally the scalar would be output in the style requested by the representer registered with add_representer, as implemented earlier. However, if allow_block from the flags above is False, the requested block style (| here) is not used:

        if self.event.style and self.event.style in '|>':
            if (not self.flow_level and not self.simple_key_context
                    and self.analysis.allow_block):
                return self.event.style

- The comment in analyze_scalar, right above the code that sets these flags, explains why: when a line break follows a space, the scalar is not output in block style.

Spaces followed by breaks, as well as special character are only allowed for double quoted scalars.

- Looking further through analyze_scalar, there is another place where allow_block is set to False.
- allow_block is also set to False when the string ends with a space.
- The comment there reads as follows, so block style is not used in this case either:

We do not permit trailing spaces for block scalars.
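The trailing-space case is easy to reproduce with the same kind of representer. This is a minimal sketch that assumes a hypothetical Dumper subclass named LiteralDumper, so the global yaml.Dumper is left untouched. Even though style='|' is requested, the emitter falls back to double quotes.

```python
import yaml


class LiteralDumper(yaml.Dumper):
    """Hypothetical Dumper subclass used only for this experiment."""


def represent_str(dumper, instance):
    # Request literal block style for multi-line strings, default otherwise.
    if "\n" in instance:
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', instance)


# add_representer on the subclass copies the representer table,
# so yaml.Dumper itself is not modified.
LiteralDumper.add_representer(str, represent_str)

# The string ends with a space, so allow_block is False and the
# requested '|' style is replaced by double-quoted style.
out = yaml.dump({'aa': 'bbbb\ncccc\ndddd '}, Dumper=LiteralDumper,
                default_flow_style=False)
print(out)  # aa: "bbbb\ncccc\ndddd "
```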

- In summary, if the string contains a space followed by a line break, or if it ends with a space, it is not output in block style (broken at each line feed).
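Both conditions can be checked directly against the emitter's flags. This sketch calls analyze_scalar, which is an internal, undocumented PyYAML method rather than public API, so treat it as a debugging aid that may change between versions.

```python
import io

from yaml.emitter import Emitter

# analyze_scalar is internal API; it only needs an Emitter instance.
emitter = Emitter(io.StringIO())

samples = {
    'no stray spaces': 'bbbb\ncccc\ndddd',
    'space before break': 'bbbb\ncccc \ndddd',
    'trailing space': 'bbbb\ncccc\ndddd ',
}

for label, text in samples.items():
    analysis = emitter.analyze_scalar(text)
    # allow_block decides whether a requested '|' or '>' style is honored.
    print(f"{label}: allow_block={analysis.allow_block}")

# no stray spaces: allow_block=True
# space before break: allow_block=False
# trailing space: allow_block=False
```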

Workaround

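- So in the end, I implemented it like this.
- The representer simply removes the spaces that block style does not allow before emitting the string. Note that this changes the data: those spaces are lost when the YAML is loaded back.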

import yaml
import re


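def represent_str(dumper, instance):
    # Multi-line strings are requested in literal block style ('|').
    if "\n" in instance:
        # PyYAML refuses block style when a space precedes a line break or
        # ends the string, so strip spaces at the end of every line first.
        # Note: this alters the data (those spaces are lost).
        instance = re.sub(r' +$', '', instance, flags=re.MULTILINE)
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance,
                                       style='|')
    else:
        # Single-line strings keep PyYAML's default style.
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance)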


def main():

    test_dict = {
        'aa': 'bbbb\ncccc \ndddd',
        'bb': 'eeee'
    }
    yaml.add_representer(str, represent_str)
    print(
        yaml.dump(test_dict,
                  allow_unicode=True,
                  encoding='utf-8',
                  default_flow_style=False).decode()
    )


if __name__ == '__main__':
    main()

- Even when a space precedes a line break, the output is now as expected.

aa: |-
  bbbb
  cccc
  dddd
bb: eeee
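Two caveats on this workaround, shown in the sketch below: yaml.add_representer(str, ...) modifies yaml.Dumper for the whole process, and the cleanup is lossy. The hypothetical StrippingDumper subclass keeps the change local, assuming a program-wide change is undesirable, and the round trip confirms that the space after cccc does not come back.

```python
import re

import yaml


class StrippingDumper(yaml.Dumper):
    """Hypothetical Dumper subclass; yaml.Dumper itself stays unchanged."""


def represent_str(dumper, instance):
    if "\n" in instance:
        # Drop spaces at the end of every line (lossy) so block style is allowed.
        instance = re.sub(r' +$', '', instance, flags=re.MULTILINE)
        return dumper.represent_scalar('tag:yaml.org,2002:str',
                                       instance, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', instance)


StrippingDumper.add_representer(str, represent_str)

test_dict = {'aa': 'bbbb\ncccc \ndddd', 'bb': 'eeee'}
out = yaml.dump(test_dict, Dumper=StrippingDumper,
                allow_unicode=True, default_flow_style=False)
print(out)

# The round trip does not restore the space after 'cccc'.
print(yaml.safe_load(out)['aa'] == test_dict['aa'])  # False
```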

YAML is deep...

- I haven't looked into it properly, but this behavior presumably follows from the YAML specification.
- I had never paid much attention to it, but the YAML specification seems surprisingly deep...
