[PYTHON] I made a tool to generate Markdown from the exported Scrapbox JSON file

Introduction

I have been an engineer for five years without publishing anything. That gave me a sense of crisis, so I decided to start posting on Qiita. This is my first article, so please forgive any rough edges.

Overview

We started using Scrapbox for internal activities, and I eventually wanted to keep its pages on a file server so that they would remain as an internal asset. Scrapbox can export the contents of all pages as a single JSON file, but that file is hard to read as it is. I looked for a tool that would convert it to Markdown and save it, could not find one that fit, and so wrote my own in Python.

The exported JSON file has the following format (exported without metadata).

john-project.json


{
  "name": "john-project",
  "displayName": "john-project",
  "exported": 1598595295,
  "pages": [
    {
      "title": "How to use Scrapbox",
      "created": 1598594744,
      "updated": 1598594744,
      "id": "000000000000000000000000",
      "lines": [
        "How to use Scrapbox",
        "Welcome to Scrapbox. You can freely edit and use this page.",
        "",
        "Invite members to this project",
        //Omission
        " [We publish use cases of companies https://scrapbox.io/case]",
        ""
      ]
    },
    {
      "title": "The first line is the heading",
      "created": 1598594777,
      "updated": 1598595231,
      "id": "111111111111111111111111",
      "lines": [
        "The first line is the heading",
        "[** Two asterisks are headline style]",
        " Indented bulleted list",
        " \tIncrease the number to further indent",
        " [[Bold]]Or[/ italic]、[- Strikethrough]Can be used",
        " \tlike this[-/* italic]Can be combined",
        "[Page link]Or[External link https://scrapbox.io]",
        "code:test.py",
        " for i in range(5):",
        "     print('[*Ignore inside code blocks]')",
        "",
        "`[- code]`Ignore",
        "",
        "table:Tabular format",
        " aaa\tbbb\tccc",
        " Ah ah\t\tUuu",
        "\t111\t222\t333",
        ""
      ]
    }
  ]
}
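
To make the structure concrete, here is a minimal sketch that reads an export of this shape and summarizes each page. The inline `sample_export` string and the `summarize_pages` helper are illustrative assumptions and not part of the tool; only the field names are taken from the JSON above, and treating `updated` as a Unix timestamp in seconds is my reading of the values.

```python
import json
from datetime import datetime, timezone

# Hypothetical, trimmed-down export used only for illustration.
sample_export = '''
{
  "name": "john-project",
  "displayName": "john-project",
  "exported": 1598595295,
  "pages": [
    {"title": "How to use Scrapbox", "created": 1598594744,
     "updated": 1598594744, "id": "000000000000000000000000",
     "lines": ["How to use Scrapbox", "Welcome to Scrapbox.", ""]}
  ]
}
'''


def summarize_pages(export: dict) -> list:
    """Return (title, line count, last update in UTC) for every page."""
    summary = []
    for page in export['pages']:
        # Assumption: 'updated' is a Unix timestamp in seconds.
        updated = datetime.fromtimestamp(page['updated'], tz=timezone.utc)
        summary.append((page['title'], len(page['lines']), updated))
    return summary


pages = summarize_pages(json.loads(sample_export))
for title, n_lines, updated in pages:
    print(f'{title}: {n_lines} lines, updated {updated:%Y-%m-%d}')
```

Each page is just a title plus a flat list of lines, which is why the conversion can work one line at a time.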

The tool converts each page into a Markdown file like the following. (Note: to make it display properly on Qiita, I afterwards added a full-width space to the code block fences and to the end of a table row.)

`The first line is the heading.md`


# The first line is the heading
### Two asterisks are headline style
- Indented bulleted list
  - Increase the number to further indent
- **Bold**Or _italic_ 、 ~~Strikethrough~~ Can be used
  - like this _~~**italic**~~_ Can be combined
[Page link]()Or[External link](https://scrapbox.io)
code:test.py
``` 
 for i in range(5):
     print('[*Ignore inside code blocks]')
``` 

`[- code]`Ignore

table:Tabular format
|aaa|bbb|ccc| 
|-----|-----|-----|-----|
|Ah ah||Uuu|
|111|222|333|


Rendered, the page looks as follows before and after conversion. The line spacing is a little loose, but the result is reasonably easy to read.

[Image: scrapbox_sample.png]

[Image: 最初の行は見出し.png]

Policy

Most members (myself included) are new to Scrapbox and do not use elaborate notation, so rather than aiming for a perfect conversion I decided to convert only the notations we actually use. The method is simple: regular expressions find the parts written in Scrapbox notation, and those parts are replaced with the Markdown equivalent. Finally, the script is packaged as an exe so that people without Python installed can use it.
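
As a rough illustration of the find-and-replace idea, the sketch below converts a single notation, strikethrough, with one regular expression. The helper name `strikethrough_to_markdown` is made up for this example; the tool itself handles several notations together, as described later.

```python
import re


def strikethrough_to_markdown(line: str) -> str:
    """Replace Scrapbox [- text] with Markdown ~~text~~ (illustrative helper)."""
    # (.+?) is non-greedy so that each bracket pair is handled separately.
    return re.sub(r'\[- (.+?)\]', r'~~\1~~', line)


print(strikethrough_to_markdown('This is [- old] new'))
# This is ~~old~~ new
```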

Environment

I used Windows 10 and Python 3.7.

Implementation

File reading

The JSON file name is received as the first command-line argument. This way, the tool can be used simply by dragging and dropping the JSON file onto the exe file. A folder for the Markdown output is also created.

import json
import os
import re
import sys

filename = sys.argv[1]
with open(filename, 'r', encoding='utf-8') as fr:
    sb = json.load(fr)

# Create the Markdown output folder if it does not exist yet.
outdir = 'markdown/'
os.makedirs(outdir, exist_ok=True)

Conversion

From here, each page is converted line by line. The notation being converted is shown in parentheses in each heading.

Heading (first line)

Scrapbox treats the first line of a page as its heading, so `# ` (a hash followed by a half-width space) is prepended to the first line to make it a Markdown heading.

for p in sb['pages']:
    title = p['title']
    lines = p['lines']
    is_in_codeblock = False
    with open(f'{outdir}{title}.md', 'w', encoding='utf-8') as fw:
        for i, l in enumerate(lines):
            if i == 0:
                l = '# ' + l

Code block (`code:hoge.ext`)

In Scrapbox, a code block starts with `code:hoge.ext` and continues for as long as each line begins with whitespace. I don't want to convert anything inside a code block, so processing proceeds while tracking whether the current line is inside one. On entering and leaving a code block, the Markdown fence (```) is added.

# Code block processing
if l.startswith('code:'):
    is_in_codeblock = True
    l += f'\n```'
elif is_in_codeblock and not l.startswith(('\t', ' ', '\u3000')):  # tab, half-width space, full-width space
    is_in_codeblock = False
    fw.write('```\n')

# Omission

# Convert if not a code block
if not is_in_codeblock:
    l = convert(l)
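
The fragment above depends on the surrounding loop, so here is a self-contained sketch of the same state-tracking idea that can be run on a plain list of lines. The function name `fence_code_blocks` is hypothetical, and closing a fence that is still open at the end of the input is my own addition rather than something shown in the original code.

```python
FENCE = '`' * 3  # Markdown code fence, built this way to keep the example readable


def fence_code_blocks(lines):
    """Wrap Scrapbox code blocks in Markdown fences (illustrative helper)."""
    out = []
    in_code = False
    for line in lines:
        # A line that does not start with whitespace ends the current block.
        if in_code and not line.startswith(('\t', ' ', '\u3000')):
            out.append(FENCE)
            in_code = False
        out.append(line)
        if line.startswith('code:'):
            out.append(FENCE)
            in_code = True
    if in_code:
        # Assumption: close a block that runs to the end of the page.
        out.append(FENCE)
    return out


sample = ['code:test.py', ' for i in range(5):', '     print(i)', '', 'after']
for line in fence_code_blocks(sample):
    print(line)
```

Note that an empty line also ends the block, because it does not start with whitespace.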

#### Table (`table:hoge`)

In Scrapbox, a table is written with `table:hoge`. The table continues for as long as each row begins with whitespace. Scrapbox tables have no header row, but Markdown cannot express a table without one, so the first row is forcibly treated as the header. Cells are separated by tabs, which are converted to `|`. The whitespace at the beginning of a row may be a tab, a half-width space, or a full-width space, so it is converted in a brute-force way.

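# Table processing
if l.startswith('table:'):
    is_in_table = True
    row = -1  # assumed initialization (not shown in the original): the 'table:' line becomes row 0
elif is_in_table and not l.startswith(('\t', ' ', '\u3000')):
    is_in_table = False
if is_in_table:
    row += 1
    if row != 0:
        # Tabs separate the cells; close the row with a trailing pipe.
        l = l.replace('\t', '|') + '|'
        # A leading half-width or full-width space becomes the opening pipe.
        if l.startswith((' ', '\u3000')):
            l = '|' + l[1:]
    if row == 1:
        # Markdown needs a header, so a separator line follows the first row.
        col = l.count('|')
        l += f'\n{"|-----" * col}|'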
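
For experimenting outside the main loop, the following sketch applies the same idea to a list of lines. It is not the tool's code: `convert_table` is a hypothetical helper, and it counts cells instead of pipes, so the separator line has exactly as many columns as the header row.

```python
def convert_table(lines):
    """Convert Scrapbox table rows to Markdown table rows (illustrative helper)."""
    out = []
    in_table = False
    row = 0
    for line in lines:
        if line.startswith('table:'):
            in_table, row = True, 0
            out.append(line)
            continue
        # The table ends at the first line that does not start with whitespace.
        if in_table and not line.startswith(('\t', ' ', '\u3000')):
            in_table = False
        if not in_table:
            out.append(line)
            continue
        row += 1
        # Drop the single leading whitespace character, then split on tabs.
        cells = line[1:].split('\t')
        out.append('|' + '|'.join(cells) + '|')
        if row == 1:
            # The first row is treated as the header, so add a separator line.
            out.append('|' + '|'.join(['-----'] * len(cells)) + '|')
    return out


sample = ['table:t', ' aaa\tbbb\tccc', '\t111\t222\t333', 'end']
for line in convert_table(sample):
    print(line)
```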

#### Code (`` `hoge` ``)

I don't want to convert anything inside inline code, so before each notation is converted, the code portions are removed from the string that is searched. Inline code is written the same way as in Markdown, so removing it from the search target is all that is needed.

def ignore_code(l: str) -> str:
    for m in re.finditer(r'`.+?`', l):
        l = l.replace(m.group(0), '')
    return l
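
A quick check of what this does in practice may help. The sketch below uses `re.sub`, which as far as I can tell gives the same result as the loop above; the sample string is made up.

```python
import re


def ignore_code(l: str) -> str:
    # Remove every inline code span so that later searches skip it.
    return re.sub(r'`.+?`', '', l)


print(ignore_code('`[- code]`Ignore [- this]'))
# Ignore [- this]
```

The converters only search in this stripped copy. The replacements themselves are applied to the original line, so the inline code stays in the output.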

#### Hashtag (`#hoge`)

If a hashtag comes at the beginning of a line, Markdown may interpret it as a heading (how it looks seems to depend on the viewer). For that reason, hashtags are wrapped in backticks (`) so that they are treated as code.

def escape_hash_tag(l: str) -> str:
    for m in re.finditer(r'#(.+?)[ \t]', ignore_code(l)):
        l = l.replace(m.group(0), '`' + m.group(0) + '`')
    if l.startswith('#'):  # the whole line is a tag
        l = '`' + l + '`'
    return l
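
To see the behavior on concrete input, here is a self-contained copy of the two functions with two made-up sample lines. One detail worth noting is that the pattern includes the whitespace after the tag, so that space ends up inside the backticks.

```python
import re


def ignore_code(l: str) -> str:
    for m in re.finditer(r'`.+?`', l):
        l = l.replace(m.group(0), '')
    return l


def escape_hash_tag(l: str) -> str:
    for m in re.finditer(r'#(.+?)[ \t]', ignore_code(l)):
        l = l.replace(m.group(0), '`' + m.group(0) + '`')
    if l.startswith('#'):  # the whole line is a tag
        l = '`' + l + '`'
    return l


print(repr(escape_hash_tag('#memo see the notes')))  # tag followed by text
print(repr(escape_hash_tag('#memo')))                # line that is only a tag
```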

#### Bulleted list (indent)

The indent depth is counted and replaced with the corresponding Markdown list indentation.

def convert_list(l: str) -> str:
    m = re.match(r'[ \t\u3000]+', l)  # half-width space, tab, full-width space
    if m:
        l = l.replace(m.group(0),
                      (len(m.group(0)) - 1) * '  ' + '- ', 1)
    return l
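
A small usage example, using a copy of the function with the full-width space written as `\u3000` (the sample lines are made up): one leading whitespace character gives a top-level item, and each additional character adds two spaces of Markdown indentation.

```python
import re


def convert_list(l: str) -> str:
    m = re.match(r'[ \t\u3000]+', l)
    if m:
        # Indent depth N becomes (N - 1) * two spaces, followed by '- '.
        l = l.replace(m.group(0),
                      (len(m.group(0)) - 1) * '  ' + '- ', 1)
    return l


print(repr(convert_list(' first level')))
print(repr(convert_list(' \tsecond level')))
print(repr(convert_list('not a list')))
```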

#### Bold (`[[hoge]]`, `[** hoge]`, `[*** hoge]`)

In Scrapbox, `[[hoge]]` or `[* hoge]` makes text bold. In the latter notation, adding asterisks, as in `[** hoge]`, makes the text larger.

Of the latter forms, we were using the two- and three-asterisk versions like Markdown headings, so they are converted to headings. The remaining forms may be combined with other decorations, so they are converted separately.

def convert_bold(l: str) -> str:
    for m in re.finditer(r'\[\[(.+?)\]\]', ignore_code(l)):
        l = l.replace(m.group(0), '**' + m.group(1) + '**')
    m = re.match(r'\[(\*\*|\*\*\*) (.+?)\]', ignore_code(l))  # probably a heading
    if m:
        l = '#' * (5 - len(m.group(1))) + ' ' + \
            m.group(2)  # in Scrapbox, more asterisks means larger text
    return l
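
The heading level is `5 - (number of asterisks)`, so two asterisks give `###` and three give `##`. The self-contained copy below checks this on a few made-up lines.

```python
import re


def ignore_code(l: str) -> str:
    for m in re.finditer(r'`.+?`', l):
        l = l.replace(m.group(0), '')
    return l


def convert_bold(l: str) -> str:
    for m in re.finditer(r'\[\[(.+?)\]\]', ignore_code(l)):
        l = l.replace(m.group(0), '**' + m.group(1) + '**')
    # Only a line that starts with [** ...] or [*** ...] is treated as a heading.
    m = re.match(r'\[(\*\*|\*\*\*) (.+?)\]', ignore_code(l))
    if m:
        l = '#' * (5 - len(m.group(1))) + ' ' + m.group(2)
    return l


print(convert_bold('[[Bold]] text'))   # **Bold** text
print(convert_bold('[** Section]'))    # ### Section
print(convert_bold('[*** Chapter]'))   # ## Chapter
```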

#### Character decoration (`[* hoge]`, `[/ hoge]`, `[- hoge]`, `[-/* hoge]`, etc.)

Besides bold, Scrapbox supports italics (`[/ hoge]`) and strikethrough (`[- hoge]`). These can be combined, as in `[-/* hoge]`, so they are all processed together.

def convert_decoration(l: str) -> str:
    for m in re.finditer(r'\[([-\*/]+) (.+?)\]', ignore_code(l)):
        deco_s, deco_e = ' ', ' '
        # Look only at the decoration characters (group 1), not the decorated text.
        if '/' in m.group(1):
            deco_s += '_'
            deco_e = '_' + deco_e
        if '-' in m.group(1):
            deco_s += '~~'
            deco_e = '~~' + deco_e
        if '*' in m.group(1):
            deco_s += '**'
            deco_e = '**' + deco_e
        l = l.replace(m.group(0), deco_s + m.group(2) + deco_e)
    return l

(The syntax highlighting looks odd here, but I couldn't fix it.)
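
The following self-contained copy shows how the opening and closing markers are built up symmetrically. It checks only the decoration characters captured in group 1, so a hyphen inside the decorated text does not trigger strikethrough. The sample strings are made up.

```python
import re


def ignore_code(l: str) -> str:
    for m in re.finditer(r'`.+?`', l):
        l = l.replace(m.group(0), '')
    return l


def convert_decoration(l: str) -> str:
    for m in re.finditer(r'\[([-\*/]+) (.+?)\]', ignore_code(l)):
        deco_s, deco_e = ' ', ' '
        if '/' in m.group(1):
            deco_s += '_'
            deco_e = '_' + deco_e
        if '-' in m.group(1):
            deco_s += '~~'
            deco_e = '~~' + deco_e
        if '*' in m.group(1):
            deco_s += '**'
            deco_e = '**' + deco_e
        l = l.replace(m.group(0), deco_s + m.group(2) + deco_e)
    return l


print(repr(convert_decoration('[/ italic]')))
print(repr(convert_decoration('[-/* all]')))
print(repr(convert_decoration('[* well-known]')))
```

The surrounding spaces come from the initial values of `deco_s` and `deco_e`.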

#### Link (`[URL title]`, `[title URL]`, `[hoge]`)

In Scrapbox, an external link is written as `[URL title]` or `[title URL]`. Without worrying about strictness, I treat whichever part starts with `http` as the URL. A form like `[hoge]` is a link to another page in Scrapbox. That link cannot work after the Markdown export, but appending `()` at least makes it look like a link.

def convert_link(l: str) -> str:
    for m in re.finditer(r'\[(.+?)\]', ignore_code(l)):
        tmp = m.group(1).split(' ')
        if len(tmp) == 2:
            if tmp[0].startswith('http'):
                link, title = tmp
            else:
                title, link = tmp
            l = l.replace(m.group(0), f'[{title}]({link})')
        else:
            l = l.replace(m.group(0), m.group(0) + '()')
    return l
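
Here is a self-contained copy with a few made-up inputs. Because the bracket contents are split on a single space, this version expects exactly two parts for an external link. A title that itself contains a space falls through to the page-link branch, which is a limitation to keep in mind.

```python
import re


def ignore_code(l: str) -> str:
    for m in re.finditer(r'`.+?`', l):
        l = l.replace(m.group(0), '')
    return l


def convert_link(l: str) -> str:
    for m in re.finditer(r'\[(.+?)\]', ignore_code(l)):
        tmp = m.group(1).split(' ')
        if len(tmp) == 2:
            # Whichever part starts with 'http' is taken as the URL.
            if tmp[0].startswith('http'):
                link, title = tmp
            else:
                title, link = tmp
            l = l.replace(m.group(0), f'[{title}]({link})')
        else:
            # Page link: append () so that it at least looks like a link.
            l = l.replace(m.group(0), m.group(0) + '()')
    return l


print(convert_link('[Scrapbox https://scrapbox.io]'))
print(convert_link('[https://scrapbox.io Scrapbox]'))
print(convert_link('[Page]'))
print(convert_link('[External link https://scrapbox.io]'))
```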

### exe conversion

Finally, PyInstaller is used to build the exe: a single exe file (`-F`) with no console window (`-w`).

pip install pyinstaller
pyinstaller sb2md.py -wF

You can run the program by dragging and dropping the JSON file onto the exe file.

## Finally

The code I wrote this time is available on GitHub. For small jobs like this, I still find Python very handy.

I only started using Scrapbox the other day and haven't mastered it yet, so I plan to update the tool as other notations come into use.
