A little regular expression story Part 1

Prologue

Once upon a time, in a place called Twitter, there lived Yura, Moya-san, and Yura's entourage. Whenever Yura did something, the entourage would say "Yura-chan kawaii" ("Yura-chan is cute").

Main subject

I will implement the first half of Moya-san's last remark using Java regular expressions.

Reference: http://www.javadrive.jp/regex/

Regular expressions in other languages are not covered here.

Introduction

First, let's split "Yura-chan kawaii" (ゆらちゃんかわいい, hereafter A) into two parts: "Yura-chan" (ゆらちゃん, hereafter B) and "kawaii" (かわいい, hereafter C).

Complex Japanese

Japanese is complicated. Some kana, such as "あ" and "ア", are written differently but read the same. Let's make B and C cope with these variants; kanji with the same reading are out of scope. Character classes (square brackets) do the job.

B = [ゆユﾕ][らラﾗ][ちチﾁ][ゃャｬ][んンﾝ]
C = [かカｶ][わワﾜ][いイｲ][いイｲ]

In C, "い" always appears twice in a row, so a repetition count shortens it.

C = [かカｶ][わワﾜ][いイｲ]{2}

This is fine.
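
As a rough sketch of the character classes above in Java: the class name KanaClassDemo and the helper methods are made up for illustration, and it assumes each class lists the hiragana, full-width katakana, and half-width katakana form of one kana. The kana are written as Unicode escapes only so that the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

class KanaClassDemo {
    // Each bracketed class lists one kana in three scripts (an assumption):
    // hiragana, full-width katakana, half-width katakana.
    static final String B =
          "[\u3086\u30E6\uFF95]"      // yu
        + "[\u3089\u30E9\uFF97]"      // ra
        + "[\u3061\u30C1\uFF81]"      // chi
        + "[\u3083\u30E3\uFF6C]"      // small ya
        + "[\u3093\u30F3\uFF9D]";     // n
    static final String C =
          "[\u304B\u30AB\uFF76]"      // ka
        + "[\u308F\u30EF\uFF9C]"      // wa
        + "[\u3044\u30A4\uFF72]{2}";  // i, exactly twice

    // Pattern.matches requires the whole input to match the expression.
    static boolean matchesB(String s) { return Pattern.matches(B, s); }
    static boolean matchesC(String s) { return Pattern.matches(C, s); }

    public static void main(String[] args) {
        // "yura-chan" in hiragana
        System.out.println(matchesB("\u3086\u3089\u3061\u3083\u3093")); // true
        // "yura-chan" in full-width katakana
        System.out.println(matchesB("\u30E6\u30E9\u30C1\u30E3\u30F3")); // true
        // "kawaii" with the last "i" in katakana (mixed scripts)
        System.out.println(matchesC("\u304B\u308F\u3044\u30A4"));       // true
        // "kawai" with only one "i"
        System.out.println(matchesC("\u304B\u308F\u3044"));             // false
    }
}
```

Because every position has its own class, scripts can be mixed freely within one word, which is probably more lenient than strictly needed but keeps the expression short.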

Inversion of words

As long as they do not pin the beginning or end of the input, regular expressions can be combined by concatenating them as strings. For example, B + C catches "Yura-chan kawaii" every time. But what if someone says this?

"Kawaii yo Yura-chan" (かわいいよゆらちゃん)

The inverted word order is a problem. To handle it, let's define a new expression D from the dot (any character) and the asterisk (zero or more repetitions).

D = .*

D matches a string of zero or more characters (that is, it also matches when nothing is there). With D, the inversion above can be handled with C + D + B. A phrase with something between the two parts, such as "Yura-chan is cute", can likewise be handled with B + D + C.
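
Here is a small Java sketch of the concatenation idea, under a few assumptions: the names WordOrderDemo, forward, and reversed are invented for this example, the kana classes assume hiragana plus full-width and half-width katakana, and the sample inputs are my own guesses at typical phrases. It uses find() rather than matches() because the expressions are not anchored.

```java
import java.util.regex.Pattern;

class WordOrderDemo {
    // B: "yura-chan", C: "kawaii", each kana in three scripts (assumed).
    static final String B = "[\u3086\u30E6\uFF95][\u3089\u30E9\uFF97]"
        + "[\u3061\u30C1\uFF81][\u3083\u30E3\uFF6C][\u3093\u30F3\uFF9D]";
    static final String C = "[\u304B\u30AB\uFF76][\u308F\u30EF\uFF9C]"
        + "[\u3044\u30A4\uFF72]{2}";
    // D: zero or more of any character (line terminators excluded by default).
    static final String D = ".*";

    // Plain string concatenation builds the combined expressions.
    static final Pattern FORWARD = Pattern.compile(B + D + C);
    static final Pattern REVERSED = Pattern.compile(C + D + B);

    // find() looks for the expression anywhere inside the input.
    static boolean forward(String s) { return FORWARD.matcher(s).find(); }
    static boolean reversed(String s) { return REVERSED.matcher(s).find(); }

    public static void main(String[] args) {
        String plain = "\u3086\u3089\u3061\u3083\u3093\u304B\u308F\u3044\u3044";          // yura-chan kawaii
        String inverted = "\u304B\u308F\u3044\u3044\u3088\u3086\u3089\u3061\u3083\u3093"; // kawaii yo yura-chan
        String spaced = "\u3086\u3089\u3061\u3083\u3093\u306F\u304B\u308F\u3044\u3044";   // yura-chan wa kawaii

        System.out.println(forward(plain));     // true
        System.out.println(forward(spaced));    // true
        System.out.println(forward(inverted));  // false
        System.out.println(reversed(inverted)); // true
    }
}
```

Since D also matches the empty string, B + D + C still catches the plain phrase with nothing in between.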

Regular expression composition

Finally, let's combine multiple regular expressions into one. Just as I was thinking about it, someone appeared and said this.

Let's get back to the main subject. Using the approach above, combine the expressions so that a single regular expression searches for B + D + C and C + D + B at the same time. It gets long, but it looks like this (hereafter E).

E = ([ゆユﾕ][らラﾗ]([ちチﾁ][ゃャｬ][んンﾝ]|[ゆユﾕ][らラﾗ]).*[かカｶ][わワﾜ][いイｲ]{2}|[かカｶ][わワﾜ][いイｲ]{2}.*[ゆユﾕ][らラﾗ]([ちチﾁ][ゃャｬ][んンﾝ]|[ゆユﾕ][らラﾗ]))

Here | separates alternatives and parentheses group them; note that the B part now also accepts "Yura-yura" (ゆらゆら) in place of "Yura-chan".

It looks strong. This should find most variations of "Yura-chan kawaii". Probably.
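
To make E easier to read, here is one way it might be assembled from smaller pieces in Java. This is only a sketch: the names CombinedDemo, NAME, KAWAII, and mentions are hypothetical, and the three-script kana classes are an assumption. The name part accepts "yura" followed by either "chan" or a second "yura".

```java
import java.util.regex.Pattern;

class CombinedDemo {
    // One class per kana: hiragana, full-width and half-width katakana (assumed).
    static final String YU = "[\u3086\u30E6\uFF95]";
    static final String RA = "[\u3089\u30E9\uFF97]";
    static final String CHI = "[\u3061\u30C1\uFF81]";
    static final String SMALL_YA = "[\u3083\u30E3\uFF6C]";
    static final String N = "[\u3093\u30F3\uFF9D]";
    static final String KA = "[\u304B\u30AB\uFF76]";
    static final String WA = "[\u308F\u30EF\uFF9C]";
    static final String I = "[\u3044\u30A4\uFF72]";

    // "yura" followed by "chan" or by another "yura".
    static final String NAME =
        YU + RA + "(" + CHI + SMALL_YA + N + "|" + YU + RA + ")";
    static final String KAWAII = KA + WA + I + "{2}";

    // E: either order, with anything (or nothing) in between.
    static final Pattern E = Pattern.compile(
        "(" + NAME + ".*" + KAWAII + "|" + KAWAII + ".*" + NAME + ")");

    static boolean mentions(String s) { return E.matcher(s).find(); }

    public static void main(String[] args) {
        // yura-chan kawaii
        System.out.println(mentions("\u3086\u3089\u3061\u3083\u3093\u304B\u308F\u3044\u3044"));       // true
        // kawaii yo yura-chan
        System.out.println(mentions("\u304B\u308F\u3044\u3044\u3088\u3086\u3089\u3061\u3083\u3093")); // true
        // yura-yura kawaii
        System.out.println(mentions("\u3086\u3089\u3086\u3089\u304B\u308F\u3044\u3044"));             // true
        // yura-chan alone, without kawaii
        System.out.println(mentions("\u3086\u3089\u3061\u3083\u3093"));                               // false
    }
}
```

Building the expression from named pieces keeps the two orderings in sync: changing NAME or KAWAII in one place updates both halves of the alternation.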

People who let me use their names

Moya: Twitter
Yura: Twitter

Thank you very much.

Finally

Yura is cute. ~~I'm sorry I'm not good at writing.~~

The story continues in Part 2.
