SAMPLE
吾輩|は|猫|で|ある|。|名前|は|まだ|無|い|。
REFERENCE
Shortcut morphological analysis with regular expressions
PYTHON (slightly modified)
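import re

text_m = []
text = "吾輩は猫である。名前はまだ無い。"

# Split the text into runs of a single script: Latin upper case, Latin lower
# case, katakana, hiragana, kanji, and Japanese punctuation.
# NOTE: the kana/kanji ranges were garbled in the source and are reconstructed.
p = re.compile(r"/|[A-Z]+|[a-z]+|[ァ-ン]+|[ぁ-ん-]+|[ァ-ヶ]+|[一-龍]+|[。、]|/")
m = p.findall(text)

# NOTE: the three particle strings were also garbled in the source. The values
# below are assumptions, chosen so that the SAMPLE output is reproduced.
PREFIXES = 'はがで'      # particles split off the front of a hiragana run
SUFFIX_2 = 'ので'        # two-character ending split off the back
SUFFIXES_1 = 'もはがで'  # one-character endings split off the back

for row in m:
    if re.fullmatch(r'[ぁ-ん]+', row):
        if row[0] in PREFIXES:
            # Leading particle: emit it, then the remainder if any is left.
            text_m.append(row[0])
            if len(row) > 1:
                text_m.append(row[1:])
        elif row.endswith(SUFFIX_2):
            # Skip the stem when the run is nothing but the suffix.
            if row[:-2]:
                text_m.append(row[:-2])
            text_m.append(row[-2:])
        elif row[-1] in SUFFIXES_1:
            if row[:-1]:
                text_m.append(row[:-1])
            text_m.append(row[-1])
        else:
            text_m.append(row)
    else:
        # Non-hiragana runs (kanji, katakana, Latin, punctuation) pass through.
        text_m.append(row)

## output
print('|'.join(text_m))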
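To see why the particle-splitting step is needed, it may help to look at what the regular expression returns on its own. The sketch below is an illustration only: the helper name `script_runs` and the exact character ranges are assumptions, not taken from the reference article. It cuts the text wherever the script changes (kanji, hiragana, katakana, Latin, punctuation) and does nothing else.

```python
import re

# Assumed ranges: Latin letters, katakana (plus the long-vowel mark),
# hiragana, common kanji, and the two Japanese punctuation marks.
SCRIPT_RUNS = re.compile(r"[A-Z]+|[a-z]+|[ァ-ヶー]+|[ぁ-ん]+|[一-龍]+|[。、]")

def script_runs(text):
    """Return maximal runs of one script, with no particle splitting."""
    return SCRIPT_RUNS.findall(text)

runs = script_runs("吾輩は猫である。名前はまだ無い。")
print("|".join(runs))
# 吾輩|は|猫|である|。|名前|はまだ|無|い|。
```

Note that `はまだ` and `である` each come back as a single run, because the particle and the following word are both written in hiragana. Peeling a leading particle off each hiragana run is what turns them into `は|まだ` and `で|ある`.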