This article is aimed at people who have never touched spaCy or GiNZA, to show what kind of analysis results they produce.
GiNZA is an open-source Japanese NLP library based on Universal Dependencies (UD). It is built on spaCy, a commercial-grade natural language processing framework released under the MIT license.
If you have Python installed, installation is a single command:

```shell
$ pip install -U ginza
```

This also installs the ginza command, so you can start analyzing right away.
Type a sentence and it is analyzed into CoNLL-U format. Here the input is 銀座でランチをご一緒しましょう。今度の日曜日はどうですか。 ("Let's have lunch together in Ginza. How about next Sunday?"):

```
$ ginza
銀座でランチをご一緒しましょう。今度の日曜日はどうですか。
# text = 銀座でランチをご一緒しましょう。
1	銀座	銀座	PROPN	名詞-固有名詞-地名-一般	_	6	obl	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City
2	で	で	ADP	助詞-格助詞	_	1	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ
3	ランチ	ランチ	NOUN	名詞-普通名詞-一般	_	6	obj	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ランチ
4	を	を	ADP	助詞-格助詞	_	3	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ
5	ご	ご	NOUN	接頭辞	_	6	compound	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|Reading=ゴ
6	一緒	一緒	VERB	名詞-普通名詞-サ変可能	_	0	root	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|Reading=イッショ
7	し	する	AUX	動詞-非自立可能	_	6	advcl	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=サ行変格,連用形-一般|Reading=シ
8	ましょう	ます	AUX	助動詞	_	6	aux	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ
9	。	。	PUNCT	補助記号-句点	_	6	punct	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。

# text = 今度の日曜日はどうですか。
1	今度	今度	NOUN	名詞-普通名詞-副詞可能	_	3	nmod	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_I|Reading=コンド
2	の	の	ADP	助詞-格助詞	_	1	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ノ
3	日曜日	日曜日	NOUN	名詞-普通名詞-副詞可能	_	5	nsubj	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_I|Reading=ニチヨウビ|NE=B-DATE|ENE=B-Day_Of_Week
4	は	は	ADP	助詞-係助詞	_	3	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ハ
5	どう	どう	ADV	副詞	_	0	root	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=ROOT|Reading=ドウ
6	です	です	AUX	助動詞	_	5	aux	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-デス,終止形-一般|Reading=デス
7	か	か	PART	助詞-終助詞	_	5	mark	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=カ
8	。	。	PUNCT	補助記号-句点	_	5	punct	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。
```
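Each token line of this output is tab-separated CoNLL-U; the tenth (MISC) column packs GiNZA-specific attributes such as Reading and BunsetuBILabel as |-separated key=value pairs, plus bare flags like NP_B. A minimal parser sketch for that column (the helper name is mine, not part of GiNZA):

```python
def parse_misc(misc):
    """Parse a CoNLL-U MISC column like 'SpaceAfter=No|Reading=ギンザ|NP_B'."""
    attrs = {}
    for item in misc.split("|"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
        else:
            attrs[item] = True  # bare flags such as NP_B
    return attrs
```

This makes it easy to pull out, say, the katakana reading of each token programmatically.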
The analysis worked, but the raw output is hard to read on the console.
So this time I visualized it with spaCy's visualizer and Streamlit, to make the syntax dependencies and token tables easier to see.
To draw the dependency tree, an SVG is generated via create_manual(), which replaces UD terms such as PROPN, ADP, obl, and advcl with Japanese, and the result is drawn with st.image().
```python
import spacy
import streamlit as st

nlp = spacy.load('ja_ginza')

input_list = st.text_area("Input string").splitlines()
for input_str in input_list:
    doc = nlp(input_str)
    for sent in doc.sents:
        # create_manual() converts the sentence for displacy's manual mode;
        # it is defined in the repository linked below
        svg = spacy.displacy.render(create_manual(sent), style="dep", manual=True)
        st.image(svg)
```
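The actual create_manual() lives in the linked repository; below is my own sketch of what such a helper might look like, assuming the goal is to produce the {"words": ..., "arcs": ...} dict that displacy's manual mode expects, with UD labels mapped to Japanese. The mapping tables here are illustrative, not the repository's:

```python
# Illustrative UD-to-Japanese label mappings (not the repository's tables)
POS_JA = {"PROPN": "固有名詞", "ADP": "接置詞", "NOUN": "名詞",
          "VERB": "動詞", "AUX": "助動詞", "PUNCT": "句読点"}
DEP_JA = {"obl": "斜格要素", "case": "格表示", "obj": "目的語",
          "compound": "複合語", "advcl": "副詞的修飾節",
          "aux": "助動詞", "punct": "句読点"}

def create_manual(sent):
    """Build displacy manual-render input from a sequence of tokens.

    Each token needs .text, .pos_, .dep_, .i and .head (spaCy's Token API).
    """
    offset = sent[0].i  # token indices are document-wide in spaCy
    words = [{"text": t.text, "tag": POS_JA.get(t.pos_, t.pos_)} for t in sent]
    arcs = []
    for t in sent:
        if t.dep_.lower() == "root":
            continue  # the root token has no incoming arc
        start, end = sorted((t.i - offset, t.head.i - offset))
        arcs.append({
            "start": start,
            "end": end,
            "label": DEP_JA.get(t.dep_, t.dep_),
            # displacy draws the arrowhead at the dependent's end
            "dir": "left" if t.i < t.head.i else "right",
        })
    return {"words": words, "arcs": arcs}
```

Passing this dict to spacy.displacy.render(..., manual=True) skips spaCy's own extraction and draws exactly the words and arcs you supply.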
The table is drawn with streamlit.table(), and the named entities are rendered with streamlit.components.v1.html().
The full source code is available at https://github.com/chai3/ginza-streamlit.
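The per-token table could be assembled with a small helper like the one below (token_table_rows is my name, not from the repository); it collects the same attributes shown in the tables that follow:

```python
def token_table_rows(tokens):
    """Collect per-token attributes into rows suitable for st.table()
    or pandas.DataFrame. Tokens follow spaCy's Token API."""
    return [
        {
            "i": t.i,
            "orth": t.text,
            "lemma": t.lemma_,
            "pos": t.pos_,
            "tag": t.tag_,
            "dep": t.dep_,
            "head.i": t.head.i,
        }
        for t in tokens
    ]

# In the Streamlit app this would be used roughly as:
#   st.table(pd.DataFrame(token_table_rows(doc)))
```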
The result looks like the following. The input text was:

銀座でランチをご一緒しましょう。今度の日曜日はどうですか。吾輩は猫である。名前はまだ無い。
("Let's have lunch together in Ginza. How about next Sunday? I am a cat. There is no name yet.")
| i (index) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| orth (text) | 銀座 | で | ランチ | を | ご | 一緒 | し | ましょう | 。 |
| lemma (base form) | 銀座 | で | ランチ | を | ご | 一緒 | する | ます | 。 |
| reading_form (reading kana) | ギンザ | デ | ランチ | ヲ | ゴ | イッショ | シ | マショウ | 。 |
| pos (UD part of speech) | PROPN | ADP | NOUN | ADP | NOUN | VERB | AUX | AUX | PUNCT |
| pos (glossed) | proper noun | adposition | noun | adposition | noun | verb | auxiliary | auxiliary | punctuation |
| tag (detailed POS) | 名詞-固有名詞-地名-一般 | 助詞-格助詞 | 名詞-普通名詞-一般 | 助詞-格助詞 | 接頭辞 | 名詞-普通名詞-サ変可能 | 動詞-非自立可能 | 助動詞 | 補助記号-句点 |
| inflection (conjugation info) | - | - | - | - | - | - | サ行変格,連用形-一般 | 助動詞-マス,意志推量形 | - |
| ent_type (entity type) | City | - | - | - | - | - | - | - | - |
| ent_iob (entity IOB) | B | O | O | O | O | O | O | O | O |
| lang (language) | ja | ja | ja | ja | ja | ja | ja | ja | ja |
| dep (UD dependency) | obl | case | obj | case | compound | ROOT | advcl | aux | punct |
| dep (glossed) | oblique nominal | case marker | object | case marker | compound | ROOT | adverbial clause | auxiliary | punctuation |
| head.i (head index) | 5 | 0 | 5 | 2 | 5 | 5 | 5 | 5 | 5 |
| bunsetu_bi_label | B | I | B | I | B | I | I | I | I |
| bunsetu_position_type | SEM_HEAD | SYN_HEAD | SEM_HEAD | SYN_HEAD | CONT | ROOT | SYN_HEAD | SYN_HEAD | CONT |
| is_bunsetu_head | TRUE | FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
| ent_label_ontonotes | B-GPE | O | O | O | O | O | O | O | O |
| ent_label_ene | B-City | O | O | O | O | O | O | O | O |
Bunsetsu segmentation: 銀座で / ランチを / ご一緒しましょう。
Bunsetsu phrase heads: 銀座 (NP) / ランチ (NP) / 一緒 (VP)
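The bunsetu_bi_label row in the table is enough to reconstruct this segmentation: B starts a new bunsetsu and I continues the current one. (GiNZA also exposes this directly via its bunsetu_spans() API; the helper below only illustrates the B/I grouping.)

```python
def group_bunsetu(texts, bi_labels):
    """Group token texts into bunsetsu chunks using B/I labels."""
    chunks = []
    for text, label in zip(texts, bi_labels):
        if label == "B" or not chunks:
            chunks.append(text)   # start a new bunsetsu
        else:
            chunks[-1] += text    # extend the current one
    return chunks

tokens = ["銀座", "で", "ランチ", "を", "ご", "一緒", "し", "ましょう", "。"]
labels = ["B", "I", "B", "I", "B", "I", "I", "I", "I"]
print(" / ".join(group_bunsetu(tokens, labels)))
# → 銀座で / ランチを / ご一緒しましょう。
```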
| i (index) | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|
| orth (text) | 今度 | の | 日曜日 | は | どう | です | か | 。 |
| lemma (base form) | 今度 | の | 日曜日 | は | どう | です | か | 。 |
| reading_form (reading kana) | コンド | ノ | ニチヨウビ | ハ | ドウ | デス | カ | 。 |
| pos (UD part of speech) | NOUN | ADP | NOUN | ADP | ADV | AUX | PART | PUNCT |
| pos (glossed) | noun | adposition | noun | adposition | adverb | auxiliary | particle | punctuation |
| tag (detailed POS) | 名詞-普通名詞-副詞可能 | 助詞-格助詞 | 名詞-普通名詞-副詞可能 | 助詞-係助詞 | 副詞 | 助動詞 | 助詞-終助詞 | 補助記号-句点 |
| inflection (conjugation info) | - | - | - | - | - | 助動詞-デス,終止形-一般 | - | - |
| ent_type (entity type) | - | - | Day_Of_Week | - | - | - | - | - |
| ent_iob (entity IOB) | O | O | B | O | O | O | O | O |
| lang (language) | ja | ja | ja | ja | ja | ja | ja | ja |
| dep (UD dependency) | nmod | case | nsubj | case | ROOT | aux | mark | punct |
| dep (glossed) | nominal modifier | case marker | nominal subject | case marker | ROOT | auxiliary | marker | punctuation |
| head.i (head index) | 11 | 9 | 13 | 11 | 13 | 13 | 13 | 13 |
| bunsetu_bi_label | B | I | B | I | B | I | I | I |
| bunsetu_position_type | SEM_HEAD | SYN_HEAD | SEM_HEAD | SYN_HEAD | ROOT | SYN_HEAD | SYN_HEAD | CONT |
| is_bunsetu_head | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE |
| ent_label_ontonotes | O | O | B-DATE | O | O | O | O | O |
| ent_label_ene | O | O | B-Day_Of_Week | O | O | O | O | O |
Bunsetsu segmentation: 今度の / 日曜日は / どうですか。
Bunsetsu phrase heads: 今度 (NP) / 日曜日 (NP) / どう (ADVP)
Hopefully this makes the syntactic dependencies easier to follow. I hope it sparks your interest in GiNZA.