What follows is the content of a Jupyter notebook, exported as markdown and pasted here.
https://github.com/booink/spacy-trial1/tree/master The environment I ran it in is available in this public repository.
I only had about 30 minutes to play with it, and this is just the markdown export posted as-is as an experiment, so it does not read very smoothly.
https://spacy.io/usage/processing-pipelines
The text below is quoted from the page above.
When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the default models consists of a tagger, a parser and an entity recognizer. Each pipeline component returns the processed Doc, which is then passed on to the next component.
In short, if you pass text to nlp, it returns the tokenized text as a Doc object.
The nlp object has a mechanism called the pipeline: the Doc is handed from one component to the next like a bucket relay, and each component returns the processed Doc.
The default pipeline consists of a tagger, a parser, and an entity recognizer (ner).
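To make the bucket relay concrete for myself, here is a minimal sketch in plain Python. It is not spaCy's implementation: ToyDoc, tokenize, toy_tagger, toy_counter, and run are all hypothetical names invented for illustration. The only thing it is meant to show is the shape of the idea, where each component takes a doc and returns a doc.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyDoc:
    """Hypothetical stand-in for spaCy's Doc: tokens plus annotations."""
    tokens: List[str]
    annotations: Dict[str, object] = field(default_factory=dict)


def tokenize(text: str) -> ToyDoc:
    # Whitespace splitting only; spaCy's real tokenizer is far richer.
    return ToyDoc(tokens=text.split())


def toy_tagger(doc: ToyDoc) -> ToyDoc:
    # Made-up "tag": whether each token starts with an uppercase letter.
    doc.annotations["tagger"] = [tok[:1].isupper() for tok in doc.tokens]
    return doc


def toy_counter(doc: ToyDoc) -> ToyDoc:
    # Second component: it receives the doc the tagger returned.
    doc.annotations["counter"] = len(doc.tokens)
    return doc


Component = Callable[[ToyDoc], ToyDoc]

# A list of (name, component) pairs, applied in order.
pipeline: List[Tuple[str, Component]] = [
    ("tagger", toy_tagger),
    ("counter", toy_counter),
]


def run(text: str) -> ToyDoc:
    doc = tokenize(text)
    for _name, component in pipeline:
        doc = component(doc)  # pass the doc along to the next component
    return doc


doc = run("This is a text")
print(doc.tokens)
print(doc.annotations)
```

The order of the list matters here: a later component can rely on whatever the earlier ones wrote into the doc.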
I see. Let's look at the type of the doc object.
import spacy
nlp = spacy.load("en")
doc = nlp("This is a text")
type(doc)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-4-69cc80a89d2d> in <module>
1 import spacy
2
----> 3 nlp = spacy.load("en")
4 doc = nlp("This is a text")
5 type(doc)
/usr/local/lib/python3.7/site-packages/spacy/__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 30 return util.load_model(name, **overrides)
31
32
/usr/local/lib/python3.7/site-packages/spacy/util.py in load_model(name, **overrides)
167 elif hasattr(name, "exists"): # Path or Path-like to model data
168 return load_model_from_path(name, **overrides)
--> 169 raise IOError(Errors.E050.format(name=name))
170
171
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
It complained that there is no en model.
https://spacy.io/usage/models
Let's follow the Quick Start.
!python -m spacy download en_core_web_sm
Collecting en_core_web_sm==2.2.5
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
|████████████████████████████████| 12.0 MB 476 kB/s eta 0:00:01
Requirement already satisfied: spacy>=2.2.2 in /usr/local/lib/python3.7/site-packages (from en_core_web_sm==2.2.5) (2.2.4)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.2)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.0.3)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (4.44.1)
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.18.2)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.23.0)
Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (7.4.0)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.2)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.6.0)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.1.3)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (46.0.0)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.4.1)
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.2)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (1.25.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (2019.11.28)
Requirement already satisfied: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy>=2.2.2->en_core_web_sm==2.2.5) (1.6.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy>=2.2.2->en_core_web_sm==2.2.5) (3.1.0)
Building wheels for collected packages: en-core-web-sm
Building wheel for en-core-web-sm (setup.py) ... done
Created wheel for en-core-web-sm: filename=en_core_web_sm-2.2.5-py3-none-any.whl size=12011738 sha256=4e741a4ef6924b14806dc4789ff4156bf93b98c79d33f5959516f6a04c73f4bb
Stored in directory: /tmp/pip-ephem-wheel-cache-yazrb305/wheels/51/19/da/a3885266a3c241aff0ad2eb674ae058fd34a4870fef1c0a5a0
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.2.5
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
The download succeeded. Let's try running the code.
import spacy
nlp = spacy.load("en_core_web_sm")
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-6-14d257ed08ca> in <module>
1 import spacy
----> 2 nlp = spacy.load("en_core_web_sm")
/usr/local/lib/python3.7/site-packages/spacy/__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 30 return util.load_model(name, **overrides)
31
32
/usr/local/lib/python3.7/site-packages/spacy/util.py in load_model(name, **overrides)
167 elif hasattr(name, "exists"): # Path or Path-like to model data
168 return load_model_from_path(name, **overrides)
--> 169 raise IOError(Errors.E050.format(name=name))
170
171
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Hmm. Does it not work from inside Jupyter Notebook? I'll add the download step to the Dockerfile and rebuild the image.
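One possible explanation, which I have not verified, is that the already-running kernel did not notice the package that was installed after it started. A quick way to check whether a package is importable from the current interpreter is sketched below using only the standard library. The helper name is_model_installed is my own, and this check only tests importability: spaCy's own lookup may work differently, so a True here does not guarantee that spacy.load succeeds.

```python
import importlib
import importlib.util


def is_model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name is importable."""
    # Refresh the import system's caches so that packages installed
    # after the interpreter started have a chance of being found.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package_name) is not None


model_name = "en_core_web_sm"
if is_model_installed(model_name):
    print(f"{model_name} is importable from this interpreter")
else:
    print(f"{model_name} is not importable from this interpreter")
```

If the package turns out to be importable but loading by name still fails, restarting the kernel is probably worth trying before rebuilding anything.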
I rebuilt the image. Let's try again.
import spacy
nlp = spacy.load("en_core_web_sm")
No error this time. Did it work? Let's look at the type of doc.
doc = nlp("This is a text")
type(doc)
spacy.tokens.doc.Doc
I see.
What is set in the pipeline?
for p in nlp.pipeline:
    print(p)
('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fc3c78613d0>)
('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fc39292ede0>)
('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7fc3928c5360>)
Indeed: tagger, parser, and ner.
By the way, according to the Quick Start on the models page, you can apparently also write it like this:
import en_core_web_sm  # Besides passing the model name as a string, the model can apparently be imported as a module

nlp = en_core_web_sm.load()  # Does load() with no arguments return nlp?
doc = nlp("This is a text")
print(doc)
for p in nlp.pipeline:
    print(p)
This is a text
('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fc3903805d0>)
('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fc3928bad70>)
('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7fc3928ba9f0>)
What is the nlp object?
type(nlp)
spacy.lang.en.English
Hmm
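So load() on the module does return the nlp object. As far as I understand it, loading an installed model package by its name amounts to roughly the same thing: import the package, then call its load(). The sketch below shows only that idea and is not spaCy's actual code. load_from_package and fake_model_pkg are hypothetical names, and the fake module exists only so the example runs without spaCy installed.

```python
import importlib
import sys
import types


def load_from_package(package_name: str):
    """Import a package by name and call its load() function."""
    module = importlib.import_module(package_name)
    return module.load()


# Hypothetical stand-in for a model package such as en_core_web_sm.
fake_model = types.ModuleType("fake_model_pkg")
fake_model.load = lambda: "nlp object from fake_model_pkg"
sys.modules["fake_model_pkg"] = fake_model  # make it importable by name

result = load_from_package("fake_model_pkg")
print(result)
```

With a real model installed, the same function called with "en_core_web_sm" would presumably return the same kind of object as en_core_web_sm.load() above.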