The following is the output of a Jupyter notebook, downloaded and pasted in as Markdown.
https://github.com/booink/spacy-trial1/tree/master The working environment is available in this public repository.
This is stream-of-consciousness output from a trial run in which I spent only about 30 minutes hands-on, so it is not an easy read.
https://spacy.io/usage/processing-pipelines
I'll quote from the page above.
When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the default models consists of a tagger, a parser and an entity recognizer. Each pipeline component returns the processed Doc, which is then passed on to the next component.
The point is that when you pass text to nlp, the tokenized text comes back as an object of the Doc class.
There is a mechanism called the pipeline: the Doc is handed from one component to the next, bucket-brigade style, and each component returns the processed Doc.
The pipeline contains a tagger, a parser, and an entity recognizer (ner).
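The flow described above can be modeled in a few lines of plain Python. This is only a toy sketch of the idea, not spaCy's implementation: `ToyDoc`, `ToyLanguage`, `toy_tagger`, and `toy_length_counter` are hypothetical names made up for illustration, and the whitespace split stands in for a real tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Doc: tokens plus annotations added later."""
    tokens: list
    annotations: dict = field(default_factory=dict)


def toy_tagger(doc):
    # Annotate each token, then hand the same doc on to the next component.
    doc.annotations["is_title"] = [tok.istitle() for tok in doc.tokens]
    return doc


def toy_length_counter(doc):
    # A second component that adds its own annotation to the same doc.
    doc.annotations["n_tokens"] = len(doc.tokens)
    return doc


class ToyLanguage:
    """Hypothetical stand-in for the nlp object."""

    def __init__(self, pipeline):
        # A list of (name, component) pairs, applied in order.
        self.pipeline = pipeline

    def __call__(self, text):
        # Step 1: tokenize (here: a naive whitespace split) to create the doc.
        doc = ToyDoc(tokens=text.split())
        # Step 2: pass the doc through each component in turn.
        for _name, component in self.pipeline:
            doc = component(doc)
        return doc


toy_nlp = ToyLanguage([("tagger", toy_tagger), ("counter", toy_length_counter)])
toy_doc = toy_nlp("This is a text")
print(toy_doc.tokens)
print(toy_doc.annotations)
```

Each component receives the doc, adds its annotations, and returns it, which is the "returns the processed Doc, which is then passed on to the next component" part of the quote.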
That's all. Let's look at the type of the Doc object.
import spacy
nlp = spacy.load("en")
doc = nlp("This is a text")
type(doc)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-4-69cc80a89d2d> in <module>
1 import spacy
2
----> 3 nlp = spacy.load("en")
4 doc = nlp("This is a text")
5 type(doc)
/usr/local/lib/python3.7/site-packages/spacy/__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 30 return util.load_model(name, **overrides)
31
32
/usr/local/lib/python3.7/site-packages/spacy/util.py in load_model(name, **overrides)
167 elif hasattr(name, "exists"): # Path or Path-like to model data
168 return load_model_from_path(name, **overrides)
--> 169 raise IOError(Errors.E050.format(name=name))
170
171
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
It complained that there is no 'en' model.
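As the traceback shows, a missing model surfaces as an OSError, so it can be caught and a different name tried. The following is a minimal sketch of that idea: `load_first_available` and `fake_loader` are hypothetical helpers, and `fake_loader` merely imitates the failure above so that the example runs without spaCy installed.

```python
def load_first_available(names, loader):
    """Try each model name in order and return (name, model) for the first hit."""
    errors = {}
    for name in names:
        try:
            return name, loader(name)
        except OSError as err:
            # Remember why this candidate failed and move on to the next one.
            errors[name] = str(err)
    raise OSError(f"none of the models could be loaded: {errors}")


def fake_loader(name):
    # Stand-in for a real loader: only one model name is "installed".
    if name != "en_core_web_sm":
        raise OSError(f"[E050] Can't find model '{name}'.")
    return f"<model {name}>"


loaded_name, model = load_first_available(["en", "en_core_web_sm"], fake_loader)
print(loaded_name, model)
```

With spaCy installed, `spacy.load` could be passed as the `loader` argument instead of the stand-in.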
https://spacy.io/usage/models
Let's try it as described in the quickstart.
!python -m spacy download en_core_web_sm
Collecting en_core_web_sm==2.2.5
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
|████████████████████████████████| 12.0 MB 476 kB/s eta 0:00:01
Requirement already satisfied: spacy>=2.2.2 in /usr/local/lib/python3.7/site-packages (from en_core_web_sm==2.2.5) (2.2.4)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.2)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.0.3)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (4.44.1)
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.18.2)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.23.0)
Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (7.4.0)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.2)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.6.0)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.1.3)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (46.0.0)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.4.1)
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.2)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (1.25.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (2019.11.28)
Requirement already satisfied: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy>=2.2.2->en_core_web_sm==2.2.5) (1.6.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy>=2.2.2->en_core_web_sm==2.2.5) (3.1.0)
Building wheels for collected packages: en-core-web-sm
Building wheel for en-core-web-sm (setup.py) ... done
Created wheel for en-core-web-sm: filename=en_core_web_sm-2.2.5-py3-none-any.whl size=12011738 sha256=4e741a4ef6924b14806dc4789ff4156bf93b98c79d33f5959516f6a04c73f4bb
Stored in directory: /tmp/pip-ephem-wheel-cache-yazrb305/wheels/51/19/da/a3885266a3c241aff0ad2eb674ae058fd34a4870fef1c0a5a0
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.2.5
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
The download worked. Let's run the code.
import spacy
nlp = spacy.load("en_core_web_sm")
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-6-14d257ed08ca> in <module>
1 import spacy
----> 2 nlp = spacy.load("en_core_web_sm")
/usr/local/lib/python3.7/site-packages/spacy/__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 30 return util.load_model(name, **overrides)
31
32
/usr/local/lib/python3.7/site-packages/spacy/util.py in load_model(name, **overrides)
167 elif hasattr(name, "exists"): # Path or Path-like to model data
168 return load_model_from_path(name, **overrides)
--> 169 raise IOError(Errors.E050.format(name=name))
170
171
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Hmm. Does it not work when installed from inside the Jupyter notebook? I'll write the download into the Dockerfile and rebuild.
I rebuilt the image. Let's try again.
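Why the load failed right after a successful install is not clear from the traceback alone. One possible cause, offered here purely as an assumption, is that the already running kernel does not see a package installed after it started, or that pip installed into a different Python than the one the kernel uses. A small standard-library check can narrow this down; `is_importable` is a hypothetical helper name.

```python
import importlib
import importlib.util
import sys


def is_importable(package_name):
    """Return True if the current interpreter can locate the package."""
    # Refresh the import system's caches so packages installed after
    # startup have a chance of being found.
    importlib.invalidate_caches()
    return importlib.util.find_spec(package_name) is not None


# Which Python is the kernel running? pip may have installed elsewhere.
print(sys.executable)
print(is_importable("en_core_web_sm"))
```

If this prints True and loading still fails, restarting the kernel may be worth trying before going as far as a rebuild.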
import spacy
nlp = spacy.load("en_core_web_sm")
No error this time. Success? Let's look at the type of doc.
doc = nlp("This is a text")
type(doc)
spacy.tokens.doc.Doc
I see.
What is configured in the pipeline?
for p in nlp.pipeline:
    print(p)
('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fc3c78613d0>)
('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fc39292ede0>)
('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7fc3928c5360>)
Hmm. tagger, parser, ner. Sure enough.
By the way, according to the model quickstart, you can also write it like this ↓.
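The output shows that each entry of the pipeline is a (name, component) tuple, so the entries can be unpacked directly. A short sketch using placeholder objects instead of real components (`toy_pipeline` is a made-up stand-in for `nlp.pipeline`):

```python
# Stand-in for nlp.pipeline: a list of (name, component) tuples.
toy_pipeline = [("tagger", object()), ("parser", object()), ("ner", object())]

# Unpack each pair to collect just the names, in pipeline order.
component_names = [name for name, _component in toy_pipeline]

# A dict makes it easy to look up a component by its name.
components_by_name = dict(toy_pipeline)

print(component_names)  # ['tagger', 'parser', 'ner']
print("ner" in components_by_name)  # True
```

On a real nlp object, the `nlp.pipe_names` attribute gives the list of names without the manual unpacking.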
import en_core_web_sm  # Besides passing the model name as a string, the model can apparently also be imported as a module
nlp = en_core_web_sm.load()  # Does load() without arguments return the nlp object?
doc = nlp("This is a text")
print(doc)
for p in nlp.pipeline:
    print(p)
This is a text
('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fc3903805d0>)
('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fc3928bad70>)
('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7fc3928ba9f0>)
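Both loading styles give the same pipeline, which suggests that a model installed with pip is an ordinary Python package exposing a load() function. A minimal sketch of how a name string could be resolved to such a package follows. It assumes, without the source confirming it, that this is roughly what happens when a package name is passed as a string; `load_from_package` and `toy_model_package` are hypothetical, and the fake module exists only so the example runs without spaCy.

```python
import importlib
import sys
import types


def load_from_package(package_name):
    """Import a package by name and call its load() function."""
    module = importlib.import_module(package_name)
    return module.load()


# Register a fake model package so the sketch runs without spaCy installed.
fake_package = types.ModuleType("toy_model_package")
fake_package.load = lambda: "toy nlp object"
sys.modules["toy_model_package"] = fake_package

print(load_from_package("toy_model_package"))  # toy nlp object
```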
What is the nlp object?
type(nlp)
spacy.lang.en.English
Hmm