What follows is the output of a Jupyter notebook, downloaded as markdown and pasted here.
https://github.com/booink/spacy-trial1/tree/master The working environment is published in this public repository.
This is loose content: a trial I could only spend about 30 minutes on, dumped straight out as markdown, so please bear with it.
https://spacy.io/usage/processing-pipelines
I'll quote from the top of that page.
When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the default models consists of a tagger, a parser and an entity recognizer. Each pipeline component returns the processed Doc, which is then passed on to the next component.
In short, when you pass text to nlp, it returns the tokenized text as an object of class Doc.
The nlp object has a mechanism called the pipeline, and the Doc is handed from one component to the next, bucket-brigade style, with each component returning the processed Doc.
The default pipeline contains a tagger, a parser, and an entity recognizer (ner).
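To make the relay idea concrete, here is a minimal pure-Python sketch. It is not spaCy's implementation: `ToyDoc`, `toy_tagger`, `toy_ner` and `run_pipeline` are hypothetical names made up for illustration, and the "tagging" rules are deliberately silly. It only shows the shape: tokenize first, then pass the same object through each component in order.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Doc: tokens plus whatever the components add."""
    tokens: list
    tags: list = field(default_factory=list)
    ents: list = field(default_factory=list)


def toy_tagger(doc):
    # Toy rule (not a real tagger): capitalised tokens get "X", the rest "x".
    doc.tags = ["X" if t[:1].isupper() else "x" for t in doc.tokens]
    return doc


def toy_ner(doc):
    # Toy rule: every capitalised token after the first counts as an "entity".
    doc.ents = [t for t in doc.tokens[1:] if t[:1].isupper()]
    return doc


def run_pipeline(text, pipeline):
    doc = ToyDoc(tokens=text.split())  # step 1: tokenize into a doc
    for _name, component in pipeline:  # step 2: relay the doc through each component
        doc = component(doc)
    return doc


# Same (name, component) pair layout that printing nlp.pipeline shows.
pipeline = [("tagger", toy_tagger), ("ner", toy_ner)]
doc = run_pipeline("This is a text about Berlin", pipeline)
print(doc.tags)
print(doc.ents)
```

Each component takes the doc and returns it, so adding or reordering steps only means editing the list.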
I see. Let's look at the type of the doc object.
import spacy
nlp = spacy.load("en")
doc = nlp("This is a text")
type(doc)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-4-69cc80a89d2d> in <module>
1 import spacy
2
----> 3 nlp = spacy.load("en")
4 doc = nlp("This is a text")
5 type(doc)
/usr/local/lib/python3.7/site-packages/spacy/__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 30 return util.load_model(name, **overrides)
31
32
/usr/local/lib/python3.7/site-packages/spacy/util.py in load_model(name, **overrides)
167 elif hasattr(name, "exists"): # Path or Path-like to model data
168 return load_model_from_path(name, **overrides)
--> 169 raise IOError(Errors.E050.format(name=name))
170
171
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
It complained that the model doesn't exist.
https://spacy.io/usage/models
Let's try it following the Quick Start.
!python -m spacy download en_core_web_sm
Collecting en_core_web_sm==2.2.5
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
|████████████████████████████████| 12.0 MB 476 kB/s eta 0:00:01
Requirement already satisfied: spacy>=2.2.2 in /usr/local/lib/python3.7/site-packages (from en_core_web_sm==2.2.5) (2.2.4)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.2)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.0.3)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (4.44.1)
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.18.2)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (2.23.0)
Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (7.4.0)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.2)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.6.0)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.1.3)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (46.0.0)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (0.4.1)
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/site-packages (from spacy>=2.2.2->en_core_web_sm==2.2.5) (1.0.2)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (2.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (1.25.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.2->en_core_web_sm==2.2.5) (2019.11.28)
Requirement already satisfied: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy>=2.2.2->en_core_web_sm==2.2.5) (1.6.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy>=2.2.2->en_core_web_sm==2.2.5) (3.1.0)
Building wheels for collected packages: en-core-web-sm
Building wheel for en-core-web-sm (setup.py) ... done
Created wheel for en-core-web-sm: filename=en_core_web_sm-2.2.5-py3-none-any.whl size=12011738 sha256=4e741a4ef6924b14806dc4789ff4156bf93b98c79d33f5959516f6a04c73f4bb
Stored in directory: /tmp/pip-ephem-wheel-cache-yazrb305/wheels/51/19/da/a3885266a3c241aff0ad2eb674ae058fd34a4870fef1c0a5a0
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.2.5
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
The download worked. Let's try running the code.
import spacy
nlp = spacy.load("en_core_web_sm")
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-6-14d257ed08ca> in <module>
1 import spacy
----> 2 nlp = spacy.load("en_core_web_sm")
/usr/local/lib/python3.7/site-packages/spacy/__init__.py in load(name, **overrides)
28 if depr_path not in (True, False, None):
29 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 30 return util.load_model(name, **overrides)
31
32
/usr/local/lib/python3.7/site-packages/spacy/util.py in load_model(name, **overrides)
167 elif hasattr(name, "exists"): # Path or Path-like to model data
168 return load_model_from_path(name, **overrides)
--> 169 raise IOError(Errors.E050.format(name=name))
170
171
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Hmm. Does it not work when installed from inside the Jupyter notebook? I'll add the download to the Dockerfile and rebuild the image.
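Before rebuilding, one could also check from inside the running kernel whether Python can see the freshly installed package at all. The sketch below makes several assumptions: `package_is_importable` is a hypothetical helper, it uses only the standard library, and it tests just one of the three things the E050 message lists (a Python package), not shortcut links or data paths. Whether a kernel that had not picked up the new package was really the cause here is a guess.

```python
import importlib
import importlib.util


def package_is_importable(name):
    """Return True if this interpreter can find a top-level package called `name`."""
    # Drop cached directory listings so packages installed after startup are noticed.
    importlib.invalidate_caches()
    return importlib.util.find_spec(name) is not None


# "json" is a standard-library module, used here only as a known-good comparison.
for candidate in ("en_core_web_sm", "json"):
    print(candidate, package_is_importable(candidate))
```

If the model package shows up as importable but `spacy.load` still fails, restarting the kernel is a cheap thing to try before rebuilding the image.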
Rebuilt. Let's try again.
import spacy
nlp = spacy.load("en_core_web_sm")
No error this time. Did it work? Let's look at the type of doc.
doc = nlp("This is a text")
type(doc)
spacy.tokens.doc.Doc
I see.
What is in the pipeline?
for p in nlp.pipeline:
    print(p)
('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fc3c78613d0>)
('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fc39292ede0>)
('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7fc3928c5360>)
Hmm. tagger, parser, ner. Sure enough.
By the way, according to the models Quick Start, you can apparently also write it like this ↓.
import en_core_web_sm  # the model can also be imported as a module instead of being named by a string
nlp = en_core_web_sm.load()  # load() with no arguments seems to return the nlp object
doc = nlp("This is a text")
print(doc)
for p in nlp.pipeline:
    print(p)
This is a text
('tagger', <spacy.pipeline.pipes.Tagger object at 0x7fc3903805d0>)
('parser', <spacy.pipeline.pipes.DependencyParser object at 0x7fc3928bad70>)
('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x7fc3928ba9f0>)
So what is the nlp object?
type(nlp)
spacy.lang.en.English
Hmm
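The printed type says `nlp` is an instance of a language-specific class. In spaCy, `English` is a subclass of the general `Language` class, and the object is callable, which is why `nlp("This is a text")` works. Here is a toy sketch of that shape; `ToyLanguage` and `ToyEnglish` are hypothetical classes, not spaCy code, and the "tokenizer" is just a whitespace split.

```python
class ToyLanguage:
    """Hypothetical base class: defines what happens when the object is called."""
    lang = None

    def __call__(self, text):
        # Defining __call__ is what lets an instance be used like a function.
        return text.split()


class ToyEnglish(ToyLanguage):
    """Hypothetical language-specific subclass: only the language code differs."""
    lang = "en"


nlp = ToyEnglish()
print(type(nlp).__name__)
print(isinstance(nlp, ToyLanguage))
print(nlp("This is a text"))
```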