[Python] A class for inference with fairseq

I had been modifying interactive.py for a long time, but it turns out fairseq officially provides a good function for this.

Previous method: Classify fairseq interactive

Code

from fairseq.models.transformer import TransformerModel

class Interactive:
    def __init__(self, spm_path, data_path, checkpoint_path, checkpoint_name):
        # Number of sentences processed at the same time (batch size)
        self.num = 32

        self.ltos = TransformerModel.from_pretrained(
            checkpoint_path,
            checkpoint_file=checkpoint_name,
            data_name_or_path=data_path,
            bpe='sentencepiece',
            sentencepiece_model=spm_path,
            no_repeat_ngram_size=2
            )

    def inference(self, texts: list):
        result = []
        n = self.num
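        # Translate the input in chunks of n sentences; stepping by n avoids
        # the empty trailing slices that range(len(texts)) would produce
        for i in range(0, len(texts), n):
            result += self.ltos.translate(texts[i:i + n])
        # Replace underscores in the output with spaces
        return [r.replace("_", " ") for r in result]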

Commentary

-- Basically, you just load the model with fairseq.models.transformer.TransformerModel.from_pretrained and call translate on the returned object.
-- If you try to process about 100 sentences at the same time, it becomes slow because it consumes a lot of memory, so the input is split into batches of self.num sentences.
-- It seems that only half of the installed CPU cores are used.
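
The batching described above can be tried without loading a model. The following is only a minimal sketch: chunked, translate_in_batches, and fake_translate are hypothetical names that are not part of fairseq, and fake_translate is a stand-in for a real call such as self.ltos.translate.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_batches(
    translate: Callable[[List[str]], List[str]],
    texts: List[str],
    batch_size: int = 32,
) -> List[str]:
    """Call `translate` once per batch and concatenate the outputs in order."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        results.extend(translate(batch))
    return results


# Hypothetical stand-in for a real model call; it only upper-cases its input.
def fake_translate(batch: List[str]) -> List[str]:
    return [s.upper() for s in batch]


sentences = [f"sentence {i}" for i in range(70)]
batch_sizes = [len(b) for b in chunked(sentences, 32)]
outputs = translate_in_batches(fake_translate, sentences, batch_size=32)
print(batch_sizes)  # [32, 32, 6]
```

Stepping the range by the batch size gives exactly as many calls as there are non-empty batches, so 70 sentences with a batch size of 32 result in three calls.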

It's about 5 times faster than the previous code.
