[PYTHON] What to do when LdaModel complains "too few updates ~"

Somehow I got this warning. It seems you can make it go away by adjusting passes or iterations, but I'm nervous about touching them because I don't understand what these numerical-computation parameters actually do.

Where the warning occurs


from gensim.models import LdaModel
model_lda = LdaModel(corpus=corpus, num_topics=30, id2word=corpus.id2word)
WARNING:gensim.models.ldamodel:too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy

Looking at the source code, the warning comes from the update method, which __init__ runs at the end.

In the update method, near line 616


if updates_per_pass * passes < 10:
    logger.warning("too few updates, training might not converge; consider "
                   "increasing the number of passes or iterations to improve accuracy")

passes is the passes parameter given to LdaModel's __init__, used as is. It defaults to 1. As for updates_per_pass ... hmm ...

In the update method, line 607


updates_per_pass = max(1, lencorpus / updateafter)

lencorpus is assigned the value of len(corpus) near line 585 of the update method. In short, it is the number of documents. When this warning was issued, the corpus had 4019 documents. As for updateafter ...

In the update method, around line 599


if update_every:
    updatetype = "online"
    updateafter = min(lencorpus, update_every * self.numworkers * chunksize)
else:
    updatetype = "batch"
    updateafter = lencorpus

If update_every is not passed to the update method, it takes the value of the __init__ parameter update_every, whose default is 1. So unless you change anything, the update type is online. self.numworkers is 1 as long as the __init__ parameter distributed stays False.

chunksize is ...

In the update method, line 595


chunksize = min(lencorpus, self.chunksize)

self.chunksize is the same as the __init__ parameter chunksize. The default is 2000.

In other words:

updateafter = min(4019, 1 * 1 * 2000) = 2000
updates_per_pass = max(1, 4019 / 2000) ≒ 2

So the left-hand side of the if condition evaluates to 2 * 1 = 2, which is less than 10. That is what triggers the warning.
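The arithmetic above can be replayed in plain Python without gensim. This is only a sketch that copies the three quoted snippets; the variable chunksize_param is a name made up here to stand in for self.chunksize, and it assumes Python 3 true division and the defaults described in this post (line numbers and details may differ between gensim versions).

```python
# Values taken from this post: 4019 documents, everything else at its default.
lencorpus = 4019
passes = 1
update_every = 1
numworkers = 1           # stands in for self.numworkers (distributed=False)
chunksize_param = 2000   # hypothetical name standing in for self.chunksize

# Same steps as the quoted lines of the update method.
chunksize = min(lencorpus, chunksize_param)
if update_every:
    updateafter = min(lencorpus, update_every * numworkers * chunksize)
else:
    updateafter = lencorpus
updates_per_pass = max(1, lencorpus / updateafter)

# The condition that guards the warning.
total_updates = updates_per_pass * passes
warns = total_updates < 10

print("updateafter:", updateafter)
print("updates_per_pass:", updates_per_pass)
print("warning issued:", warns)
```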

Measures

- Increase passes. In this case, passes = 5 is enough to silence the warning.
- Reduce updateafter, that is, decrease update_every or chunksize. update_every is already 1 by default, and setting it to 0 switches to batch mode, so chunksize is the practical knob. In this case, changing only chunksize to about 400 is enough to silence the warning.
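To find such values for a corpus of a different size, the same condition can be turned around. The helpers below (updates_per_pass, min_passes, max_chunksize) are hypothetical names written for this post, not part of gensim; they assume the quoted check is the only thing that matters, that update_every and numworkers are both 1 in max_chunksize, and that Python 3 division applies.

```python
import math

WARN_THRESHOLD = 10  # from the quoted check: updates_per_pass * passes < 10


def updates_per_pass(lencorpus, chunksize, update_every=1, numworkers=1):
    """Mirror the quoted lines of the update method."""
    chunksize = min(lencorpus, chunksize)
    if update_every:
        updateafter = min(lencorpus, update_every * numworkers * chunksize)
    else:
        updateafter = lencorpus
    return max(1, lencorpus / updateafter)


def min_passes(lencorpus, chunksize=2000):
    """Smallest passes for which the warning condition is false."""
    return math.ceil(WARN_THRESHOLD / updates_per_pass(lencorpus, chunksize))


def max_chunksize(lencorpus, passes=1):
    """Largest chunksize for which the warning condition is false.

    Assumes update_every=1, numworkers=1 and lencorpus * passes >= 10;
    for smaller corpora no chunksize avoids the warning.
    """
    return min(lencorpus, lencorpus * passes // WARN_THRESHOLD)


print(min_passes(4019))     # passes needed with the default chunksize
print(max_chunksize(4019))  # chunksize limit with the default passes
```

The resulting numbers would then be passed to LdaModel through its passes or chunksize arguments, for example by adding passes=5 to the call at the top of this post.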

I'm tired of digging through these parameters, so I'll look into them properly another day.
