[PYTHON] Notes on setting up an environment for the AutoML library PyCaret

Installing PyCaret so that pandas-profiling runs without errors

Computer environment

OS: Ubuntu 18.04 LTS

Creating a virtual environment with Anaconda3

bash


$ conda create -n pycaret python=3.6.10

Install PyCaret in the conda virtual environment

  1. Install with pip, following the official manual

bash


$ conda activate pycaret
(pycaret)$ pip install pycaret
(pycaret)$ python -m ipykernel install --user --name pycaret --display-name "display-name-here"

However, after a recent install, running the following code in Jupyter Notebook started to throw an error.

python


from pycaret.datasets import get_data
dataset = get_data('credit', profile=True)

This code downloads a dataset from PyCaret's data repository with get_data. The original tutorial did not pass the argument profile=True, so it ran with the default, profile=False. In that case, only the first 5 rows of the data are displayed.

If you pass profile=True instead, the output is a pandas-profiling report. You can check the DataFrame's basic statistics and correlation coefficients all at once, without having to `import pandas_profiling` yourself.

However, when I installed with pip install pycaret at different times, profile=True sometimes raised an error, probably because the minor versions of some dependencies differed between installs. I therefore now install from a requirements.txt file with pinned versions.

  2. Place the requirements.txt file in the directory from which the virtual environment is activated, and install from it with pip

bash


$ conda activate pycaret
(pycaret)$ pip install -r requirements.txt
(pycaret)$ python -m ipykernel install --user --name pycaret --display-name "display-name-here"
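Because the kernel is registered by hand, it can be worth confirming inside the notebook that the selected kernel really runs the interpreter of the pycaret environment. The following is a minimal sketch using only the standard library. The helper names conda_env_name and describe_kernel are made up for this illustration, and the check assumes conda's default layout, where a named environment lives under `<conda root>/envs/<name>`.

```python
import os
import sys


def conda_env_name(prefix):
    """Return the env name if prefix looks like <conda root>/envs/<name>, else None."""
    # Assumption: named conda environments live in an "envs" directory.
    parent, name = os.path.split(os.path.normpath(prefix))
    if os.path.basename(parent) == "envs":
        return name
    return None


def describe_kernel(expected_env="pycaret"):
    """Summarise which interpreter the current kernel is running."""
    env = conda_env_name(sys.prefix)
    return {
        "executable": sys.executable,
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "env": env,
        "matches": env == expected_env,
    }


print(describe_kernel())
```

If "matches" is False, the notebook is probably using a different kernel than the one registered above, and packages installed into the pycaret environment will not be visible to it.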

Put the following in requirements.txt.

astropy==4.0.1.post1
attrs==19.3.0
awscli==1.18.64
backcall==0.1.0
bleach==3.1.5
blis==0.4.1
boto==2.49.0
boto3==1.13.14
botocore==1.16.14
catalogue==1.0.0
catboost==0.20.2
certifi==2020.4.5.1
chardet==3.0.4
chart-studio==1.1.0
click==7.1.2
colorama==0.4.3
colorlover==0.3.0
combo==0.1.0
confuse==1.1.0
cufflinks==0.17.0
cycler==0.10.0
cymem==2.0.3
datefinder==0.7.0
DateTime==4.3
decorator==4.4.2
defusedxml==0.6.0
docutils==0.15.2
entrypoints==0.3
funcy==1.14
future==0.18.2
gensim==3.8.3
graphviz==0.14
htmlmin==0.1.12
idna==2.9
importlib-metadata==1.6.0
ipykernel==5.3.0
ipython==7.14.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.17.0
Jinja2==2.11.2
jmespath==0.10.0
joblib==0.15.1
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
kiwisolver==1.2.0
kmodes==0.10.1
lightgbm==2.3.1
llvmlite==0.32.1
MarkupSafe==1.1.1
matplotlib==3.2.1
missingno==0.4.2
mistune==0.8.4
mlxtend==0.17.2
more-itertools==8.3.0
murmurhash==1.0.2
nbconvert==5.6.1
nbformat==5.0.6
nltk==3.5
notebook==6.0.3
numba==0.49.1
numexpr==2.7.1
numpy==1.18.4
packaging==20.4
pandas==1.0.3
pandas-profiling==2.3.0
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
phik==0.9.12
pickleshare==0.7.5
Pillow==7.1.2
plac==1.1.3
plotly==4.4.1
pluggy==0.13.1
preshed==3.0.2
prometheus-client==0.7.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
py==1.8.1
pyasn1==0.4.8
pycaret==1.0.0
Pygments==2.6.1
pyLDAvis==2.1.2
pyod==0.7.9
pyparsing==2.4.7
pyrsistent==0.16.0
pytest==5.4.2
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.3.1
pyzmq==19.0.1
regex==2020.5.14
requests==2.23.0
retrying==1.3.3
rsa==3.4.2
s3transfer==0.3.3
scikit-learn==0.22
scipy==1.4.1
seaborn==0.10.1
Send2Trash==1.5.0
shap==0.32.1
six==1.14.0
smart-open==2.0.0
spacy==2.2.4
srsly==1.0.2
suod==0.0.4
tbb==2020.0.133
terminado==0.8.3
testpath==0.4.4
textblob==0.15.3
thinc==7.4.0
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
umap-learn==0.4.3
urllib3==1.25.9
wasabi==0.6.0
wcwidth==0.1.9
webencodings==0.5.1
widgetsnbextension==3.5.1
wordcloud==1.7.0
xgboost==0.90
yellowbrick==1.0.1
zipp==3.1.0
zope.interface==5.1.0
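
Since the errors seem to come from version drift, one way to see whether an environment actually matches the pinned list is to compare each pin with what is installed. This is only a sketch, not part of PyCaret: parse_pins and find_mismatches are hypothetical helper names, and the code assumes every relevant line has the simple `name==version` form used above. On Python 3.6 it falls back to the importlib-metadata backport, which is itself in the list.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.6/3.7: use the importlib-metadata backport instead.
    from importlib_metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)}; installed is None if the package is missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# A few pins copied from the list above, used here as sample input.
sample = """
pandas==1.0.3
pandas-profiling==2.3.0
pycaret==1.0.0
"""
for name, (pinned, installed) in sorted(find_mismatches(parse_pins(sample)).items()):
    print("{}: pinned {}, installed {}".format(name, pinned, installed))
```

To check the whole file instead of the sample, pass the contents of requirements.txt to parse_pins. No output means every listed package is installed at its pinned version.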

  3. To read your own data with pandas and output a pandas-profiling report, do the following.

python


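import pandas as pd
import pandas_profiling

# Load the CSV into a DataFrame (replace the path with your own file).
df = pd.read_csv('/path/to/data.csv', sep=",", encoding="utf-8")

# In a Jupyter notebook, leave the report as the last expression in the cell
# so that it is rendered inline.
pandas_profiling.ProfileReport(df)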
