Notes on things I learned while doing machine learning and data mining in Python 3 with VS Code.
I know they're hard to read.
Updated from time to time.
Python3 + venv + VSCode + macOS development environment construction --Qiita
Jupyter-notebook drawing library comparison-Qiita
Use ipywidgets and Bokeh for interactive visualization-Qiita
When I try to use ipywidgets with VS Code's Jupyter extension, it can't load the required scripts and doesn't work: support for ipython/jupyter widgets · Issue #21 · DonJayamanne/vscodeJupyter. Just use Jupyter in the browser for now.
https://github.com/bokeh/bokeh/blob/master/examples/howto/notebook_comms/Jupyter%20Interactors.ipynb
"python.linting.pylintArgs": [
"--extension-pkg-whitelist=numpy"
]
No-member error in Pylint-Qiita
ValueError: n_samples=1 should be >= n_clusters=3
appears when running k-means. The input data needs to be two-dimensional; the blog below builds it up with append, which is inefficient, so it is better to do something like sample_data.iloc[:, 0:1].
This extracts the first column, the same data as sample_data.iloc[:, 0], but slicing with 0:1 keeps it two-dimensional (a DataFrame rather than a Series), so the error above does not occur.
Day 6 until understanding machine learning / clustering-IT captain's blog
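A minimal sketch of the difference (sample_data here is a hypothetical one-column DataFrame):

```python
import pandas as pd

# Hypothetical one-column DataFrame standing in for the blog's data
sample_data = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

s = sample_data.iloc[:, 0]    # Series, one-dimensional
d = sample_data.iloc[:, 0:1]  # DataFrame, two-dimensional

print(s.shape)  # (3,)
print(d.shape)  # (3, 1)
```

Because the 0:1 slice keeps the column axis, scikit-learn accepts it as the (n_samples, n_features) shape that k-means expects.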
With df = pd.DataFrame(), calling df.append(df2) does not add anything to df; append returns a new DataFrame, so it should be df = df.append(df2).
python - Appending to an empty data frame in Pandas? - Stack Overflow
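Note that DataFrame.append was removed in pandas 2.0; the same "reassign the result" pattern applies with pd.concat, its current replacement:

```python
import pandas as pd

df = pd.DataFrame()
df2 = pd.DataFrame({"a": [1, 2]})

# concat also returns a new DataFrame, so the result must be reassigned
df = pd.concat([df, df2], ignore_index=True)
print(len(df))  # 2
```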
However, Type Hints are effectively just annotations: pass an object of the wrong type and the linter won't complain, and nothing is type-checked until you actually run the code.
Typed world starting with Python-Qiita
Python class member scope summary-Qiita
Python pandas data iteration and function application, pipe --StatsFragments
List index (enumerate)-Learning site from Python introductory to application
Pandas: Converting to numeric, creating NaNs when necessary
Easy Python package management with pip related tools-Qiita
Use append when you simply want to stack frames vertically, and join when you want to combine them horizontally.
Python pandas data concatenation / join processing as seen in the figure --StatsFragments
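A small sketch of the two directions, using hypothetical frames a and b:

```python
import pandas as pd

a = pd.DataFrame({"key": [1, 2], "x": ["a", "b"]})
b = pd.DataFrame({"key": [1, 2], "y": ["c", "d"]})

vertical = pd.concat([a, a], ignore_index=True)  # stack rows vertically
horizontal = a.merge(b, on="key")                # join horizontally on a key

print(vertical.shape)    # (4, 2)
print(horizontal.shape)  # (2, 3)
```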
pd.set_option("display.max_rows", 10)
Prevent pandas from omitting display-problems and solution notes at work.
[[Python] Sorting a multidimensional list](http://qiita.com/fantm21/items/6df776d99356ef6d14d4)
Summary of Python sort (list, dictionary type, Series, DataFrame) --Qiita
code-python-isort - Visual Studio Marketplace
ipython-sql: a Jupyter extension that lets you run %sql select * from hoge and drop the result straight into a DataFrame, etc.
I made a tool to convert Jupyter py to ipynb with VS Code --Qiita
tttt = pd.DataFrame()
tttt.append(None)
tttt = df[["label"]]
tttt.append(None)
This is because IntelliSense doesn't know the argument's type; if you declare the type after df[["label"]] with assert isinstance or the like, append will appear in IntelliSense.
How to write Python to get IntelliSense to work --Ajobuji Hoshi Tsushin
Python pandas accessor / Grouper with a little more advanced grouping / aggregation --StatsFragments
Time-series data can also be grouped this way, e.g. every second or every day.
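For example, a minimal sketch of per-minute aggregation with pd.Grouper (the 1-second time series here is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data sampled once per second over 2 minutes
idx = pd.date_range("2021-01-01", periods=120, freq="S")
ts = pd.DataFrame({"value": np.arange(120)}, index=idx)

# Aggregate every minute with pd.Grouper
per_min = ts.groupby(pd.Grouper(freq="1min")).sum()
print(per_min.shape)  # (2, 1)
```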
.replace ("hoge", "toHoge")
,You can also use regular expressions like .replace (". *", "+1", regex = True)
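A quick sketch on a hypothetical Series, showing that plain replace matches whole values while regex=True rewrites matching strings:

```python
import pandas as pd

s = pd.Series(["hoge", "hogehoge"])

# Plain replace only swaps values exactly equal to "hoge"
print(s.replace("hoge", "toHoge").tolist())  # ['toHoge', 'hogehoge']

# With regex=True the pattern is applied to each string
print(s.replace("^hoge.*", "+1", regex=True).tolist())  # ['+1', '+1']
```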
from sklearn.metrics import confusion_matrix

test_label_lb = ["A", "B", "C", "A"]  # correct labels (example data)
p_label = ["A", "C", "C", "A"]        # estimated labels (example data)
labels = ["A", "B", "C"]
cmx_data = confusion_matrix(y_true=test_label_lb, y_pred=p_label, labels=labels)
df_cmx = pd.DataFrame(cmx_data, index=labels, columns=labels)
import folium
m = folium.Map(location=[33.763, -84.392], zoom_start=17)
folium.Marker(
location=[33.763006, -84.392912],
popup='World of Coca-Cola'
).add_to(m)
m
How to use map / filter in Python3 --- A story that seems to go somewhere
Mastering the Python pandas plot function-StatsFragments
An iterator is consumed once its contents are pulled out with list() etc.
num_map = map(lambda n: n + 1, np.random.random(1000))
print(list(num_map))     # values come out here
num_filter = filter(lambda n: n > 0.5, np.random.random(1000))
print(list(num_filter))  # values come out here
print(list(num_map))     # empty now: the iterator is exhausted
print(list(num_filter))  # empty now: the iterator is exhausted
max(dic, key=lambda i: dic[i])
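Iterating a dict yields its keys, so max with a key function like this returns the key holding the largest value:

```python
dic = {"a": 1, "b": 3, "c": 2}

# max iterates the keys; the key function looks up each value
print(max(dic, key=lambda i: dic[i]))  # b
```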
If you're on Python 3.4 or later, you should drop os.path and use pathlib.
from pathlib import Path
LOG_DIR = "/Users/your_name/log"
Path(LOG_DIR).joinpath("log.json")  # or Path(LOG_DIR) / "log.json"
# -> PosixPath('/Users/your_name/log/log.json')
Path(LOG_DIR).joinpath("log.json").exists()
# False
How to do multi-core parallel processing with python
It's easy because you can simply hand it a range.
Python visualization tools may be standardized on HoloViews / Basic HoloViews graphs in one line
Show progress bar in Python (tqdm)
If you pass it an iterable, it shows how many iterations per second you're getting, which is a handy progress estimate.
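A minimal example: wrapping any iterable in tqdm() prints the progress bar along with the it/s rate:

```python
from tqdm import tqdm

total = 0
for i in tqdm(range(100)):  # progress bar with iterations/second
    total += i
print(total)  # 4950
```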
bbox_inches = "tight" or something like that
If you make the font big or make a landscape or portrait graph, the label may stick out with savefig, so if you do .savefig ("test.png ", bbox_inches = "tight")
, it will come out beautifully.
Jupyter Notebook > %timeit range(100) > measuring execution time > %%timeit > measuring execution time of multiple statements / Measuring code execution time with IPython
In Jupyter you can get the execution time of func with %time func(), but the result is quite noisy. With %timeit func(), it is executed several times and measured.
VS Code's Jupyter extension doesn't recognize %%timeit, so evaluating multiple lines there seems impossible (well, you can just wrap them in a function).
Is there a NaN in a pandas DataFrame?
df.isnull().values.any()
is easy to remember and fast, but performance depends on the dtypes, so try it on your own data.
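For example, on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

print(df.isnull().values.any())       # True: the frame contains a NaN
print(df["b"].isnull().values.any())  # False: column b has no NaN
```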
Three tips for maintaining Python pandas performance
Slow auto complete speed for custom modules python #903 Slow autocompletion/formatting #581
If you add the following to VSCode settings.json, it will be preloaded.
"python.autoComplete.preloadModules": [
"pandas",
"numpy",
"matplotlib"
]
As a result, suggestions such as pandas.DataFrame() feel faster, but it doesn't seem to change anything when type inference is required.
assert isinstance makes those cases faster too, but you can't add it everywhere...
df = func_something()
df.sum()  # "sum" is slow to appear in IntelliSense here
assert isinstance(df, pd.DataFrame)
df.sum()  # here "sum" appears immediately
When zombie processes appear while using multiprocessing in IPython:
# p = Pool()
p.terminate()
Either kill the pool explicitly like this, or use a with block:
with Pool() as p:
    results = p.map(func, range(0, 100))
To pick out the elements of list_ab whose prefix matches something in list_prefix (admittedly not a great example...):
list_ab = ["aa_a", "aa_b", "ab_a", "ab_b", "ba_a", "ba_b"]
list_prefix = ["aa", "ab"]
print(list(
filter(lambda a: True in map(lambda b: a.startswith(b), list_prefix),
list_ab)
)) # ['aa_a', 'aa_b', 'ab_a', 'ab_b']
With this, pylint flags `a` with E0602 (but it is only pylint complaining; the code runs and the result is as expected).
from itertools import compress
print(list(
compress(list_ab,
[True in [a.startswith(b) for b in list_prefix] for a in list_ab]
)
)) # ['aa_a', 'aa_b', 'ab_a', 'ab_b']
Using compress with a list comprehension like this is the nicer way to write it.
In summary:
- stop using multiprocessing
- use GC well
- convert to numpy arrays
- make them 32-bit
- do destructive assignment in a for loop (Cython if it's slow)
- compress the data (practicality is debatable?)
- physically add more memory
The compression doesn't bite much on this data, so the effect is weak, but the file does get smaller. Since it compresses, writing is naturally slower than pickle.
With compress=0 there is no compression, so the size is about the same as pickle's output, but joblib is easier because dump and load need no with open(...).
import os
import pickle
import joblib
import numpy as np
import pandas as pd
dump_data = np.random.randn(10000000)
with open("dump_data.pkl", "wb") as f:
pickle.dump(dump_data, f)
print(os.path.getsize("dump_data.pkl") / 1024 / 1024, "MB")
# 76.29409885406494 MB
joblib.dump(dump_data, "dump_data", compress=3)
print(os.path.getsize("dump_data") / 1024 / 1024, "MB")
# 73.5648946762085 MB
# joblib.load("dump_data") #Read
[Explanation of all Seaborn methods (Part 1: Graph list)](http://own-search-and-study.xyz/2017/05/02/ Explanation of all seaborn methods (Part 1: Graph list) /)
Data visualization with Python - let's draw a cool heat map
Beautiful graph drawing with python - seaborn makes data analysis and visualization easier Part 1
This seems to happen often with matplotlib and seaborn: pip-compile fails with an error such as egg_info.
In that case, pip-compile --rebuild
should get it working.
Reference: https://github.com/jazzband/pip-tools/issues/586
Summary of how to import files in Python 3
Creating an __init__.py and importing through it is probably the best approach?
Very convenient
If HTML(html_code) and init_notebook_mode() are executed together in the same cell, nothing is displayed. So first execute only HTML(html_code), then execute init_notebook_mode(), and it works (once it has displayed, running them together in the same cell is fine).
Probably because the JS is loaded asynchronously?