[PYTHON] DBSCAN (clustering) with scikit-learn

I couldn't find a Japanese article that covers both the outline of DBSCAN, one of the clustering algorithms, and simple parameter tuning, so I wrote these notes. Please note that the outline of DBSCAN is a (rough) Japanese translation of the Wikipedia article.

What is DBSCAN

Algorithm overview

[Figure from Wikipedia]
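The density-based idea can be sketched in a few lines of NumPy. This is only a minimal illustration, not the scikit-learn implementation. It rests on the usual definitions: a point is a core point if at least minPts points (itself included) lie within distance ε, clusters grow outward from core points, and anything never reached is labelled -1 as noise. The function name `dbscan_sketch` and the toy data are made up for this example.

```python
import numpy as np


def dbscan_sketch(X, eps, min_pts):
    """Label each row of X with a cluster id (0, 1, ...) or -1 for noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory, fine for a small demo)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # eps-neighborhood of every point, the point itself included
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        # Start a new cluster only from an unlabelled core point
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()  # p is always a core point
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points keep expanding the cluster;
                    # border points are labelled but not expanded
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels


# Two tight groups and one isolated point
toy = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 0]]
toy_labels = dbscan_sketch(toy, eps=0.3, min_pts=3)
print(toy_labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point ends up with label -1, which is the same convention scikit-learn uses for outliers.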

Pros

Disadvantages

Difference from other algorithms

The figure from the scikit-learn demo page compares the clustering methods; DBSCAN is the second from the right. You can see intuitively that it groups points based on density.

[Figure: sphx_glr_plot_cluster_comparison_001.png]
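To make the density-based behaviour concrete, here is a small sketch that compares DBSCAN with k-means on the two-moons toy data. The dataset, the parameter values (`eps=0.3`, `min_samples=5`) and the use of the adjusted Rand index as a score are my own choices for illustration, not something taken from the demo page itself.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Two interleaving half circles: clusters that are not convex blobs
X_moons, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
X_moons = StandardScaler().fit_transform(X_moons)

# k-means splits the plane around two centroids
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_moons)
# DBSCAN follows regions of high density (assumed parameters)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_moons)

# Agreement with the true moon membership (1.0 means a perfect match)
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print("k-means ARI:", round(ari_kmeans, 3))
print("DBSCAN  ARI:", round(ari_dbscan, 3))
```

On data like this, DBSCAN should follow each moon, while k-means tends to cut both moons in half.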

Tuning of ε and minPts

With two-dimensional data you can plot the result and judge by eye whether the clustering is reasonable, but with three or more dimensions it is hard to visualize and judge. In that case I tuned the parameters by printing the number of outliers and the number of clusters for each combination, as follows (using scikit-learn).

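import numpy as np
from sklearn.cluster import DBSCAN

# X is assumed to be an (n_samples, n_features) array prepared beforehand
# range() only accepts integers, so build eps = 0.1, 0.2, ..., 2.9 with NumPy
for eps in np.arange(1, 30) / 10:
    for minPts in range(1, 20):
        dbscan = DBSCAN(eps=eps, min_samples=minPts).fit(X)
        y_dbscan = dbscan.labels_
        print("eps:", eps, ",minPts:", minPts)
        # Number of outliers (label -1 marks noise)
        print(np.sum(y_dbscan == -1))
        # Number of clusters (labels start at 0, so it is the largest label + 1)
        print(np.max(y_dbscan) + 1)
        # Number of points in cluster 1 (label 0)
        print(np.sum(y_dbscan == 0))
        # Number of points in cluster 2 (label 1)
        print(np.sum(y_dbscan == 1))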
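
When there are more than two clusters, counting each label by hand gets tedious. A possible variation is to collect the same numbers with a small helper. In this sketch `summarize_labels` is a hypothetical helper name, and the toy data from `make_blobs` and the parameter grid are assumptions standing in for your own `X` and search range.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def summarize_labels(labels):
    """Return outlier count, cluster count and per-cluster sizes."""
    labels = np.asarray(labels)
    n_outliers = int(np.sum(labels == -1))
    # Count the points of every real cluster (noise label -1 excluded)
    cluster_ids, sizes = np.unique(labels[labels != -1], return_counts=True)
    return {
        "n_outliers": n_outliers,
        "n_clusters": len(cluster_ids),
        "sizes": dict(zip(cluster_ids.tolist(), sizes.tolist())),
    }


# Toy data standing in for the real feature matrix
X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

results = []
for eps in (0.3, 0.5, 1.0):
    for min_samples in (3, 5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_demo)
        results.append({"eps": eps, "min_samples": min_samples, **summarize_labels(labels)})

for row in results:
    print(row)
```

Scanning the printed rows for combinations with a sensible number of clusters and few outliers gives a rough starting point, which you can then narrow down with a finer grid.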



DBSCAN related links

Postscript

The Japanese Wikipedia article has since been updated with additional explanation, and it is easy to understand.
