[PYTHON] DBSCAN with scikit-learn

A short script that runs the DBSCAN implementation in scikit-learn.

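#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function  # lets print() work on Python 2 and 3

import sys
import numpy as np
from sklearn import cluster

# Specify parameters: eps (neighborhood radius) and min_samples
dbscan = cluster.DBSCAN(eps=float(sys.argv[1]), min_samples=int(sys.argv[2]))

# Read data: one sample per line, attribute values separated by whitespace
data_list = []
with open(sys.argv[3]) as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # list() is needed on Python 3, where map() returns an iterator
        x = list(map(float, line.split()))
        data_list.append(x)
data = np.array(data_list)

# Clustering
dbscan.fit(data)

# View results: print the label and values of every sample that is not noise
labels = dbscan.labels_
for i in range(len(labels)):
    if labels[i] != -1:
        print(labels[i], data[i])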

How to use

Prepare a data file like the following, in which each row is one sample and each space-separated column is one attribute value.

0 1
8.5 6
2 0
1.5 0
1 1.5
10 5
9 6
8 5.5
9.5 5.6
100 100
-100 -50
1 0
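
As an aside, a file in this format can also be read with numpy.loadtxt, which splits on whitespace by default. The following is only a minimal sketch: it uses an in-memory io.StringIO object as a stand-in for the first three rows of the data file, and the variable names are illustrative.

```python
import io
import numpy as np

# In-memory stand-in for the first rows of the data file (an assumption made
# for illustration); a real file path could be passed to loadtxt instead.
sample_file = io.StringIO("0 1\n8.5 6\n2 0\n")

# loadtxt splits each line on whitespace and returns a 2-D float array
data = np.loadtxt(sample_file)
print(data.shape)
```

The resulting array has one row per sample and one column per attribute, the same layout the script builds by hand.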

Run the script with eps, min_samples, and the data file path as arguments, in that order.

>> python dbscan.py 1.5 3 data
0.0 [ 0.  1.]
1.0 [ 8.5  6. ]
0.0 [ 2.  0.]
0.0 [ 1.5  0. ]
0.0 [ 1.   1.5]
1.0 [ 10.   5.]
1.0 [ 9.  6.]
1.0 [ 8.   5.5]
1.0 [ 9.5  5.6]
0.0 [ 1.  0.]

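dbscan.labels_ holds the cluster label assigned to each sample. A label of -1 means the sample was treated as noise and does not belong to any cluster. The script skips those samples, which is why 100 100 and -100 -50 do not appear in the output. The output above comes from an older environment; newer NumPy and scikit-learn versions print the labels as integers and format the arrays slightly differently.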
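
To make the role of the -1 label concrete, the sketch below counts the clusters and picks out the noise samples for the same data and parameters (eps=1.5, min_samples=3). The data is written inline rather than read from a file, and names such as n_clusters and noise_points are illustrative, not part of the original script.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# The sample data from the file above, written inline for this sketch
data = np.array([[0, 1], [8.5, 6], [2, 0], [1.5, 0], [1, 1.5], [10, 5],
                 [9, 6], [8, 5.5], [9.5, 5.6], [100, 100], [-100, -50], [1, 0]])

labels = DBSCAN(eps=1.5, min_samples=3).fit(data).labels_

# Number of clusters: distinct labels, not counting the noise label -1
n_clusters = len(set(labels.tolist()) - {-1})

# Noise samples are the rows whose label is -1
noise_mask = labels == -1
n_noise = int(noise_mask.sum())
noise_points = data[noise_mask]

print(n_clusters, n_noise)
print(noise_points)
```

A boolean mask such as labels == -1 selects the samples for any label, so the same pattern can be used to extract the members of one cluster.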
