[PYTHON] [Oracle ADS] How is regression and classification determined?

This post is about the ads package available in Oracle Data Science Cloud.

Conclusion

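If the pandas Series specified as the target has a float dtype, the task is regression. If it has an int dtype, the task is classification, with one exception visible in the code below: an int target with 20 or more unique values is also treated as continuous, i.e. regression.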

Decision logic

Package location

Let's look at the source code of the ads package. First, check where it is installed:


import sys

sys.path

['/tmp/dask-worker-space/worker-_4k3q1mv',
 '/home/datascience/conda/mlcpuv1/lib/python36.zip',
 '/home/datascience/conda/mlcpuv1/lib/python3.6',
 '/home/datascience/conda/mlcpuv1/lib/python3.6/lib-dynload',
 '',
 '/home/datascience/conda/mlcpuv1/lib/python3.6/site-packages',  <= ads is installed here
 '/home/datascience/conda/mlcpuv1/lib/python3.6/site-packages/IPython/extensions',
 '/home/datascience/.ipython']
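If you would rather not scan sys.path by eye, the standard library can report where a package lives. This is a minimal sketch, assuming only that ads is importable in the current kernel; package_dir is a hypothetical helper name, not part of ads.

```python
import importlib.util
import os
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Return the directory that contains module/package `name`, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None for an unknown top-level name;
    # namespace packages have no single origin file ("namespace" on 3.6, None on 3.7+)
    if spec is None or spec.origin in (None, "namespace"):
        return None
    # For a regular package, spec.origin points at its __init__.py
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # In the notebook environment above this should print a path under site-packages
    print(package_dir("ads"))
```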

The task type is held in the variable ml_task_type, which is set by get_ml_task_type():

ads/automl/driver.py


def get_ml_task_type(X, y, classes):
    target_type = TypeDiscoveryDriver().discover(y.name, y)
    if isinstance(target_type, DiscreteTypedFeature):
        if len(classes) == 2:
            if helper.is_text_data(X):
                ml_task_type = utils.ml_task_types.BINARY_TEXT_CLASSIFICATION
            else:
                ml_task_type = utils.ml_task_types.BINARY_CLASSIFICATION
        else:
            if helper.is_text_data(X):
                ml_task_type = utils.ml_task_types.MULTI_CLASS_TEXT_CLASSIFICATION
            else:
                ml_task_type = utils.ml_task_types.MULTI_CLASS_CLASSIFICATION
    elif isinstance(target_type, ContinuousTypedFeature):
        ml_task_type = utils.ml_task_types.REGRESSION
    else:
        raise TypeError("AutoML for target type ({0}) is not yet available"
                        .format(target_type.meta_data["type"]))
    return ml_task_type
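To make the branching easier to follow in isolation, here is a minimal sketch that mirrors it with plain values. The function choose_task_type and its parameters are hypothetical: target_kind stands in for the isinstance checks on DiscreteTypedFeature and ContinuousTypedFeature, is_text stands in for helper.is_text_data(X), and the returned strings only echo the names of the utils.ml_task_types members rather than using the real constants.

```python
def choose_task_type(target_kind: str, n_classes: int, is_text: bool) -> str:
    """Mirror the branching of get_ml_task_type with plain values (hypothetical helper).

    target_kind: "discrete" or "continuous" (stands in for the isinstance checks)
    n_classes:   number of distinct target classes (len(classes) in the original)
    is_text:     stands in for helper.is_text_data(X)
    """
    if target_kind == "discrete":
        # Two classes -> binary, otherwise multi-class; text data gets its own variant
        if n_classes == 2:
            return "BINARY_TEXT_CLASSIFICATION" if is_text else "BINARY_CLASSIFICATION"
        return "MULTI_CLASS_TEXT_CLASSIFICATION" if is_text else "MULTI_CLASS_CLASSIFICATION"
    if target_kind == "continuous":
        # Class count and text-ness are irrelevant for a continuous target
        return "REGRESSION"
    raise TypeError(f"AutoML for target type ({target_kind}) is not yet available")
```

Note that the number of classes and the text check only matter on the discrete side; a continuous target always maps to regression.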

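ml_task_type, in turn, depends on the type of the object returned by TypeDiscoveryDriver().discover(y.name, y): a DiscreteTypedFeature leads to classification and a ContinuousTypedFeature leads to regression.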

from ads.type_discovery.type_discovery_driver import TypeDiscoveryDriver

ads/type_discovery/type_discovery_driver.py

class TypeDiscoveryDriver:

    #
    # takes a pandas series
    #
    def discover(self, name, s, is_target=False):
        # ... (omitted) ...

        if is_target and ContinuousDetector._target_is_continuous(s):
            return ContinuousTypedFeature.build(name, s)

So when is_target is set and ContinuousDetector._target_is_continuous(s) returns True, the target is built as a ContinuousTypedFeature, which means regression.

from ads.type_discovery.continuous_detector import ContinuousDetector

ads/type_discovery/continuous_detector.py


class ContinuousDetector(AbstractTypeDiscoveryDetector):

    @staticmethod
    def _target_is_continuous(series):
        if str(series.dtype) in ['float16', 'float32', 'float64']:
            return True # treat target variable as continuous
        elif str(series.dtype) in ['int16', 'int32', 'int64']:
            if series.nunique() >= 20:
                return True # treat target variable as continuous

        return False
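The rule can be checked without ads installed. The sketch below restates _target_is_continuous on plain values; target_is_continuous and its arguments are hypothetical names, and it assumes you pass in str(series.dtype) and series.nunique() for your own target Series.

```python
FLOAT_DTYPES = {"float16", "float32", "float64"}
INT_DTYPES = {"int16", "int32", "int64"}


def target_is_continuous(dtype_name: str, n_unique: int) -> bool:
    """Restate ContinuousDetector._target_is_continuous on plain values (hypothetical helper).

    dtype_name: what str(series.dtype) returns, e.g. "float64"
    n_unique:   what series.nunique() returns
    """
    if dtype_name in FLOAT_DTYPES:
        return True  # any float target -> continuous, regardless of n_unique
    if dtype_name in INT_DTYPES and n_unique >= 20:
        return True  # int target with 20+ distinct values -> continuous
    return False  # everything else, including int8 and unsigned ints, is not flagged
```

For example, a float64 target is continuous however few distinct values it has, an int64 target with 19 distinct labels is not flagged continuous by this check, and an int8 target is never flagged because int8 is not in the list the detector checks.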
