[PYTHON] LightGBM all-parameter explanation (work in progress)


This article is a rough explanation of all LightGBM parameters. Since there is a lot of content, I will translate it little by little over several days, and I will post more detailed explanations in separate articles from time to time. If you spot a mistake, I would appreciate it if you pointed it out. The official LightGBM repository is on GitHub: https://github.com/microsoft/LightGBM

Each parameter below is described in the format: name, default = <default value>, type = <type>, options = <options>, constraints = <constraints>.

Core Parameters
---------------

-  ``config``, default = ``""``, type = string, aliases: ``config_file``

   -  path of the configuration file

   -  **Note**: can be used only in CLI version

-  ``task``, default = ``train``, type = enum, options: ``train``, ``predict``, ``convert_model``, ``refit``, aliases: ``task_type``

   -  ``train``, aliases: ``training``

   -  ``predict``, aliases: ``prediction``, ``test``

   -  ``convert_model``, converts the model file into if-else format; see IO Parameters for more information

   -  ``refit``, refits existing models with new data, aliases: ``refit_tree``

   -  **Note**: can be used only in CLI version; language-specific packages provide corresponding functions

-  ``objective``, default = ``regression``, type = enum, aliases: ``objective_type``, ``app``, ``application``

  -  **regression application**

  -  ``regression``, L2 loss, aliases: ``regression_l2``, ``l2``, ``mean_squared_error``, ``mse``, ``l2_root``, ``root_mean_squared_error``, ``rmse``

  -  ``regression_l1``, L1 loss, aliases: ``l1``, ``mean_absolute_error``, ``mae``

  -  ``huber``, [Huber loss](https://en.wikipedia.org/wiki/Huber_loss)

  -  ``fair``, [Fair loss](https://www.kaggle.com/c/allstate-claims-severity/discussion/24520)

  -  ``poisson``, [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression)

  -  ``quantile``, [Quantile regression](https://en.wikipedia.org/wiki/Quantile_regression)

  -  ``mape``, [MAPE loss](https://en.wikipedia.org/wiki/Mean_absolute_percentage_error), aliases: ``mean_absolute_percentage_error``

  -  ``gamma``, Gamma regression with log-link. It might be useful, e.g., for modeling insurance claim severity, or for any target that might be [gamma-distributed](https://en.wikipedia.org/wiki/Gamma_distribution#Occurrence_and_applications)

  -  ``tweedie``, Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any target that might be [tweedie-distributed](https://en.wikipedia.org/wiki/Tweedie_distribution#Occurrence_and_applications)

  -  **binary classification application**

  -  ``binary``, binary [log loss](https://en.wikipedia.org/wiki/Cross_entropy) classification (or logistic regression)

  -  requires labels in {0, 1}; for general probability labels in [0, 1], see the cross-entropy application below

  -  **multi-class classification application**

  -  ``multiclass``, softmax objective function, aliases: ``softmax``

  -  ``multiclassova``, One-vs-All binary objective function, aliases: ``multiclass_ova``, ``ova``, ``ovr``

  -  ``num_class`` should be set as well

  -  **cross-entropy application**

  -  ``cross_entropy``, objective function for cross-entropy (with optional linear weights), aliases: ``xentropy``

  -  ``cross_entropy_lambda``, alternative parameterization of cross-entropy, aliases: ``xentlambda``

  -  label is anything in interval [0, 1]

  -  **ranking application**

  -  ``lambdarank``, lambdarank objective. ``label_gain`` can be used to set the gain (weight) of an integer label; all label values must be smaller than the number of elements in ``label_gain``

  -  ``rank_xendcg``, XE_NDCG_MART ranking objective function, aliases: ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``

  -  ``rank_xendcg`` is faster than and achieves performance similar to ``lambdarank``

  -  labels should be of ``int`` type, and larger numbers represent higher relevance (e.g. 0: bad, 1: fair, 2: good, 3: perfect)
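For the Python package, the objective is just an entry in the params dict. A minimal sketch on synthetic data (parameter values are illustrative, not recommendations):

::

    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(500, 10)
    y = np.random.rand(500)               # continuous target -> regression
    train_set = lgb.Dataset(X, label=y)

    params = {
        "objective": "regression",        # L2 loss; "regression_l1", "huber", etc. also work
        # "objective": "binary",          # for {0, 1} labels
        # "objective": "multiclass", "num_class": 3,   # multiclass also needs num_class
    }
    booster = lgb.train(params, train_set, num_boost_round=10)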

-  ``boosting``, default = ``gbdt``, type = enum, options: ``gbdt``, ``rf``, ``dart``, ``goss``, aliases: ``boosting_type``, ``boost``

  -  ``gbdt``, traditional Gradient Boosting Decision Tree, aliases: ``gbrt``

  -  ``rf``, Random Forest, aliases: ``random_forest``

  -  ``dart``, Dropouts meet Multiple Additive Regression Trees

  -  ``goss``, Gradient-based One-Side Sampling

-  ``data``, default = ``""``, type = string, aliases: ``train``, ``train_data``, ``train_data_file``, ``data_filename``

   -  path of the training data; LightGBM will train on this data

   -  **Note**: can be used only in CLI version

-  ``valid``, default = ``""``, type = string, aliases: ``test``, ``valid_data``, ``valid_data_file``, ``test_data``, ``test_data_file``, ``valid_filenames``

   -  path(s) of validation/test data; LightGBM will output metrics for these data

   -  supports multiple validation data, separated by ``,``

   -  **Note**: can be used only in CLI version

-  ``num_iterations``, default = ``100``, type = int, aliases: ``num_iteration``, ``n_iter``, ``num_tree``, ``num_trees``, ``num_round``, ``num_rounds``, ``num_boost_round``, ``n_estimators``, constraints: ``num_iterations >= 0``

   -  number of boosting iterations

   -  **Note**: internally, LightGBM constructs ``num_class * num_iterations`` trees for multi-class classification problems

-  ``learning_rate``, default = ``0.1``, type = double, aliases: ``shrinkage_rate``, ``eta``, constraints: ``learning_rate > 0.0``

   -  shrinkage rate

   -  in ``dart``, it also affects the normalization weights of dropped trees

-  ``num_leaves``, default = ``31``, type = int, aliases: ``num_leaf``, ``max_leaves``, ``max_leaf``, constraints: ``1 < num_leaves <= 131072``

   -  max number of leaves in one tree

-  ``tree_learner``, default = ``serial``, type = enum, options: ``serial``, ``feature``, ``data``, ``voting``, aliases: ``tree``, ``tree_type``, ``tree_learner_type``

   -  specifies how the trees are learned

   -  ``serial``, single machine tree learner

   -  ``feature``, feature parallel tree learner, aliases: ``feature_parallel``

   -  ``data``, data parallel tree learner, aliases: ``data_parallel``

   -  ``voting``, voting parallel tree learner, aliases: ``voting_parallel``

   -  refer to Parallel Learning for more details

-  ``num_threads``, default = ``0``, type = int, aliases: ``num_thread``, ``nthread``, ``nthreads``, ``n_jobs``

   -  number of threads used by LightGBM

   -  ``0`` means the default number of threads in OpenMP

   -  for the best speed, set this to the number of **real CPU cores**, not the number of threads (most CPUs use hyper-threading to generate 2 threads per CPU core)

   -  do not set it too large if your dataset is small (for instance, do not use 64 threads for a dataset with 10,000 rows)

   -  be aware that a task manager or any similar CPU monitoring tool might report that cores are not fully utilized. **This is normal**

   -  for parallel learning, do not use all CPU cores, because this will cause poor performance for network communication

   -  **Note**: please **do not change this during training**, especially when running multiple jobs simultaneously via external packages, otherwise it may cause undesirable errors

-  ``device_type``, default = ``cpu``, type = enum, options: ``cpu``, ``gpu``, aliases: ``device``

   -  device for tree learning; you can use a GPU to achieve faster learning

   -  **Note**: it is recommended to use a smaller ``max_bin`` (e.g. 63) to get a better speed up

   -  **Note**: for faster speed, the GPU uses 32-bit floating point to sum up by default, which may affect the accuracy of some tasks. You can set ``gpu_use_dp=true`` to enable 64-bit floating point, but it will slow down training

   -  **Note**: refer to the LightGBM Installation Guide to build with GPU support

-  ``seed``, default = ``None``, type = int, aliases: ``random_seed``, ``random_state``

   -  this seed is used to generate other seeds, e.g. ``data_random_seed``, ``feature_fraction_seed``, etc.

   -  by default, this seed is unused in favor of the default values of the other seeds

   -  this seed has lower priority than the other seeds, i.e. it will be overridden if you set any other seed explicitly
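As a rough illustration of how these core parameters fit together in the Python package, here is a sketch (all values are arbitrary examples, not tuned recommendations):

::

    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(1000, 20)
    y = np.random.randint(0, 2, size=1000)

    params = {
        "objective": "binary",
        "boosting": "gbdt",
        "num_leaves": 31,       # max number of leaves per tree
        "learning_rate": 0.1,   # shrinkage rate
        "num_threads": 4,       # set to the number of real CPU cores
        "device_type": "cpu",
        "seed": 42,             # master seed; explicitly-set sub-seeds take priority
    }
    # num_iterations is passed as num_boost_round in the Python API
    booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)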

Learning Control Parameters
---------------------------

-  ``force_col_wise``, default = ``false``, type = bool

   -  used only with ``cpu`` device type

   -  set this to ``true`` to force col-wise histogram building

   -  enabling this is recommended when:

      -  the number of columns is large, or the total number of bins is large

      -  ``num_threads`` is large, e.g. ``> 20``

      -  you want to reduce memory cost

   -  **Note**: when both ``force_col_wise`` and ``force_row_wise`` are ``false``, LightGBM will first try both and then use the faster one. To remove the overhead of this test, set the faster one to ``true`` manually

   -  **Note**: this parameter cannot be used at the same time with ``force_row_wise``, choose only one of them

-  ``force_row_wise``, default = ``false``, type = bool

   -  used only with ``cpu`` device type

   -  set this to ``true`` to force row-wise histogram building

   -  enabling this is recommended when:

      -  the number of data points is large, or the total number of bins is relatively small

      -  ``num_threads`` is relatively small, e.g. ``<= 16``

      -  you want to use a small ``bagging_fraction`` or ``goss`` boosting to speed up

   -  **Note**: setting this to ``true`` will double the memory cost for the Dataset object. If you do not have enough memory, use ``force_col_wise=true``

   -  **Note**: when both ``force_col_wise`` and ``force_row_wise`` are ``false``, LightGBM will first try both and then use the faster one. To remove the overhead of this test, set the faster one to ``true`` manually

   -  **Note**: this parameter cannot be used at the same time with ``force_col_wise``, choose only one of them

-  ``histogram_pool_size``, default = ``-1.0``, type = double, aliases: ``hist_pool_size``

   -  max cache size in MB for historical histogram

   -  ``< 0`` means no limit

-  ``max_depth``, default = ``-1``, type = int

   -  limit the max depth of the tree model. This is used to deal with over-fitting when the amount of data is small. The tree still grows leaf-wise

   -  ``<= 0`` means no limit

-  ``min_data_in_leaf``, default = ``20``, type = int, aliases: ``min_data_per_leaf``, ``min_data``, ``min_child_samples``, constraints: ``min_data_in_leaf >= 0``

   -  minimal number of data in one leaf. Can be used to deal with over-fitting

-  ``min_sum_hessian_in_leaf``, default = ``1e-3``, type = double, aliases: ``min_sum_hessian_per_leaf``, ``min_sum_hessian``, ``min_hessian``, ``min_child_weight``, constraints: ``min_sum_hessian_in_leaf >= 0.0``

   -  minimal sum of the Hessian in one leaf. Like ``min_data_in_leaf``, it can be used to deal with over-fitting

-  ``bagging_fraction``, default = ``1.0``, type = double, aliases: ``sub_row``, ``subsample``, ``bagging``, constraints: ``0.0 < bagging_fraction <= 1.0``

   -  like ``feature_fraction``, but this will randomly select part of the data without resampling

   -  can be used to speed up training

   -  can be used to deal with over-fitting

   -  **Note**: to enable bagging, ``bagging_freq`` should be set to a non-zero value as well

-  ``pos_bagging_fraction``, default = ``1.0``, type = double, aliases: ``pos_sub_row``, ``pos_subsample``, ``pos_bagging``, constraints: ``0.0 < pos_bagging_fraction <= 1.0``

   -  used only in ``binary`` application

   -  used for imbalanced binary classification problems; will randomly sample ``#pos_samples * pos_bagging_fraction`` positive samples in bagging

   -  should be used together with ``neg_bagging_fraction``

   -  set this to ``1.0`` to disable

   -  **Note**: to enable this, you should also set ``bagging_freq`` and ``neg_bagging_fraction``

   -  **Note**: if both ``pos_bagging_fraction`` and ``neg_bagging_fraction`` are set to ``1.0``, balanced bagging is disabled

   -  **Note**: if balanced bagging is enabled, ``bagging_fraction`` will be ignored

-  ``neg_bagging_fraction``, default = ``1.0``, type = double, aliases: ``neg_sub_row``, ``neg_subsample``, ``neg_bagging``, constraints: ``0.0 < neg_bagging_fraction <= 1.0``

   -  used only in ``binary`` application

   -  used for imbalanced binary classification problems; will randomly sample ``#neg_samples * neg_bagging_fraction`` negative samples in bagging

   -  should be used together with ``pos_bagging_fraction``

   -  set this to ``1.0`` to disable

   -  **Note**: to enable this, you should also set ``bagging_freq`` and ``pos_bagging_fraction``

   -  **Note**: if both ``pos_bagging_fraction`` and ``neg_bagging_fraction`` are set to ``1.0``, balanced bagging is disabled

   -  **Note**: if balanced bagging is enabled, ``bagging_fraction`` will be ignored

-  ``bagging_freq``, default = ``0``, type = int, aliases: ``subsample_freq``

   -  frequency for bagging

   -  ``0`` means disable bagging; ``k`` means perform bagging at every ``k`` iterations

   -  **Note**: to enable bagging, ``bagging_fraction`` should also be set to a value smaller than ``1.0``

-  ``bagging_seed``, default = ``3``, type = int, aliases: ``bagging_fraction_seed``

   -  random seed for bagging

-  ``feature_fraction``, default = ``1.0``, type = double, aliases: ``sub_feature``, ``colsample_bytree``, constraints: ``0.0 < feature_fraction <= 1.0``

   -  if ``feature_fraction`` is smaller than ``1.0``, LightGBM will randomly select a subset of features on each iteration (tree). For example, if you set it to ``0.8``, LightGBM will select 80% of the features before training each tree

   -  can be used to speed up training

   -  can be used to deal with over-fitting

-  ``feature_fraction_bynode``, default = ``1.0``, type = double, aliases: ``sub_feature_bynode``, ``colsample_bynode``, constraints: ``0.0 < feature_fraction_bynode <= 1.0``

   -  if ``feature_fraction_bynode`` is smaller than ``1.0``, LightGBM will randomly select a subset of features at each tree node. For example, if you set it to ``0.8``, LightGBM will select 80% of the features at each tree node

   -  can be used to deal with over-fitting

   -  **Note**: unlike ``feature_fraction``, this cannot speed up training

   -  **Note**: if both ``feature_fraction`` and ``feature_fraction_bynode`` are smaller than ``1.0``, the final fraction at each node is ``feature_fraction * feature_fraction_bynode``

-  ``feature_fraction_seed``, default = ``2``, type = int

   -  random seed for ``feature_fraction``
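Row subsampling (bagging) and column subsampling (feature fraction) are often combined. A sketch; note that bagging only takes effect when both ``bagging_fraction < 1.0`` and ``bagging_freq > 0``:

::

    import numpy as np
    import lightgbm as lgb

    X, y = np.random.rand(2000, 50), np.random.rand(2000)

    params = {
        "objective": "regression",
        "bagging_fraction": 0.8,     # use 80% of the rows ...
        "bagging_freq": 5,           # ... re-sampled every 5 iterations
        "bagging_seed": 3,
        "feature_fraction": 0.8,     # use 80% of the columns per tree
        "feature_fraction_seed": 2,
    }
    booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)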

-  ``extra_trees``, default = ``false``, type = bool

   -  use extremely randomized trees

   -  if set to ``true``, when evaluating node splits LightGBM will check only one randomly-chosen threshold for each feature

   -  can be used to deal with over-fitting

-  ``extra_seed``, default = ``6``, type = int

   -  random seed for selecting thresholds when ``extra_trees`` is ``true``

-  ``early_stopping_round``, default = ``0``, type = int, aliases: ``early_stopping_rounds``, ``early_stopping``, ``n_iter_no_change``

   -  will stop training if one metric of one validation data does not improve in the last ``early_stopping_round`` rounds

   -  ``<= 0`` means disable

-  ``first_metric_only``, default = ``false``, type = bool

   -  set this to ``true`` if you want to use only the first metric for early stopping (see the sketch below)
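In the Python package, early stopping is typically enabled with a callback rather than a raw parameter (this sketch assumes a reasonably recent lightgbm version):

::

    import numpy as np
    import lightgbm as lgb

    X, y = np.random.rand(1000, 10), np.random.rand(1000)
    train_set = lgb.Dataset(X[:800], label=y[:800])
    valid_set = lgb.Dataset(X[800:], label=y[800:], reference=train_set)

    params = {"objective": "regression", "metric": ["l2", "l1"]}
    booster = lgb.train(
        params,
        train_set,
        num_boost_round=1000,
        valid_sets=[valid_set],
        # stop if the first metric has not improved for 50 rounds
        callbacks=[lgb.early_stopping(stopping_rounds=50, first_metric_only=True)],
    )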

-  ``max_delta_step``, default = ``0.0``, type = double, aliases: ``max_tree_output``, ``max_leaf_output``

   -  used to limit the max output of tree leaves

   -  ``<= 0`` means no constraint

   -  the final max output of leaves is ``learning_rate * max_delta_step``

-  ``lambda_l1``, default = ``0.0``, type = double, aliases: ``reg_alpha``, constraints: ``lambda_l1 >= 0.0``

   -  L1 regularization

-  ``lambda_l2``, default = ``0.0``, type = double, aliases: ``reg_lambda``, ``lambda``, constraints: ``lambda_l2 >= 0.0``

   -  L2 regularization

-  ``min_gain_to_split``, default = ``0.0``, type = double, aliases: ``min_split_gain``, constraints: ``min_gain_to_split >= 0.0``

   -  the minimal gain to perform a split

-  ``drop_rate``, default = ``0.1``, type = double, aliases: ``rate_drop``, constraints: ``0.0 <= drop_rate <= 1.0``

   -  used only in ``dart``

   -  dropout rate: a fraction of previous trees to drop during the dropout

IO Parameters
-------------

Dataset Parameters
~~~~~~~~~~~~~~~~~~


-  ``max_bin`` , default = ``255``, type = int, constraints: ``max_bin > 1``

   -  max number of bins that feature values will be bucketed in

   -  small number of bins may reduce training accuracy but may increase general power (deal with over-fitting)

   -  LightGBM will auto compress memory according to ``max_bin``. For example, LightGBM will use ``uint8_t`` for feature value if ``max_bin=255``

-  ``max_bin_by_feature`` , default = ``None``, type = multi-int

   -  max number of bins for each feature

   -  if not specified, will use ``max_bin`` for all features

-  ``min_data_in_bin`` , default = ``3``, type = int, constraints: ``min_data_in_bin > 0``

   -  minimal number of data inside one bin

   -  use this to avoid one-data-one-bin (potential over-fitting)

-  ``bin_construct_sample_cnt`` , default = ``200000``, type = int, aliases: ``subsample_for_bin``, constraints: ``bin_construct_sample_cnt > 0``

   -  number of data that sampled to construct histogram bins

   -  setting this to larger value will give better training result, but will increase data loading time

   -  set this to larger value if data is very sparse

-  ``data_random_seed`` , default = ``1``, type = int, aliases: ``data_seed``

   -  random seed for sampling data to construct histogram bins

-  ``is_enable_sparse`` , default = ``true``, type = bool, aliases: ``is_sparse``, ``enable_sparse``, ``sparse``

   -  used to enable/disable sparse optimization

-  ``enable_bundle`` , default = ``true``, type = bool, aliases: ``is_enable_bundle``, ``bundle``

   -  set this to ``false`` to disable Exclusive Feature Bundling (EFB), which is described in `LightGBM: A Highly Efficient Gradient Boosting Decision Tree <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`__

   -  **Note**: disabling this may cause the slow training speed for sparse datasets

-  ``use_missing`` , default = ``true``, type = bool

   -  set this to ``false`` to disable the special handle of missing value

-  ``zero_as_missing`` , default = ``false``, type = bool

   -  set this to ``true`` to treat all zero as missing values (including the unshown values in LibSVM / sparse matrices)

   -  set this to ``false`` to use ``na`` for representing missing values

-  ``feature_pre_filter`` , default = ``true``, type = bool

   -  set this to ``true`` to pre-filter the unsplittable features by ``min_data_in_leaf``

   -  as dataset object is initialized only once and cannot be changed after that, you may need to set this to ``false`` when searching parameters with ``min_data_in_leaf``, otherwise features are filtered by ``min_data_in_leaf`` firstly if you don't reconstruct dataset object

   -  **Note**: setting this to ``false`` may slow down the training
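A sketch of the situation described in the note above: reusing one Dataset object while searching over ``min_data_in_leaf``, which requires ``feature_pre_filter=false`` (the grid values are illustrative):

::

    import numpy as np
    import lightgbm as lgb

    X, y = np.random.rand(1000, 10), np.random.rand(1000)
    train_set = lgb.Dataset(X, label=y, params={"feature_pre_filter": False})

    for min_data in (10, 20, 50):
        params = {
            "objective": "regression",
            "min_data_in_leaf": min_data,
            "feature_pre_filter": False,
        }
        booster = lgb.train(params, train_set, num_boost_round=10)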

-  ``pre_partition`` , default = ``false``, type = bool, aliases: ``is_pre_partition``

   -  used for parallel learning (excluding the ``feature_parallel`` mode)

   -  ``true`` if training data are pre-partitioned, and different machines use different partitions

-  ``two_round`` , default = ``false``, type = bool, aliases: ``two_round_loading``, ``use_two_round_loading``

   -  set this to ``true`` if data file is too big to fit in memory

   -  by default, LightGBM will map data file to memory and load features from memory. This will provide faster data loading speed, but may cause run out of memory error when the data file is very big

   -  **Note**: works only in case of loading data directly from file

-  ``header`` , default = ``false``, type = bool, aliases: ``has_header``

   -  set this to ``true`` if input data has header

   -  **Note**: works only in case of loading data directly from file

-  ``label_column`` , default = ``""``, type = int or string, aliases: ``label``

   -  used to specify the label column

   -  use number for index, e.g. ``label=0`` means column\_0 is the label

   -  add a prefix ``name:`` for column name, e.g. ``label=name:is_click``

   -  **Note**: works only in case of loading data directly from file

-  ``weight_column`` , default = ``""``, type = int or string, aliases: ``weight``

   -  used to specify the weight column

   -  use number for index, e.g. ``weight=0`` means column\_0 is the weight

   -  add a prefix ``name:`` for column name, e.g. ``weight=name:weight``

   -  **Note**: works only in case of loading data directly from file

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0, and weight is column\_1, the correct parameter is ``weight=0``

-  ``group_column`` , default = ``""``, type = int or string, aliases: ``group``, ``group_id``, ``query_column``, ``query``, ``query_id``

   -  used to specify the query/group id column

   -  use number for index, e.g. ``query=0`` means column\_0 is the query id

   -  add a prefix ``name:`` for column name, e.g. ``query=name:query_id``

   -  **Note**: works only in case of loading data directly from file

   -  **Note**: data should be grouped by query\_id

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0 and query\_id is column\_1, the correct parameter is ``query=0``

-  ``ignore_column`` , default = ``""``, type = multi-int or string, aliases: ``ignore_feature``, ``blacklist``

   -  used to specify some ignoring columns in training

   -  use number for index, e.g. ``ignore_column=0,1,2`` means column\_0, column\_1 and column\_2 will be ignored

   -  add a prefix ``name:`` for column name, e.g. ``ignore_column=name:c1,c2,c3`` means c1, c2 and c3 will be ignored

   -  **Note**: works only in case of loading data directly from file

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``

   -  **Note**: despite the fact that specified columns will be completely ignored during the training, they still should have a valid format allowing LightGBM to load file successfully

-  ``categorical_feature`` , default = ``""``, type = multi-int or string, aliases: ``cat_feature``, ``categorical_column``, ``cat_column``

   -  used to specify categorical features

   -  use number for index, e.g. ``categorical_feature=0,1,2`` means column\_0, column\_1 and column\_2 are categorical features

   -  add a prefix ``name:`` for column name, e.g. ``categorical_feature=name:c1,c2,c3`` means c1, c2 and c3 are categorical features

   -  **Note**: only supports categorical with ``int`` type (not applicable for data represented as pandas DataFrame in Python-package)

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``

   -  **Note**: all values should be less than ``Int32.MaxValue`` (2147483647)

   -  **Note**: using large values could be memory consuming. Tree decision rule works best when categorical features are presented by consecutive integers starting from zero

   -  **Note**: all negative values will be treated as **missing values**

   -  **Note**: the output cannot be monotonically constrained with respect to a categorical feature
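In the Python package, categorical features can also be declared when building the Dataset. A sketch in which column\_0 holds non-negative integer category codes, as the notes above require:

::

    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(500, 4)
    X[:, 0] = np.random.randint(0, 10, size=500)   # column_0: category codes 0..9
    y = np.random.rand(500)

    train_set = lgb.Dataset(X, label=y, categorical_feature=[0])
    booster = lgb.train({"objective": "regression"}, train_set, num_boost_round=10)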

-  ``forcedbins_filename`` , default = ``""``, type = string

   -  path to a ``.json`` file that specifies bin upper bounds for some or all features

   -  ``.json`` file should contain an array of objects, each containing the word ``feature`` (integer feature index) and ``bin_upper_bound`` (array of thresholds for binning)

   -  see `this file <https://github.com/microsoft/LightGBM/tree/master/examples/regression/forced_bins.json>`__ as an example

-  ``save_binary`` , default = ``false``, type = bool, aliases: ``is_save_binary``, ``is_save_binary_file``

   -  if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speed ups the data loading for the next time

   -  **Note**: ``init_score`` is not saved in binary file

   -  **Note**: can be used only in CLI version; for language-specific packages you can use the correspondent function

Predict Parameters
~~~~~~~~~~~~~~~~~~

Convert Parameters
~~~~~~~~~~~~~~~~~~


-  ``convert_model_language`` , default = ``""``, type = string

   -  used only in ``convert_model`` task

   -  only ``cpp`` is supported yet; for conversion model to other languages consider using `m2cgen <https://github.com/BayesWitnesses/m2cgen>`__ utility

   -  if ``convert_model_language`` is set and ``task=train``, the model will be also converted

   -  **Note**: can be used only in CLI version

-  ``convert_model`` , default = ``gbdt_prediction.cpp``, type = string, aliases: ``convert_model_file``

   -  used only in ``convert_model`` task

   -  output filename of converted model

   -  **Note**: can be used only in CLI version

Objective Parameters
--------------------

-  ``objective_seed`` , default = ``5``, type = int

   -  used only in ``rank_xendcg`` objective

   -  random seed for objectives, if random process is needed

-  ``num_class`` , default = ``1``, type = int, aliases: ``num_classes``, constraints: ``num_class > 0``

   -  used only in ``multi-class`` classification application

-  ``is_unbalance`` , default = ``false``, type = bool, aliases: ``unbalance``, ``unbalanced_sets``

   -  used only in ``binary`` and ``multiclassova`` applications

   -  set this to ``true`` if training data are unbalanced

   -  **Note**: while enabling this should increase the overall performance metric of your model, it will also result in poor estimates of the individual class probabilities

   -  **Note**: this parameter cannot be used at the same time with ``scale_pos_weight``, choose only **one** of them

-  ``scale_pos_weight`` , default = ``1.0``, type = double, constraints: ``scale_pos_weight > 0.0``

   -  used only in ``binary`` and ``multiclassova`` applications

   -  weight of labels with positive class

   -  **Note**: while enabling this should increase the overall performance metric of your model, it will also result in poor estimates of the individual class probabilities

   -  **Note**: this parameter cannot be used at the same time with ``is_unbalance``, choose only **one** of them

-  ``sigmoid`` , default = ``1.0``, type = double, constraints: ``sigmoid > 0.0``

   -  used only in ``binary`` and ``multiclassova`` classification and in ``lambdarank`` applications

   -  parameter for the sigmoid function

-  ``boost_from_average`` , default = ``true``, type = bool

   -  used only in ``regression``, ``binary``, ``multiclassova`` and ``cross-entropy`` applications

   -  adjusts initial score to the mean of labels for faster convergence

-  ``reg_sqrt`` , default = ``false``, type = bool

   -  used only in ``regression`` application

   -  used to fit ``sqrt(label)`` instead of original values and prediction result will be also automatically converted to ``prediction^2``

   -  might be useful in case of large-range labels

-  ``alpha`` , default = ``0.9``, type = double, constraints: ``alpha > 0.0``

   -  used only in ``huber`` and ``quantile`` ``regression`` applications

   -  parameter for `Huber loss <https://en.wikipedia.org/wiki/Huber_loss>`__ and `Quantile regression <https://en.wikipedia.org/wiki/Quantile_regression>`__

-  ``fair_c`` , default = ``1.0``, type = double, constraints: ``fair_c > 0.0``

   -  used only in ``fair`` ``regression`` application

   -  parameter for `Fair loss <https://www.kaggle.com/c/allstate-claims-severity/discussion/24520>`__

-  ``poisson_max_delta_step`` , default = ``0.7``, type = double, constraints: ``poisson_max_delta_step > 0.0``

   -  used only in ``poisson`` ``regression`` application

   -  parameter for `Poisson regression <https://en.wikipedia.org/wiki/Poisson_regression>`__ to safeguard optimization

-  ``tweedie_variance_power`` , default = ``1.5``, type = double, constraints: ``1.0 <= tweedie_variance_power < 2.0``

   -  used only in ``tweedie`` ``regression`` application

   -  used to control the variance of the tweedie distribution

   -  set this closer to ``2`` to shift towards a **Gamma** distribution

   -  set this closer to ``1`` to shift towards a **Poisson** distribution

-  ``lambdarank_truncation_level`` , default = ``20``, type = int, constraints: ``lambdarank_truncation_level > 0``

   -  used only in ``lambdarank`` application

   -  used for truncating the max DCG, refer to "truncation level" in the Sec. 3 of `LambdaMART paper <https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf>`__

-  ``lambdarank_norm`` , default = ``true``, type = bool

   -  used only in ``lambdarank`` application

   -  set this to ``true`` to normalize the lambdas for different queries, and improve the performance for unbalanced data

   -  set this to ``false`` to enforce the original lambdarank algorithm

-  ``label_gain`` , default = ``0,1,3,7,15,31,63,...,2^30-1``, type = multi-double

   -  used only in ``lambdarank`` application

   -  relevant gain for labels. For example, the gain of label ``2`` is ``3`` in case of default label gains

   -  separate by ``,``
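Pulling the ranking-related parameters together, here is a lambdarank sketch on synthetic data: integer relevance labels 0-3, query sizes passed via the Dataset, and a ``label_gain`` with one entry per label value (all values illustrative):

::

    import numpy as np
    import lightgbm as lgb

    group = [100, 100, 100]                        # three queries of 100 rows each
    X = np.random.rand(sum(group), 10)
    y = np.random.randint(0, 4, size=sum(group))   # labels must be < len(label_gain)

    train_set = lgb.Dataset(X, label=y, group=group)
    params = {
        "objective": "lambdarank",
        "metric": "ndcg",
        "eval_at": [1, 3, 5],
        "label_gain": [0, 1, 3, 7],                # gains for labels 0, 1, 2, 3
    }
    booster = lgb.train(params, train_set, num_boost_round=10)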

Metric Parameters
-----------------

-  ``metric`` , default = ``""``, type = multi-enum, aliases: ``metrics``, ``metric_types``

   -  metric(s) to be evaluated on the evaluation set(s)

      -  ``""`` (empty string or not specified) means that metric corresponding to specified ``objective`` will be used (this is possible only for pre-defined objective functions, otherwise no evaluation metric will be added)

      -  ``"None"`` (string, **not** a ``None`` value) means that no metric will be registered, aliases: ``na``, ``null``, ``custom``

      -  ``l1``, absolute loss, aliases: ``mean_absolute_error``, ``mae``, ``regression_l1``

      -  ``l2``, square loss, aliases: ``mean_squared_error``, ``mse``, ``regression_l2``, ``regression``

      -  ``rmse``, root square loss, aliases: ``root_mean_squared_error``, ``l2_root``

      -  ``quantile``, `Quantile regression <https://en.wikipedia.org/wiki/Quantile_regression>`__

      -  ``mape``, `MAPE loss <https://en.wikipedia.org/wiki/Mean_absolute_percentage_error>`__, aliases: ``mean_absolute_percentage_error``

      -  ``huber``, `Huber loss <https://en.wikipedia.org/wiki/Huber_loss>`__

      -  ``fair``, `Fair loss <https://www.kaggle.com/c/allstate-claims-severity/discussion/24520>`__

      -  ``poisson``, negative log-likelihood for `Poisson regression <https://en.wikipedia.org/wiki/Poisson_regression>`__

      -  ``gamma``, negative log-likelihood for **Gamma** regression

      -  ``gamma_deviance``, residual deviance for **Gamma** regression

      -  ``tweedie``, negative log-likelihood for **Tweedie** regression

      -  ``ndcg``, `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__, aliases: ``lambdarank``, ``rank_xendcg``, ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``

      -  ``map``, `MAP <https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/>`__, aliases: ``mean_average_precision``

      -  ``auc``, `AUC <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`__

      -  ``binary_logloss``, `log loss <https://en.wikipedia.org/wiki/Cross_entropy>`__, aliases: ``binary``

      -  ``binary_error``, for one sample: ``0`` for correct classification, ``1`` for error classification

      -  ``auc_mu``, `AUC-mu <http://proceedings.mlr.press/v97/kleiman19a/kleiman19a.pdf>`__

      -  ``multi_logloss``, log loss for multi-class classification, aliases: ``multiclass``, ``softmax``, ``multiclassova``, ``multiclass_ova``, ``ova``, ``ovr``

      -  ``multi_error``, error rate for multi-class classification

      -  ``cross_entropy``, cross-entropy (with optional linear weights), aliases: ``xentropy``

      -  ``cross_entropy_lambda``, "intensity-weighted" cross-entropy, aliases: ``xentlambda``

      -  ``kullback_leibler``, `Kullback-Leibler divergence <https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence>`__, aliases: ``kldiv``

   -  support multiple metrics, separated by ``,``
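In the Python package, multiple metrics can be given either as a comma-separated string or as a list. A sketch (``lgb.log_evaluation`` assumes a recent lightgbm version):

::

    import numpy as np
    import lightgbm as lgb

    X, y = np.random.rand(1000, 10), np.random.randint(0, 2, size=1000)
    train_set = lgb.Dataset(X[:800], label=y[:800])
    valid_set = lgb.Dataset(X[800:], label=y[800:], reference=train_set)

    params = {
        "objective": "binary",
        "metric": ["binary_logloss", "auc"],   # same as "binary_logloss,auc"
    }
    booster = lgb.train(params, train_set, num_boost_round=20,
                        valid_sets=[valid_set],
                        callbacks=[lgb.log_evaluation(period=5)])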

-  ``metric_freq`` , default = ``1``, type = int, aliases: ``output_freq``, constraints: ``metric_freq > 0``

   -  frequency for metric output

   -  **Note**: can be used only in CLI version

-  ``is_provide_training_metric`` , default = ``false``, type = bool, aliases: ``training_metric``, ``is_training_metric``, ``train_metric``

   -  set this to ``true`` to output metric result over training dataset

   -  **Note**: can be used only in CLI version

-  ``eval_at`` , default = ``1,2,3,4,5``, type = multi-int, aliases: ``ndcg_eval_at``, ``ndcg_at``, ``map_eval_at``, ``map_at``

   -  used only with ``ndcg`` and ``map`` metrics

   -  `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__ and `MAP <https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/>`__ evaluation positions, separated by ``,``

-  ``multi_error_top_k`` , default = ``1``, type = int, constraints: ``multi_error_top_k > 0``

   -  used only with ``multi_error`` metric

   -  threshold for top-k multi-error metric

   -  the error on each sample is ``0`` if the true class is among the top ``multi_error_top_k`` predictions, and ``1`` otherwise

      -  more precisely, the error on a sample is ``0`` if there are at least ``num_classes - multi_error_top_k`` predictions strictly less than the prediction on the true class

   -  when ``multi_error_top_k=1`` this is equivalent to the usual multi-error metric

-  ``auc_mu_weights`` , default = ``None``, type = multi-double

   -  used only with ``auc_mu`` metric

   -  list representing flattened matrix (in row-major order) giving loss weights for classification errors

   -  list should have ``n * n`` elements, where ``n`` is the number of classes

   -  the matrix co-ordinate ``[i, j]`` should correspond to the ``i * n + j``-th element of the list

   -  if not specified, will use equal weights for all classes
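A small sketch of building ``auc_mu_weights``: start from an ``n x n`` cost matrix and flatten it in row-major order, so that entry ``[i, j]`` lands at index ``i * n + j`` (the matrix values here are made up for illustration):

::

    import numpy as np

    n_class = 3
    W = np.ones((n_class, n_class))   # W[i, j]: cost of errors between classes i and j
    np.fill_diagonal(W, 0)            # no cost on the diagonal
    W[0, 2] = 5.0                     # e.g. penalize confusing class 0 with class 2

    params = {
        "objective": "multiclass",
        "num_class": n_class,
        "metric": "auc_mu",
        "auc_mu_weights": W.flatten().tolist(),
    }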

Network Parameters
------------------

-  ``num_machines`` , default = ``1``, type = int, aliases: ``num_machine``, constraints: ``num_machines > 0``

   -  the number of machines for parallel learning application

   -  this parameter is needed to be set in both **socket** and **mpi** versions

-  ``local_listen_port`` , default = ``12400``, type = int, aliases: ``local_port``, ``port``, constraints: ``local_listen_port > 0``

   -  TCP listen port for local machines

   -  **Note**: don't forget to allow this port in firewall settings before training

-  ``time_out`` , default = ``120``, type = int, constraints: ``time_out > 0``

   -  socket time-out in minutes

-  ``machine_list_filename`` , default = ``""``, type = string, aliases: ``machine_list_file``, ``machine_list``, ``mlist``

   -  path of file that lists machines for this parallel learning application

   -  each line contains one IP and one port for one machine. The format is ``ip port`` (space as a separator)

-  ``machines`` , default = ``""``, type = string, aliases: ``workers``, ``nodes``

   -  list of machines in the following format: ``ip1:port1,ip2:port2``

GPU Parameters
--------------

-  ``gpu_platform_id`` , default = ``-1``, type = int

   -  OpenCL platform ID. Usually each GPU vendor exposes one OpenCL platform

   -  ``-1`` means the system-wide default platform

   -  **Note**: refer to `GPU Targets <./GPU-Targets.rst#query-opencl-devices-in-your-system>`__ for more details

-  ``gpu_device_id`` , default = ``-1``, type = int

   -  OpenCL device ID in the specified platform. Each GPU in the selected platform has a unique device ID

   -  ``-1`` means the default device in the selected platform

   -  **Note**: refer to `GPU Targets <./GPU-Targets.rst#query-opencl-devices-in-your-system>`__ for more details

-  ``gpu_use_dp`` , default = ``false``, type = bool

   -  set this to ``true`` to use double precision math on GPU (by default single precision is used)
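A sketch of the GPU-related settings in one params dict; this assumes LightGBM was built with GPU support (see the Installation Guide), otherwise training will fail:

::

    params = {
        "objective": "regression",
        "device_type": "gpu",
        "gpu_platform_id": -1,   # system-wide default OpenCL platform
        "gpu_device_id": -1,     # default device on that platform
        "gpu_use_dp": False,     # single precision for speed (the default)
        "max_bin": 63,           # smaller max_bin is recommended on GPU
    }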


Others
------

Continued Training with Input Score
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

LightGBM supports continued training with initial scores. It uses an additional file to store these initial scores, like the following:

::

    0.5
    -0.1
    0.9
    ...

It means the initial score of the first data row is ``0.5``, the second is ``-0.1``, and so on. The initial score file corresponds with the data file line by line, with one score per line.

If the name of the data file is ``train.txt``, the initial score file should be named ``train.txt.init`` and placed in the same folder as the data file. In this case, LightGBM will automatically load the initial score file if it exists.
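In the Python package the same effect can be achieved without an ``.init`` file, by attaching the scores to the Dataset (a sketch; the constant initial score here is just for illustration):

::

    import numpy as np
    import lightgbm as lgb

    X, y = np.random.rand(1000, 10), np.random.rand(1000)
    init_score = np.full(1000, y.mean())   # one initial score per data row

    train_set = lgb.Dataset(X, label=y, init_score=init_score)
    booster = lgb.train({"objective": "regression"}, train_set, num_boost_round=10)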

Weight Data
~~~~~~~~~~~


LightGBM supports weighted training. It uses an additional file to store weight data, like the following:

::

    1.0
    0.5
    0.8
    ...

It means the weight of the first data row is ``1.0``, the second is ``0.5``, and so on.
The weight file corresponds with the data file line by line, with one weight per line.

And if the name of data file is ``train.txt``, the weight file should be named as ``train.txt.weight`` and placed in the same folder as the data file.
In this case, LightGBM will load the weight file automatically if it exists.

Also, you can include a weight column in your data file. Please refer to the ``weight_column`` `parameter <#weight_column>`__ above.
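The Python-package equivalent of the weight file is the ``weight`` argument of ``Dataset``, one weight per row (a sketch; the weighting scheme is made up for illustration):

::

    import numpy as np
    import lightgbm as lgb

    X, y = np.random.rand(1000, 10), np.random.randint(0, 2, size=1000)
    weight = np.where(y == 1, 2.0, 1.0)   # e.g. up-weight the positive class

    train_set = lgb.Dataset(X, label=y, weight=weight)
    booster = lgb.train({"objective": "binary"}, train_set, num_boost_round=10)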

Query Data
~~~~~~~~~~

For learning-to-rank tasks, LightGBM needs query information for the training data.
It uses an additional file to store query data, like the following:

::

    27
    18
    67
    ...

It means that the first ``27`` samples belong to one query, the next ``18`` samples belong to another, and so on.

**Note**: data should be ordered by the query.

If the name of data file is ``train.txt``, the query file should be named as ``train.txt.query`` and placed in the same folder as the data file.
In this case, LightGBM will load the query file automatically if it exists.

Also, you can include a query/group id column in your data file. Please refer to the ``group_column`` `parameter <#group_column>`__ above.
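The Python-package equivalent of the query file is the ``group`` argument of ``Dataset``: the example above (27, 18, 67, ...) becomes a list of group sizes, and the rows must be ordered by query (a sketch):

::

    import numpy as np
    import lightgbm as lgb

    group = [27, 18, 67]                   # sizes of consecutive queries
    X = np.random.rand(sum(group), 10)
    y = np.random.randint(0, 4, size=sum(group))

    train_set = lgb.Dataset(X, label=y, group=group)
    booster = lgb.train({"objective": "lambdarank"}, train_set, num_boost_round=10)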


