Ability estimation using pyirt

This article is the 20th day article of Classi Advent Calendar 2019.

Hello. My name is @yosuke_ohara, and I'm a data scientist in the Data AI Department. Classi is a company that provides cloud services supporting the use of ICT in school education, and in educational data science the learner's ability value is often the quantity of interest (naturally so: if you don't understand a learner's current state, you can't guide them to appropriate learning). So in this post I'd like to introduce **Item Response Theory**, a method widely used for ability estimation.

Table of contents

  1. [What is Item Response Theory?](#anchor1)
     1-1. [What is discrimination?](#anchor2)
     1-2. [What is difficulty?](#anchor3)
  2. [Parameter estimation methods](#anchor4)
  3. [Parameter estimation by marginal maximum likelihood](#anchor5)
  4. [Challenges of IRT](#anchor6)
  5. [Conclusion](#anchor7)

1. What is Item Response Theory?

Item Response Theory (IRT) is a method used mainly to estimate ability values from test results; the TOEIC is a well-known example of a test built on IRT. Its biggest advantage is that it lets you estimate ability values across different tests. For most exams, such as Japan's National Center Test, the opportunities to take the test are limited, but with IRT, examinees who took different test forms can still be evaluated on a common scale, which means they can take a test at a time that suits them. However, estimating across tests requires a process called equating: aligning the results of different tests to a common origin and unit. Equating requires the tests being compared to share a set of common questions (called "anchor items"), and designing how the equating will be carried out is part of the test design.
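To make the equating step concrete, below is a minimal sketch of the mean/sigma linking method, which uses the anchor items' difficulty estimates on two test forms to put the new form's parameters on the old form's scale. The function and the numbers are mine, for illustration only, not from any particular package.

import numpy as np

def mean_sigma_linking(beta_anchor_old, beta_anchor_new):
    # Linear transformation x -> A*x + B that maps the new form's scale
    # onto the old form's, estimated from the anchor items' difficulties
    # as calibrated separately on each form
    beta_old = np.asarray(beta_anchor_old, dtype=float)
    beta_new = np.asarray(beta_anchor_new, dtype=float)
    A = beta_old.std() / beta_new.std()
    B = beta_old.mean() - A * beta_new.mean()
    return A, B

# Illustrative anchor-item difficulties from two separate calibrations
A, B = mean_sigma_linking([-0.8, 0.1, 1.2], [-1.1, -0.2, 0.9])
# After linking: difficulties and abilities transform as A*x + B,
# discriminations as alpha / A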

Another characteristic of IRT is that it defines and estimates parameters not only for ability values but also for items (the questions on a test). Specifically, a logistic model is assumed between ability and the probability of a correct response, and the shape of the logistic curve changes with the item's parameters, expressing how the probability of a correct response differs from question to question. Logistic models range from the 1-parameter to the 4-parameter model; here we take the most commonly used 2-parameter logistic model as an example.

p(\theta) = \frac{1}{1+e^{-D\alpha(\theta-\beta)}} \quad \cdots (1)

θ is the ability value and p(θ) is the probability of a correct response. D is a scaling constant (conventionally 1.7, chosen so that the logistic curve closely approximates the normal ogive), and the key item parameters are α (discrimination) and β (difficulty). Let's look concretely at how the probability of a correct response changes with α and β.
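As a minimal sketch, equation (1) translates directly into Python (the function name is mine; D = 1.7 is the conventional value):

import numpy as np

def prob_correct(theta, alpha, beta, D=1.7):
    # Two-parameter logistic model: probability of a correct response
    # for ability theta, discrimination alpha, difficulty beta
    return 1.0 / (1.0 + np.exp(-D * alpha * (theta - beta)))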

1-1. What is discrimination?

(Figure: item characteristic curves for a high-discrimination item (green) and a low-discrimination item (blue))

Discrimination is a parameter that indicates how well an item distinguishes between examinees' abilities; in other words, how much a correct or incorrect response tells you about the examinee's ability. Looking at the figure above, the probability of a correct response rises sharply near the center of the curve with large α (green). The range of ability values where the probability of a correct response is around 50% is narrow, so it is easy to conclude that "an examinee who answers this question correctly has high ability". In contrast, the curve with small α (blue) is gentle, and the range of ability values with a roughly 50% probability of a correct response is wide. Answering this question correctly therefore says little about whether the examinee's ability is high.
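You can check this numerically with the prob_correct sketch above: around θ = β, a high-discrimination item separates nearby ability values much more sharply.

# beta = 0 for both items; compare theta = -0.5 vs theta = +0.5
for alpha in (2.0, 0.5):
    print(alpha, prob_correct(-0.5, alpha, 0.0), prob_correct(0.5, alpha, 0.0))
# alpha=2.0: p jumps from about 0.15 to 0.85 (steep curve)
# alpha=0.5: p moves only from about 0.40 to 0.60 (gentle curve)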

1-2. What is difficulty?

(Figure: item characteristic curves for items with different difficulty β)

Difficulty, as the word implies, represents how hard an item is. The larger β is, the further the curve is translated in the positive direction along the θ-axis. In other words, comparing the curves at the same ability value, the larger the difficulty parameter, the lower the probability of a correct response.
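Again using the sketch above: holding ability and discrimination fixed, a larger β means a lower probability of a correct response.

# Same examinee (theta = 0), same discrimination (alpha = 1), rising difficulty
for beta in (-1.0, 0.0, 1.0):
    print(beta, prob_correct(0.0, 1.0, beta))
# beta=-1: about 0.85, beta=0: exactly 0.50, beta=+1: about 0.15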

2. Parameter estimation methods

I have organized the major estimation methods below at a coarse granularity (going into detail on things like Bayesian extensions would make this section lose focus).

| Estimation method | Overview |
| --- | --- |
| Maximum likelihood estimation | Used when either the ability values or the item parameters are already known; estimates the remaining ones. |
| Joint maximum likelihood estimation | Estimates ability values and item parameters at the same time, by alternately (1) taking partial derivatives of the likelihood with respect to the ability values and solving, and (2) doing the same with respect to the item parameters. |
| Marginal maximum likelihood estimation | When estimating the item parameters, integrates the ability parameters out of the likelihood. Computed with the EM algorithm. |

Maximum likelihood estimation is a simple approach: define the probability of a correct response with equation (1) and build a likelihood function, but it presupposes that one set of parameters is already known. Joint maximum likelihood obtains both sets of parameters at once, but the item-parameter estimator does not satisfy consistency, so accuracy does not improve just by adding more examinees. Marginal maximum likelihood addresses this by marginalizing out the ability parameters when estimating the item parameters, and it is currently the standard choice. However, the marginalized likelihood cannot be maximized in closed form, so it is computed numerically with the EM algorithm.
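To illustrate the first row of the table, here is a minimal sketch of maximum likelihood estimation of θ for one examinee when the item parameters are already known (a hypothetical setup of mine, using scipy for the one-dimensional optimization):

import numpy as np
from scipy.optimize import minimize_scalar

def estimate_theta(responses, alphas, betas, D=1.7):
    # Maximize the log-likelihood of the 0/1 response vector under
    # equation (1), with the item parameters treated as known
    u = np.asarray(responses, dtype=float)
    a = np.asarray(alphas, dtype=float)
    b = np.asarray(betas, dtype=float)

    def neg_log_likelihood(theta):
        p = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))
        return -np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))

    # Bounded search: an all-correct or all-wrong pattern has no finite MLE
    return minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded").x

theta_hat = estimate_theta([1, 1, 0, 1, 0],
                           alphas=[1.2, 0.8, 1.5, 1.0, 0.6],
                           betas=[-1.0, 0.0, 0.5, -0.5, 1.0])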

3. Parameter estimation by marginal maximum likelihood

When I looked into packages for IRT analysis, I found pyirt for Python, and for R, ltm (https://cran.r-project.org/web/packages/ltm/ltm.pdf) and lazy.irtx seem to be usable. This time I'll use the pyirt package to estimate the parameters by marginal maximum likelihood. For the dataset I use algebra_2005_2006 from the KDD Cup; DataShop@CMU hosts a variety of public educational datasets and is recommended. The data I need has already been uploaded to BigQuery, so I read it with pandas' read_gbq.

%reload_ext autoreload
%autoreload 2
import itertools
import numpy as np
import pandas as pd
from pandas.io import gbq
from pyirt import irt
# Module of our own that manages the queries for extracting data from BigQuery
import queries
from tqdm import tqdm

PROJECT_ID = "my-project"  # placeholder: your GCP project ID
DATASET = "my_dataset"     # placeholder: BigQuery dataset for the estimation results

_train = pd.read_gbq(queries.train_agg(), PROJECT_ID, dialect='standard', location="asia-northeast1")
# pyirt expects the data as a list of (user_id, item_id, ans_boolean) tuples, so unify the column names first
train = _train.rename(columns={
    "anon_student_id": "user_id",
    "question_unique_key": "item_id",
    "is_correct": "ans_boolean"
})

item_param, user_param = {}, {}
problem_hierarchies = train["problem_hierarchy"].unique()
# Split the data by problem_hierarchy and estimate each split by marginal maximum likelihood
for _problem_hierarchy in tqdm(problem_hierarchies, position=0):
    train_by_problem_hierarchy = train.query("problem_hierarchy == @_problem_hierarchy").drop("problem_hierarchy", axis=1)
    train_by_problem_hierarchy = train_by_problem_hierarchy[["user_id", "item_id", "ans_boolean"]].values
    # irt() is pyirt's estimation entry point
    _item_param, _user_param = irt(train_by_problem_hierarchy)
    item_param.update(_item_param)
    user_param.update(_user_param)
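If you don't have the BigQuery setup at hand, the same irt() call can be smoke-tested on synthetic data in the (user_id, item_id, ans_boolean) tuple format; the generating parameters below are made up for illustration:

import math
import random

random.seed(0)
# Made-up 2PL parameters for 10 synthetic items
true_items = {f"q{j}": (random.uniform(0.5, 2.0), random.uniform(-2.0, 2.0))
              for j in range(10)}
synthetic = []
for i in range(500):  # 500 synthetic examinees with standard-normal ability
    theta = random.gauss(0.0, 1.0)
    for item_id, (a, b) in true_items.items():
        p = 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))
        synthetic.append((f"u{i}", item_id, int(random.random() < p)))

_item_param, _user_param = irt(synthetic)  # same call as above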

The ability values (user_param) and item parameters (item_param) are returned as dictionaries.

I want to store the results in BigQuery, so I convert them to data frames and send them with gbq.to_gbq.

user_param_df = pd.DataFrame(list(user_param.items()), columns=["anon_student_id", "theta"])
item_param_records = []
for tmp_question_unique_key, param_dict in item_param.items():
    item_param_records.append({
        "question_unique_key": tmp_question_unique_key,
        "alpha": param_dict["alpha"],
        "beta": param_dict["beta"],
        "c": param_dict["c"]
    })
item_param_df = pd.io.json.json_normalize(item_param_records)

# Send the estimation results to BigQuery
gbq.to_gbq(item_param_df,'{}.item_param'.format(DATASET), project_id=PROJECT_ID, if_exists='append', location="asia-northeast1")
gbq.to_gbq(user_param_df,'{}.user_param'.format(DATASET), project_id=PROJECT_ID, if_exists='append', location="asia-northeast1")
(Figure: scatter plot of discrimination (α) × difficulty (β) for each item)

Making a scatter plot of discrimination × difficulty for each question shows that the discrimination values are estimated in a range of roughly 0 to 3 and the difficulties in a range of roughly -3 to 3.
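The plot can be reproduced from item_param_df with matplotlib:

import matplotlib.pyplot as plt

plt.scatter(item_param_df["alpha"], item_param_df["beta"], s=8, alpha=0.3)
plt.xlabel("alpha (discrimination)")
plt.ylabel("beta (difficulty)")
plt.show()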

4. Challenges of IRT

IRT is well suited to settings where the ability value can be treated as fixed, such as a single test, but it has the problem of being ill-suited to time-series settings (for example, when ability changes continuously through learning such as self-study). Time-series extensions have therefore been proposed, such as introducing a time parameter into the discrimination, or describing the change in ability with a state-space model. A method called Knowledge Tracing is also often used for ability estimation. Knowledge Tracing assumes a hidden Markov model over the transitions of the learner's knowledge state, and is characterized by defining the transitions as conditional probabilities and estimating ability from them. I couldn't find a Knowledge Tracing package on PyPI, but a package called pyBKT (BKT: Bayesian Knowledge Tracing) is on GitHub, so I'll give it a try.
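As a reference point before trying pyBKT, the core update of standard Bayesian Knowledge Tracing fits in a few lines. This is a minimal sketch of the textbook equations with made-up parameter values, not pyBKT's API:

def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    # Posterior probability of mastery given the observed response,
    # followed by the learning transition to the next opportunity
    if correct:
        posterior = p_know * (1 - p_slip) / (
            p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        posterior = p_know * p_slip / (
            p_know * p_slip + (1 - p_know) * (1 - p_guess))
    return posterior + (1 - posterior) * p_learn

p = 0.3  # P(L0): made-up prior probability of mastery
for obs in [1, 1, 0, 1, 1]:
    p = bkt_update(p, obs)  # p rises with correct answers, dips after the miss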

5. Conclusion

This time I focused on IRT, but ability estimation is a field with a short history, and I think there is plenty of room for methods from other fields to displace the current ones. Rather than being bound by existing methods, we need to stay flexible about how we formulate the problem. If you would like to know more about ability estimation, the introductory section of the paper here, which won an award at EDM (Educational Data Mining) 2019, is a good place to start.

Tomorrow's post is by @kitaharamikiya!!
