[PYTHON] Exam Mathematics Part 4 (Implementation of Item Parameter Estimation)

This is a continuation of Exam Mathematics Part 3 (3PL model optimization).

Last time the topic was the mathematical optimization of the 3PL model; this time it is the implementation of item parameter estimation for the 3PL model.

The environment uses Python with numpy and scipy.

Data generation

As in Exam Mathematics Part 1 (Question Setting and Data Generation), generate the parameters and the item response matrix for the experiment.

import numpy as np
from functools import partial

# Keep a small epsilon so that the numerical calculations do not diverge
epsilon = 0.0001

# Definition of the 3-parameter logistic (3PL) model
def L3P(a, b, c, x):
    return c + (1 - epsilon - c) / (1 + np.exp(-a * (x - b)))

# Definition of the 2-parameter logistic (2PL) model.
# c is taken as an argument as well, to unify the handling with L3P.
def L2P(a, b, c, x):
    return (1 - epsilon) / (1 + np.exp(-a * (x - b)))

# Ranges of the item parameters:
# a is a positive real number, b is a real number, c is greater than 0 and less than 1
a_min = 0.7
a_max = 4

b_min = -2
b_max = 2

c_min = 0
c_max = .4

# Number of items and examinees: 10 items and 4000 examinees below
num_items = 10
num_users = 4_000

# Generate the item parameters
item_params = np.array(
    [np.random.uniform(a_min, a_max, num_items),
     np.random.uniform(b_min, b_max, num_items),
     np.random.uniform(c_min, c_max, num_items)]
).T

# Generate the examinee parameters
user_params = np.random.normal(size=num_users)

# Create the item response matrix; each element is 1 (correct) or 0 (incorrect).
# Row i, column j holds examinee j's response to item i.
ir_matrix_ij = np.array(
    [partial(L3P, *ip)(user_params) > np.random.uniform(0, 1, num_users) for ip in item_params]
).astype(int)
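
As a quick sanity check, here is a minimal sketch that inspects the generated data; it uses only the arrays defined above.

# Sketch: shape and per-item correct-answer rates of the generated data
print(ir_matrix_ij.shape)          # (num_items, num_users) = (10, 4000)
print(ir_matrix_ij.mean(axis=1))   # empirical correct-answer rate of each item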

As before, $i$ indexes the items and $j$ indexes the examinees. We use a large number of examinees, 1,000 or more, because it is empirically known that estimation is reasonably stable with roughly 100 or more examinees for a 1PL model, 300 or more for a 2PL model, and 1,000 or more for a 3PL model. If you are interested, change these numbers and experiment.

This run generated the following item parameters.

             a / Discrimination   b / Difficulty   c / Guessing
Question 1   3.34998814            0.96567682      0.33289520
Question 2   1.78741502            1.09887666      0.22340858
Question 3   1.33657604           -0.97455532      0.21594273
Question 4   1.05624284            0.84572140      0.11501424
Question 5   1.21345944            1.24370213      0.32661421
Question 6   3.22726757           -0.95479962      0.33023057
Question 7   1.73090248            1.46742090      0.21991115
Question 8   2.16403443            1.66529355      0.10403063
Question 9   2.35283349            1.78746377      0.22301869
Question 10  1.77976105           -0.06035497      0.29241184

The goal is to estimate these values from the response data.

Data preprocessing

To stabilize the estimation, items that are difficult to estimate are removed before fitting. Specifically, we remove items whose correct-answer rate is too high (> 0.9) or too low (< 0.1), and items whose correlation with the raw total score is too low (< 0.3).

# Remove items that are difficult to estimate:
# correct-answer rate too high (> 0.9), too low (< 0.1),
# or correlation with the raw total score too low (< 0.3)
filter_a = ir_matrix_ij.sum(axis=1) / num_users < 0.9
filter_b = ir_matrix_ij.sum(axis=1) / num_users > 0.1
filter_c = np.corrcoef(np.vstack((ir_matrix_ij, ir_matrix_ij.sum(axis=0))))[num_items][:-1] >= 0.3
filter_total = filter_a & filter_b & filter_c

# Redefine the item response matrix and its dimensions
irm_ij = ir_matrix_ij[filter_total]
num_items, num_users = irm_ij.shape
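
A quick look (a sketch using the arrays above) shows which items survived the filtering:

# Sketch: boolean mask over the original items, and the remaining count
print(filter_total)   # True for each item kept for estimation
print(num_items)      # number of items remaining after filtering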

Preparation of functions and constants for estimation

As we saw last time, the E step computes, for every item $i$,

\begin{align}
r_{i\theta} &:= 
\sum_{j}u_{ij} \Pr\{\theta| U_j = u_j, a^{old}, b^{old}, c^{old}\} \\
f_\theta &:= 
\sum_{j} \Pr\{\theta| U_j = u_j, a^{old}, b^{old}, c^{old}\} 
\end{align}

and the M step computes

\begin{align}
R_{i\theta} :=\frac{r_{i\theta}}{P_{i\theta}} - f_\theta
\end{align}

and then sets the gradients

\begin{align}
\partial^a\mathcal{\tilde{L}}(a, b, c) 
&= \int R_{i\theta}(\theta - b_i)P_{i\theta}^*d\theta\\
\partial^b\mathcal{\tilde{L}}(a, b, c) 
&= -a_i \int R_{i\theta}P_{i\theta}^*d\theta\\
\partial^c\mathcal{\tilde{L}}(a, b, c) 
&= \frac{1}{(1 - c_i)} \int R_{i\theta} d\theta
\end{align}

to 0. Since the marginalization and the integrals above are evaluated by numerical quadrature, we prepare representative points $X_k$ of $\theta$ and weights $g_k = g(X_k)$.
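
Concretely, with the quadrature points $X_k$ and weights $g_k$, the posterior weight used in the E step and the integrals above become finite sums; the following is a restatement consistent with the definitions above and with the code below:

\begin{align}
\Pr\{X_k| U_j = u_j, a^{old}, b^{old}, c^{old}\}
&\approx \frac{g_k \prod_i P_{ik}^{u_{ij}}(1 - P_{ik})^{1-u_{ij}}}{\sum_{k'} g_{k'} \prod_i P_{ik'}^{u_{ij}}(1 - P_{ik'})^{1-u_{ij}}} \\
\int R_{i\theta} P_{i\theta}^* d\theta &\approx \sum_k R_{ik} P_{ik}^*
\end{align}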

Therefore, prepare the following.

# Define the range of possible examinee parameters
X_k = np.linspace(-4, 4, 41)
# Define the distribution of examinee parameters; here we use the normal density from scipy
from scipy.stats import norm
g_k = norm.pdf(X_k)

# E step: compute f_k and r_ik from the current item parameters
def get_exp_params(irm_ij, g_k, P3_ik):
    # Posterior weight of each quadrature point X_k for each examinee j
    Lg_jk = np.exp(irm_ij.T.dot(np.log(P3_ik)) + (1 - irm_ij).T.dot(np.log(1 - P3_ik))) * g_k
    n_Lg_jk = Lg_jk / Lg_jk.sum(axis=1)[:, np.newaxis]
    f_k = n_Lg_jk.sum(axis=0)
    r_ik = irm_ij.dot(n_Lg_jk)
    return f_k, r_ik

# Score function for the M step: the norm of the gradient vector.
# Minimizing this norm approximates solving "gradient = 0".
def score_(param, f_k, r_k, X_k):
    a, b, c = param
    P3_k = partial(L3P, a, b, c)(X_k)
    P2_k = partial(L2P, a, b, c)(X_k)
    R_k = r_k / P3_k - f_k
    v = [
        ((X_k - b) * R_k * P2_k).sum(),
        - a * (R_k * P2_k).sum(),
        R_k.sum() / (1 - c)
    ]
    return np.linalg.norm(v)
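
Before running the full algorithm, the E step can be sanity-checked; here is a sketch, assuming the true item_params from the data-generation step are still in scope. Since each examinee's posterior weights sum to 1, f_k must sum to the number of examinees.

# Sketch: run one E step with the true parameters as a sanity check
P3_ik_true = np.array([partial(L3P, *ip)(X_k) for ip in item_params[filter_total]])
f_k_, r_ik_ = get_exp_params(irm_ij, g_k, P3_ik_true)
print(np.isclose(f_k_.sum(), num_users))   # True: posterior weights sum to 1 per examinee
print(r_ik_.shape)                         # (num_items, len(X_k))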

Perform estimation

Now let's run the estimation. Since the result depends on the initial parameters and the stability is expected to be limited, we start from several random initial parameters and adopt the median of the stable results as the final estimate. For the M-step optimization we use scipy's minimize.

from scipy.optimize import minimize

# Ranges used both for the constraints and for the random initial values.
# They are deliberately wider than the ranges used for data generation.
a_min, a_max = 0.1, 8.0
b_min, b_max = -4.0, 4.0
c_min, c_max = epsilon, 0.6

# Define the constraints for minimize
cons_ = {
    'type': 'ineq',
    'fun': lambda x:[
        x[0] - a_min,
        a_max - x[0],
        x[1] - b_min,
        b_max - x[1],
        x[2] - c_min,
        c_max - x[2],
    ]
}

# Parameters controlling the estimation run
# Termination condition for the EM iterations
delta = 0.001
# Maximum number of EM iterations
max_num_of_itr = 1000
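
For reference, scipy treats an 'ineq' constraint as satisfied when every entry returned by 'fun' is nonnegative; here is a sketch with a hypothetical in-range parameter vector:

# Sketch: an in-range parameter vector yields only nonnegative entries
print(cons_['fun'](np.array([1.0, 0.0, 0.2])))   # all entries >= 0, so feasible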

# Run the calculation several times for numerical stability and adopt
# the median of the stable results
p_data = []
for n_try in range(10):
    # Draw a random initial value for the estimate
    item_params_ = np.array(
        [np.random.uniform(a_min, a_max, num_items),
         np.random.uniform(b_min, b_max, num_items),
         np.random.uniform(c_min, c_max, num_items)]
    ).T

    prev_item_params_ = item_params_
    for itr in range(max_num_of_itr):
        # E step: compute the expected counts f_k and r_ik
        P3_ik = np.array([partial(L3P, *ip)(X_k) for ip in item_params_])
        f_k, r_ik = get_exp_params(irm_ij, g_k, P3_ik)
        ip_n = []
        # M step: solve one optimization problem per item
        for item_id in range(num_items):
            target = partial(score_, f_k=f_k, r_k=r_ik[item_id], X_k=X_k)
            result = minimize(target, x0=item_params_[item_id], constraints=cons_, method="slsqp")
            ip_n.append(list(result.x))

        item_params_ = np.array(ip_n)
        # Stop when the mean absolute difference from the previous iteration
        # falls below the threshold
        mean_diff = abs(prev_item_params_ - item_params_).sum() / item_params_.size
        if mean_diff < delta:
            break
        prev_item_params_ = item_params_

    p_data.append(item_params_)
    
p_data_ = np.array(p_data)
result_ = []
for idx in range(p_data_.shape[1]):
    t_ = p_data_[:, idx, :]
    # Discard runs whose estimate for b hit the boundary of the allowed range
    filter_1 = t_[:, 1] < b_max - epsilon
    filter_2 = t_[:, 1] > b_min + epsilon
    # Use the median of the remaining runs as the final estimate
    result_.append(np.median(t_[filter_1 & filter_2], axis=0))

result = np.array(result_)
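
Finally, a sketch that puts the estimates next to the true values; item_params[filter_total] keeps only the items that survived preprocessing, matching the rows of result:

# Sketch: estimated vs. true parameters, row by row
for est, true in zip(result, item_params[filter_total]):
    print(est, true)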

Experimental results

This time the calculation produced the following. **Calculation result (true value in parentheses)**

             a / Discrimination        b / Difficulty             c / Guessing
Question 1   3.49633348 (3.34998814)   1.12766137 (0.96567682)    0.35744497 (0.33289520)
Question 2   2.06354365 (1.78741502)   1.03621881 (1.09887666)    0.20507606 (0.22340858)
Question 3   1.64406087 (1.33657604)  -0.39145998 (-0.97455532)   0.48094315 (0.21594273)
Question 4   1.47999466 (1.05624284)   0.95923840 (0.84572140)    0.18384673 (0.11501424)
Question 5   1.44474336 (1.21345944)   1.12406269 (1.24370213)    0.31475672 (0.32661421)
Question 6   3.91285332 (3.22726757)  -1.09218709 (-0.95479962)   0.18379076 (0.33023057)
Question 7   1.44498535 (1.73090248)   1.50705016 (1.46742090)    0.20601461 (0.21991115)
Question 8   2.37497907 (2.16403443)   1.61937999 (1.66529355)    0.10503096 (0.10403063)
Question 9   3.10840278 (2.35283349)   1.69962392 (1.78746377)    0.22051818 (0.22301869)
Question 10  1.79969976 (1.77976105)   0.06053145 (-0.06035497)   0.29944448 (0.29241184)

Some estimates are off, most notably b (difficulty) for Question 3, but overall the parameters are recovered with reasonably good accuracy.

Next time

Next time, we will estimate the examinees' ability parameters $\theta_j$ using the item parameters estimated here: Exam Mathematics Part 5 (Examinee Parameter Estimation).
