[PYTHON] Exam Mathematics Part 1 (Question setting and data generation)

As an exercise in the mathematics of testing, I will work through item response theory, a theoretically well-established framework, and implement its parameter estimation from scratch to some extent.

Because this would be too long for a single article, I will split it across several. The table of contents is as follows.

Exam Mathematics Part 1 (Question setting and data generation)
Exam Mathematics Part 2 (Mathematical Model of Item Response Theory)
Exam Mathematics Part 3 (3PL model optimization)
Exam Mathematics Part 4 (Implementation of Question Parameter Estimation)
Exam Mathematics Part 5 (Candidate Parameter Estimation)

This article is Part 1, "Question setting and data generation".

Problem setting

For example, consider a test such as the TOEIC or the National Center Test. Suppose several candidates answer several questions and we obtain the following results:

	Candidate 1	Candidate 2	Candidate 3	Candidate 4	Candidate 5	Candidate 6	Candidate 7	Number of correct answers
Question 1	Correct	Correct	Correct	Correct	Correct	Wrong	Correct	6
Question 2	Wrong	Wrong	Wrong	Correct	Wrong	Wrong	Wrong	1
Question 3	Correct	Correct	Correct	Correct	Correct	Correct	Wrong	6
Question 4	Wrong	Wrong	Wrong	Correct	Wrong	Correct	Correct	3
Question 5	Wrong	Wrong	Wrong	Correct	Wrong	Wrong	Correct	2
Question 6	Correct	Correct	Correct	Correct	Wrong	Correct	Correct	6
Question 7	Correct	Wrong	Correct	Correct	Correct	Correct	Correct	6
Question 8	Wrong	Correct	Wrong	Wrong	Correct	Wrong	Correct	3
Question 9	Correct	Correct	Wrong	Correct	Correct	Correct	Correct	6
Question 10	Correct	Correct	Correct	Wrong	Wrong	Correct	Correct	5
Raw score	6	6	5	8	5	6	8

To read the table: the cell where a candidate and a question intersect shows whether that candidate answered that question correctly. For example, Candidate 1 answered Question 3 correctly, and Candidate 6 answered Question 5 incorrectly. **Number of correct answers** is the number of candidates who answered the question correctly, that is, the count of correct answers in each row. **Raw score** is the total score when each correct answer is worth 1 point, that is, the count of correct answers in each column.

Now, in this situation, suppose you want to **estimate each candidate's ability and each question's difficulty**. A candidate's ability can be used for ranking and for pass/fail decisions, and a question's difficulty can be used to adjust the overall difficulty of an exam when you want to reuse questions.

What to consider

Should we simply use the raw score as the candidate's ability? If we do, difficult and easy questions are worth the same number of points. That cannot account for a candidate who can solve difficult questions but happens to slip on an easy one.

Then how about weighting each question by the reciprocal of its number of correct answers? It sounds reasonable, but how well justified are the resulting differences in points? For example, in the table above, Question 2 would be worth 6 times as much as Question 1. Is that appropriate?
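To make the two naive scoring schemes concrete, here is a minimal sketch that computes the raw scores and the reciprocal-weighted scores for the table above. The array `responses` and the names `weights` and `weighted_score` are introduced here for illustration only; 1 stands for a correct answer and 0 for a wrong one.

```python
import numpy as np

# Responses from the table above: rows are Questions 1-10, columns are Candidates 1-7.
# 1 = correct answer, 0 = wrong answer.
responses = np.array([
    [1, 1, 1, 1, 1, 0, 1],  # Question 1
    [0, 0, 0, 1, 0, 0, 0],  # Question 2
    [1, 1, 1, 1, 1, 1, 0],  # Question 3
    [0, 0, 0, 1, 0, 1, 1],  # Question 4
    [0, 0, 0, 1, 0, 0, 1],  # Question 5
    [1, 1, 1, 1, 0, 1, 1],  # Question 6
    [1, 0, 1, 1, 1, 1, 1],  # Question 7
    [0, 1, 0, 0, 1, 0, 1],  # Question 8
    [1, 1, 0, 1, 1, 1, 1],  # Question 9
    [1, 1, 1, 0, 0, 1, 1],  # Question 10
])

num_correct = responses.sum(axis=1)  # correct answers per question (row sums)
raw_score = responses.sum(axis=0)    # raw score per candidate (column sums)

# Naive alternative: weight each question by 1 / (number of correct answers).
# This assumes every question was answered correctly at least once.
weights = 1.0 / num_correct
weighted_score = weights @ responses  # weighted total per candidate

print(num_correct)   # [6 1 6 3 2 6 6 3 6 5]
print(raw_score)     # [6 6 5 8 5 6 8]
print(np.round(weighted_score, 2))
```

Candidates 4 and 7 share the raw score 8 but receive different weighted scores (about 2.67 versus 2.03), mostly because only Candidate 4 answered Question 2 correctly. Whether that gap is the "right" size is exactly the question raised above.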

Item response theory

The theory developed for estimating candidates' abilities and questions' difficulties is **Item Response Theory** (IRT). IRT is a mathematical model that is very useful for creating, administering, and evaluating exams, and it is reportedly used in practice for the TOEFL (an English exam for studying abroad) and the IT Passport exam [^1].

Data

In the following articles I will explain item response theory, including its theoretical background, in enough detail to implement it to some extent. First, though, I will introduce some real data and a way to generate data, so that we can try out how well the theory works on both real and generated data.

Real data

Real data is available, for example, from KDD Cup [^2]. This data is not a simple record of which candidate answered which question correctly, so it needs preprocessing. It is introduced in the article "Ability score estimation using pyirt".

Generated data

This requires a little knowledge of IRT, but data can be generated, for example, as follows. The environment is Python 3.8 and NumPy 1.19.2.
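For reference, the function `L3P` in the code below computes the probability of a correct answer under the three-parameter logistic model. Written with the same symbols as the code, where $x$ is the candidate's ability and $a$, $b$, $c$ are the question's parameters (commonly called discrimination, difficulty, and guessing), it is:

$$
P(x) = c + \frac{1 - c}{1 + \exp\bigl(-a\,(x - b)\bigr)}
$$

The details of this model are left to the next article; here it is only used to generate data.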

import numpy as np
from functools import partial
# Definition of the 3-parameter logistic (3PL) model
def L3P(a, b, c, x):
    return c + (1 - c) / (1 + np.exp(-a * (x - b)))

# Ranges for the model parameters
# a is a positive real number, b is a real number, c is at least 0 and less than 1

a_min = 0.3
a_max = 1

b_min = -2
b_max = 2

c_min = 0
c_max = .4

# Number of questions and number of candidates (here: 10 questions, 7 candidates)
num_items = 10
num_users = 7

# Generate the question parameters (one row of a, b, c per question)
item_params = np.array(
    [np.random.uniform(a_min, a_max, num_items),
     np.random.uniform(b_min, b_max, num_items),
     np.random.uniform(c_min, c_max, num_items)]
).T

# Generate the candidate parameters (abilities)
user_params = np.random.normal(size=num_users)

# Build the item response matrix; each element is 1 (correct answer) or 0 (wrong answer)
# Row i, column j: how candidate j responded to question i
ir_matrix_ij = np.array(
    [partial(L3P, *ip)(user_params) > np.random.uniform(0, 1, num_users) for ip in item_params]
).astype(int)

Generating data this way gives a matrix of 1s and 0s like the table above. Row $i$, column $j$ shows how candidate $j$ responded to question $i$: 0 is a wrong answer and 1 is a correct answer. In the following articles as well, the subscript $i$ denotes a question and the subscript $j$ denotes a candidate.
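As a quick sanity check on the model function used for generation, the following sketch evaluates `L3P` at a few abilities. The parameter values (`a = 1.0`, `b = 0.5`, `c = 0.2`) and the variable names `p_at_b`, `p_low`, `p_high` are arbitrary choices for illustration, not values from the generation code.

```python
import numpy as np

def L3P(a, b, c, x):
    # Same 3-parameter logistic function as in the generation code
    return c + (1 - c) / (1 + np.exp(-a * (x - b)))

a, b, c = 1.0, 0.5, 0.2  # illustrative values only

# At ability x == b the logistic term is 1/2, so P = c + (1 - c) / 2
p_at_b = L3P(a, b, c, b)

# Very low ability: P approaches the lower bound c
p_low = L3P(a, b, c, -50.0)

# Very high ability: P approaches 1
p_high = L3P(a, b, c, 50.0)

# P increases with ability
xs = np.linspace(-4, 4, 9)
ps = L3P(a, b, c, xs)

print(p_at_b, p_low, p_high)
print(np.all(np.diff(ps) > 0))
```

This is why a response is generated as correct when a uniform random number falls below `L3P(...)`: even a candidate with very low ability still answers correctly with probability close to `c`.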

The raw score of each candidate is obtained with

raw_score_j = ir_matrix_ij.sum(axis=0)

and the number of correct answers for each question with

num_correct_i = ir_matrix_ij.sum(axis=1)
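Because the generation code draws from NumPy's global random state, each run gives a different matrix. If reproducible data is wanted, one option is a seeded `numpy.random.Generator`. The sketch below is a variant under that assumption, not the original generation code: the seed `0`, the helper name `generate_responses`, and the use of broadcasting instead of `partial` are introduced here for illustration. Note also that `default_rng` produces a different random stream than `np.random.uniform`, so the output will not match a run of the original code.

```python
import numpy as np

def L3P(a, b, c, x):
    # 3-parameter logistic model: probability of a correct answer
    return c + (1 - c) / (1 + np.exp(-a * (x - b)))

def generate_responses(num_items, num_users, seed=0):
    # Hypothetical helper: same parameter ranges as the original code, seeded RNG
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.3, 1.0, num_items)
    b = rng.uniform(-2.0, 2.0, num_items)
    c = rng.uniform(0.0, 0.4, num_items)
    theta = rng.normal(size=num_users)  # candidate abilities

    # Broadcasting: question parameters as a column (num_items, 1)
    # against abilities as a row (1, num_users)
    prob = L3P(a[:, None], b[:, None], c[:, None], theta[None, :])
    # A response is correct when a uniform draw falls below the probability
    return (rng.uniform(size=prob.shape) < prob).astype(int)

ir_matrix_ij = generate_responses(num_items=10, num_users=7)
raw_score_j = ir_matrix_ij.sum(axis=0)    # one value per candidate
num_correct_i = ir_matrix_ij.sum(axis=1)  # one value per question

print(ir_matrix_ij.shape)  # (10, 7)
# Both sums count every correct answer exactly once, so their totals agree
print(raw_score_j.sum() == num_correct_i.sum())  # True
```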

Next time

Next time I will introduce the 1-, 2-, and 3-parameter logistic models that are often used in item response theory: Exam Mathematics Part 2 (Mathematical Model of Item Response Theory).