Introduction

It was necessary to test the difference between the average values of the count data according to the Poisson distribution in my work, but I did not have the information in Japanese and implemented it with reference to the following paper, so I will leave it as a memorandum.

[A more powerful test for comparing two Poisson means] (http://www.sciencedirect.com/science/article/pii/S0378375802004081)

It was said that there are two methods of testing, the conditional test (C-test) and the test using the P value proposed in this paper (E-test), but this time the detection power Implemented the higher E-test.

easy explanation

$ n1, n2 $: Number of elapsed unit times $ k1, k2 $: Total number of occurrences of events for the entire period $ d $: Difference between the means you want to test

When

Null hypothesis $ H_0: \ lambda_1 = \ lambda_2 + d $ Alternative hypothesis $ H_1: \ lambda_1 \ neq \ lambda_2 + d $

The P value when performing the hypothesis test of is calculated by the following formula.


\begin{align}
& \hat{\lambda}_{2k} = \frac{k_1 + k_2}{n_1 + n_2} - \frac{d n_1}{n_1 + n_2}\\[2mm]

& p = \sum_{x_1=0}^{k_1} \sum_{x_2=0}^{k_2}
\frac{e^{-n_1(\hat{\lambda}_{2k}+d)}(n_1(\hat{\lambda}_{2k} + d))^{x_1}}{x_1!}
\frac{e^{-n_2\hat{\lambda}_{2k}}(n_2(\hat{\lambda}_{2k}))^{x_2}}{x_2!}
\end{align}

Implementation

How to use

The following is an implementation of the above expression in Python and Ruby. For example, if the counts you want to compare are 40 and 65, respectively, and you want to know if there is a significant difference between the two mean values,

pois_mean_diff_test(40, 65)
=> 0.04921332025427114

Then you can calculate the P value. This time, we can see that the P value is below 0.05, so we can see that there is a significant difference at the 5% level.

important point

If the formula is implemented as it is, it will overflow in the floating point calculation, so the factorial calculation was decomposed and implemented as follows.

\frac{e^{-n_1(\hat{\lambda}_{2k}+d)}(n_1(\hat{\lambda}_{2k} + d))^{x_1}}{x_1!}
= e^{-n_1(\hat{\lambda}_{2k}+d)} \times
\frac{n_1(\hat{\lambda}_{2k} + d)}{x_1} \times \frac{n_1(\hat{\lambda}_{2k} + d)}{x_1 -1} \times \dots \times \frac{n_1(\hat{\lambda}_{2k} + d)}{1}

Implementation in Python

`Python`


import math
import numpy as np

def pois_mean_diff_test(k1, k2, n1=1, n2=1, d=0.0):

    x1_seq = range(0, k1 + 1)
    x2_seq = range(0, k2 + 1)
    l2k = (k1+k2)/(n1+n2) - d*n1/(n1+n2)

    p_value = sum([math.exp(-n1*(l2k+d)) * np.prod([n1*(l2k+d)/i for i in range(1, x1+1)]) * \
                   math.exp(-n2*l2k) * np.prod([n2*l2k/j for j in range(1, x2+1)]) \
                   for x2 in x2_seq for x1 in x1_seq])

    return p_value

if __name__ == '__main__':
    print "P-value is " + str(pois_mean_diff_test(40, 65))

Implementation in Ruby

`Ruby`


def pois_mean_diff_test(k1, k2, n1=1, n2=1, d=0.0, alpha=0.05)

  x1_seq = Array(0..k1)
  x2_seq = Array(0..k2)

  l2k = (k1+k2)/(n1+n2) - d*n1/(n1+n2)

  p_value = x1_seq.product(x2_seq).map{|x| x1=x[0]; x2=x[1];
    Math.exp(-n1*(l2k+d)) * Array(1..x1).map{|i| n1*(l2k+d)/i }.inject(:*).to_f *
    Math.exp(-n1*l2k) * Array(1..x2).map{|j| n2*l2k/j }.inject(:*).to_f
  }.inject(:+)

  return p_value
end

out = "P-value is " + pois_mean_diff_test(20,10).to_s
puts out

References

[K. Krishnamoorthy, Jessica Thomson, A more powerful test for comparing two Poisson means, Journal of Statistical Planning and Inference, Volume 119, Issue 1, 15 January 2004, Pages 23-35] (http://www.sciencedirect.com/science/article/pii/S0378375802004081)

[PYTHON] Test of the difference between the mean values of count data according to the Poisson distribution