[PYTHON] Test of the difference between the mean values of count data according to the Poisson distribution

Introduction

It was necessary to test the difference between the average values of the count data according to the Poisson distribution in my work, but I did not have the information in Japanese and implemented it with reference to the following paper, so I will leave it as a memorandum.

[A more powerful test for comparing two Poisson means] (http://www.sciencedirect.com/science/article/pii/S0378375802004081)

It was said that there are two methods of testing, the conditional test (C-test) and the test using the P value proposed in this paper (E-test), but this time the detection power Implemented the higher E-test.

easy explanation

$ n1, n2 $: Number of elapsed unit times $ k1, k2 $: Total number of occurrences of events for the entire period $ d $: Difference between the means you want to test

When

Null hypothesis $ H_0: \ lambda_1 = \ lambda_2 + d $ Alternative hypothesis $ H_1: \ lambda_1 \ neq \ lambda_2 + d $

The P value when performing the hypothesis test of is calculated by the following formula.


\begin{align}
& \hat{\lambda}_{2k} = \frac{k_1 + k_2}{n_1 + n_2} - \frac{d n_1}{n_1 + n_2}\\[2mm]

& p = \sum_{x_1=0}^{k_1} \sum_{x_2=0}^{k_2}
\frac{e^{-n_1(\hat{\lambda}_{2k}+d)}(n_1(\hat{\lambda}_{2k} + d))^{x_1}}{x_1!}
\frac{e^{-n_2\hat{\lambda}_{2k}}(n_2(\hat{\lambda}_{2k}))^{x_2}}{x_2!}
\end{align}

Implementation

How to use

The following is an implementation of the above expression in Python and Ruby. For example, if the counts you want to compare are 40 and 65, respectively, and you want to know if there is a significant difference between the two mean values,

pois_mean_diff_test(40, 65)
=> 0.04921332025427114

Then you can calculate the P value. This time, we can see that the P value is below 0.05, so we can see that there is a significant difference at the 5% level.

important point

If the formula is implemented as it is, it will overflow in the floating point calculation, so the factorial calculation was decomposed and implemented as follows.

\frac{e^{-n_1(\hat{\lambda}_{2k}+d)}(n_1(\hat{\lambda}_{2k} + d))^{x_1}}{x_1!}
= e^{-n_1(\hat{\lambda}_{2k}+d)} \times
\frac{n_1(\hat{\lambda}_{2k} + d)}{x_1} \times \frac{n_1(\hat{\lambda}_{2k} + d)}{x_1 -1} \times \dots \times \frac{n_1(\hat{\lambda}_{2k} + d)}{1}

Implementation in Python

Python


import math
import numpy as np

def pois_mean_diff_test(k1, k2, n1=1, n2=1, d=0.0):

    x1_seq = range(0, k1 + 1)
    x2_seq = range(0, k2 + 1)
    l2k = (k1+k2)/(n1+n2) - d*n1/(n1+n2)

    p_value = sum([math.exp(-n1*(l2k+d)) * np.prod([n1*(l2k+d)/i for i in range(1, x1+1)]) * \
                   math.exp(-n2*l2k) * np.prod([n2*l2k/j for j in range(1, x2+1)]) \
                   for x2 in x2_seq for x1 in x1_seq])

    return p_value

if __name__ == '__main__':
    print "P-value is " + str(pois_mean_diff_test(40, 65))

Implementation in Ruby

Ruby


def pois_mean_diff_test(k1, k2, n1=1, n2=1, d=0.0, alpha=0.05)

  x1_seq = Array(0..k1)
  x2_seq = Array(0..k2)

  l2k = (k1+k2)/(n1+n2) - d*n1/(n1+n2)

  p_value = x1_seq.product(x2_seq).map{|x| x1=x[0]; x2=x[1];
    Math.exp(-n1*(l2k+d)) * Array(1..x1).map{|i| n1*(l2k+d)/i }.inject(:*).to_f *
    Math.exp(-n1*l2k) * Array(1..x2).map{|j| n2*l2k/j }.inject(:*).to_f
  }.inject(:+)

  return p_value
end

out = "P-value is " + pois_mean_diff_test(20,10).to_s
puts out

References

[K. Krishnamoorthy, Jessica Thomson, A more powerful test for comparing two Poisson means, Journal of Statistical Planning and Inference, Volume 119, Issue 1, 15 January 2004, Pages 23-35] (http://www.sciencedirect.com/science/article/pii/S0378375802004081)

Recommended Posts

Test of the difference between the mean values of count data according to the Poisson distribution
Test whether the observed data follow the Poisson distribution (Test of the goodness of fit of the Poisson distribution by Python)
Test the goodness of fit of the distribution
Bayesian modeling-estimation of the difference between the two groups-
Consideration of the difference between ROC curve and PR curve
Steps to calculate the likelihood of a normal distribution
How to use argparse and the difference between optparse
Automatically select BGM according to the content of the conversation
How to plot the distribution of bacterial composition from Qiime2 analysis data in a box plot
Switch the setting value of setting.py according to the development environment
How to test the attributes added by add_request_method of pyramid
Understand the difference between cumulative assignment to variables and cumulative assignment to objects
Various ways to calculate the similarity between data in python
[Homology] Count the number of holes in data with Python
The story of copying data from S3 to Google's TeamDrive
Change the volume of Pepper according to the surrounding environment (sound)
I sent the data of Raspberry Pi to GCP (free)
Try to extract the features of the sensor data with CNN