[python] Random number generation memorandum

Whenever I need random numbers, I can never remember which function to use, so I put together this memorandum of the random number generation functions I am likely to use often.

The random numbers drawn from the various probability distributions in the latter half are shown with graphs, so I hope this also helps with understanding the distributions themselves. I never had a good mental picture of the chi-square distribution in particular, so I have tried to explain it intuitively.

The code below assumes the following libraries have been imported.

import numpy as np
import numpy.random as rd
import scipy.stats as st
import matplotlib.pyplot as plt

Uniformly distributed random numbers

rand(d0, d1, ..., dn)

x = rd.rand(2, 3)
print(x)

result


[[ 0.49748253  0.88897543  0.65014384]
 [ 0.68424239  0.19667014  0.83407881]]

Generates uniformly distributed random numbers in the half-open interval [0, 1). The arguments give the number of elements in each dimension of the result; the example above produces 2 rows and 3 columns. With no arguments, a single random number is returned.

randn(d0, d1, ..., dn)

x1 = rd.randn(2, 4)
print(x1)

x2 = 2.5 * rd.randn(3, 3) + 3
print(x2)

result


[[-0.42016216  0.41704326 -0.93713613  0.23174941]
 [-0.95513093  1.00766086 -0.5724616   1.32460314]]

[[-1.51762436  4.88306835  3.21346622]
 [ 0.93229257  4.0592773   4.99599127]
 [ 3.77544739 -0.20112058  2.47063097]]

Generates random numbers from a normal distribution with mean 0 and standard deviation 1. The arguments give the number of elements in each dimension of the result; the first example above produces 2 rows and 4 columns. With no arguments, a single random number is returned. To use a different mean mu and standard deviation sigma, write sigma * rd.randn() + mu, as in the second example (mean 3, standard deviation 2.5).
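As a rough check of the sigma * randn() + mu recipe, the sketch below draws a large sample and compares its mean and standard deviation with the requested values. The seed, the sample size, and the variable names are my own choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the check is repeatable (illustrative choice)

mu, sigma = 3.0, 2.5
samples = sigma * np.random.randn(100000) + mu

sample_mean = samples.mean()  # should land close to mu
sample_std = samples.std()    # should land close to sigma
print(sample_mean, sample_std)
```

With this many samples both values should agree with mu and sigma to roughly two decimal places.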

randint(low, high=None, size=None)

x = rd.randint(low=0, high=5, size=10)
print(x)
li = np.array(["Math", "Science", "society", "National language", "English"])

for l in li[x]:
    print(l)

result


[2 0 1 0 0 0 1 3 1 4]
society
Math
Science
Math
Math
Math
Science
National language
Science
English

Generates random integers from a discrete uniform distribution over the range given by the arguments. high and size can be omitted. Note that if high is omitted the range is [0, low), and if high is given the range is [low, high). In both cases the upper limit is not included.

This is useful when you want to randomly extract some elements from a certain array.

random_integers(low, high=None, size=None)

x = rd.random_integers(low=1, high=10, size=(2,5))
print(x)

dice = rd.random_integers(1, 6, 100)  # simulate rolling a die 100 times
print(dice)

result


[[10  5  7  7  8]
 [ 3  5  6  9  6]]

[4 5 2 2 1 1 6 4 5 5 5 5 1 5 1 1 3 2 4 4 5 3 6 6 3 3 5 3 6 1 1 4 1 1 2 1 1
 5 1 6 6 6 6 2 6 3 4 5 1 6 3 1 2 6 1 5 2 3 4 4 3 1 2 1 1 3 5 2 2 1 4 1 6 6
 2 5 4 3 2 1 4 1 2 4 2 5 3 3 1 4 4 1 6 4 1 1 3 6 1 6]

Like randint(), this generates random integers from a discrete uniform distribution over the range given by the arguments. high and size can be omitted. The main difference is the range: if high is omitted the range is [1, low], and if high is given the range is [low, high]. In both cases the upper limit is included, and when only low is given the range starts at 1 rather than 0. Note that random_integers has been deprecated since NumPy 1.11 in favor of randint.
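Because randint() excludes its upper limit, the same inclusive range can be obtained by passing high + 1. The sketch below redoes the dice simulation that way; the seed and the number of rolls are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

# randint excludes the upper bound, so pass 6 + 1 to include the face 6.
dice = np.random.randint(1, 6 + 1, size=1000)

print(dice.min(), dice.max())
print(np.bincount(dice)[1:])  # how often each face 1..6 came up
```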

random_sample(size=None), random(size=None), ranf(size=None), sample(size=None)

x = np.random.random_sample((4,3))
print(x)

result


[[ 0.613437    0.38902499  0.91052787]
 [ 0.80291265  0.81324739  0.06631052]
 [ 0.62305967  0.44327718  0.2650803 ]
 [ 0.76565352  0.42962876  0.40136025]]

As the heading suggests, there are four of these, but according to http://stackoverflow.com/questions/18829185/difference-between-various-numpy-random-functions they all do the same thing (the ones other than random_sample are aliases). Why are there four? (laughs) The difference from rand() is how the shape is specified: these take it as a single tuple, whereas rand() takes each dimension as a separate argument.
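The difference in how the shape is passed can be seen side by side in this small sketch (the variable names are mine):

```python
import numpy as np

a = np.random.random_sample((4, 3))  # shape given as one tuple
b = np.random.rand(4, 3)             # shape given as separate arguments

print(a.shape, b.shape)
```

Both calls return an array of the same shape, filled with values from [0, 1).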

choice(a, size=None, replace=True, p=None)

x1 = rd.choice(5, 5, replace=False)  # equivalent to shuffling 0-4
print(x1)

x2 = rd.choice(5, 5, p=[0.1,0.1,0.1,0.1,0.6])  # high probability of getting 4
print(x2)

result


[1 4 2 3 0]
[4 4 4 4 2]

The signature is choice(a, size=None, replace=True, p=None). When a is an integer, values are drawn from range(a); a can also be an array, in which case its elements are drawn. size is the number of random values to generate. replace controls whether sampling is done with replacement: with replace=False a value that has been drawn is not put back, so the same value never appears twice, and an error occurs if a is smaller than size. p lets you specify the probability of each value instead of drawing uniformly, so an error occurs if a and p do not have the same size.

Like the other functions so far, choice returns a NumPy ndarray rather than a standard Python list.
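The two error cases around choice() can be reproduced with a short sketch like the following. The flag names are hypothetical helpers of mine; the code only relies on choice() raising ValueError in both situations.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

# Sampling without replacement: every drawn value is distinct.
unique_draw = np.random.choice(5, 5, replace=False)
print(sorted(unique_draw.tolist()))

# Asking for more values than exist cannot work without replacement.
try:
    np.random.choice(5, 6, replace=False)
    oversample_failed = False
except ValueError as err:
    oversample_failed = True
    print("ValueError:", err)

# p needs exactly one probability per candidate value.
try:
    np.random.choice(5, 5, p=[0.5, 0.5])
    bad_p_failed = False
except ValueError as err:
    bad_p_failed = True
    print("ValueError:", err)
```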

shuffle(x)

x = list(range(10))  # shuffle needs a mutable sequence such as a list or ndarray
rd.shuffle(x)
print(x)

result


[3, 4, 2, 5, 8, 9, 6, 1, 7, 0]

Randomly shuffles the order of an array. Note that it modifies the array passed as the argument in place rather than returning a shuffled copy; the return value is None.

permutation(x)

x1 = rd.permutation(10)
print(x1)

li = ['cat', 'dog', 'tiger', 'lion', 'elephant']
x2 = rd.permutation(li)
print(x2)

result


[4 0 6 5 3 8 7 1 9 2]

['elephant' 'tiger' 'lion' 'dog' 'cat']

If an int is given as the argument, the sequence 0 to x-1 is generated internally and returned in random order. If a list is given, its elements are returned in random order. The elements do not have to be numbers; a list of strings works as well.
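The practical difference between permutation() and shuffle() is whether the input is modified. A minimal sketch, with variable names of my own choosing:

```python
import numpy as np

animals = ['cat', 'dog', 'tiger', 'lion', 'elephant']

# permutation() returns a shuffled copy as an ndarray; the list is untouched.
shuffled_copy = np.random.permutation(animals)
print(animals)
print(shuffled_copy)

# shuffle() reorders its argument in place and returns None.
numbers = list(range(10))
returned = np.random.shuffle(numbers)
print(returned)
print(numbers)
```

So permutation() is the one to use when the original order still matters afterwards.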

uniform(low=0.0, high=1.0, size=None)

x = rd.uniform(-2,5,30)
print(x)

result


[-1.79969471  0.6422639   4.36130597 -1.99694629  3.23979431  4.75933857
  1.39738979  0.12817182  1.64040588  3.0256498   0.14997201  2.0023698
  3.76051422 -1.80957115 -0.2320044  -1.82575799  1.26600285 -0.27668411
  0.77422678  0.71193145 -1.42972204  4.62962696 -1.90378575  1.84045518
  1.06136363  4.83948262  3.57364714  1.73556559 -0.97367223  3.84649039]

Generates random numbers from a uniform distribution. The difference from the uniform random number functions explained so far is that the range can be specified. The arguments are (low=0.0, high=1.0, size=None), and the range is the half-open interval [low, high), which excludes the upper limit.
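A quick sanity check of the [low, high) range, assuming the same bounds as the example above; the seed and sample size are illustrative.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

low, high = -2, 5
x = np.random.uniform(low, high, 100000)

print(x.min() >= low, x.max() < high)  # every value lies inside [low, high)
print(x.mean())                        # close to the midpoint (low + high) / 2
```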

Random numbers from probability distributions

binomial(n, p, size=None): Binomial distribution

x = rd.binomial(10, 0.5, 20)
print(x)

result


[5 4 5 5 4 3 8 3 6 6 3 4 5 1 5 7 6 4 2 6]

Generates random numbers from a binomial distribution: the number of successes when a trial with success probability p is repeated n times. The histogram below corresponds to tossing a coin with probability 0.5 of heads 30 times, recording the number of heads, and repeating that experiment 3000 times.

x = rd.binomial(30, 0.5, 3000)
plt.hist(x, 17)

binomial-compressor.png
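To connect binomial() with the coin-toss description, the sketch below simulates the tosses by hand with rand() and compares the result with binomial(). The toss-by-hand construction and the variable names are my own illustration, not part of the original memo.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

n, p, trials = 30, 0.5, 3000

# Each row is one experiment of n tosses; a toss is heads when rand() < p.
tosses = np.random.rand(trials, n) < p
heads_manual = tosses.sum(axis=1)

heads_binomial = np.random.binomial(n, p, trials)

# Both sample means should sit near n * p = 15.
print(heads_manual.mean(), heads_binomial.mean())
```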

poisson(lam=1.0, size=None): Poisson distribution

x = rd.poisson(30, 20)
print(x)

result


[25 31 38 20 36 29 28 31 22 31 27 24 24 26 32 42 27 20 30 31]

Generates random numbers from a Poisson distribution for an event that occurs lam times per unit time on average. Taking ad clicks as an example, the code above corresponds to an ad that is clicked 30 times per hour on average.

The histogram below corresponds to an ad clicked 5 times per hour on average, observed 1000 times (that is, 1000 hours of data).

x = rd.poisson(5, 1000)
plt.hist(x, 14)

poisson.png
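A characteristic property of the Poisson distribution is that its mean and its variance both equal lam. The sketch below checks this numerically; the seed and sample size are my own assumptions.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

lam = 5
clicks = np.random.poisson(lam, 100000)

poisson_mean = clicks.mean()  # close to lam
poisson_var = clicks.var()    # also close to lam
print(poisson_mean, poisson_var)
```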

hypergeometric(ngood, nbad, nsample, size=None): Hypergeometric distribution

ngood, nbad, nsamp = 90, 10, 10
x = rd.hypergeometric(ngood, nbad, nsamp, 100)
print(x)
print(np.average(x))

result


[ 9 10  8  9  8  7  7  9 10  7 10  9  9  8  9  9  9  9  8 10  5 10  9  9  9
  9  9 10 10  8 10  9  9  9  7  9  9 10 10  7  9  9 10 10  8  9 10 10  8 10
 10  9  9 10  9 10  8  9  9  9  8  9 10  9 10 10 10  9  9  9 10  9  8 10  7
  7 10 10  9 10 10  9 10  9  7  9  9  8  8 10  7  8  9 10  9  9 10  9  8 10]
8.97

Generates random numbers from a hypergeometric distribution. For example, when there are ngood good products and nbad defective products, it returns the number of good products found when nsample items are drawn without replacement for a defect-rate inspection.

The graph below shows the number of good products obtained when 20 items are sampled from a collection box of 200 products containing 190 good and 10 defective ones (that is, a defect rate of 5%). It can be thought of as a histogram of the results when this is done for 3000 such boxes, each containing exactly the same numbers of good and defective products.

ngood, nbad, nsamp = 190, 10, 20
x = rd.hypergeometric(ngood, nbad, nsamp, 3000)
plt.hist(x, 6)

hyper-compressor.png
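For the box example above, the expected number of good products in a sample is nsamp * ngood / (ngood + nbad) = 19. The sketch below compares that with the sample average; the seed and the larger number of draws are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

ngood, nbad, nsamp = 190, 10, 20
good_counts = np.random.hypergeometric(ngood, nbad, nsamp, 100000)

expected_good = nsamp * ngood / float(ngood + nbad)  # 19.0
print(good_counts.mean(), expected_good)

# At most nbad of the drawn items can be defective, so the count of good
# items always stays between nsamp - nbad and nsamp.
print(good_counts.min(), good_counts.max())
```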

geometric(p, size=None): Geometric distribution

x = rd.geometric(p=0.01, size=100)
print(x)

result


[294  36  25  18 171  24 145 280 132  15  65  88 180 103  34 105   3  34
 111 143   5  26 204  27   1  24 442 213  25  93  97  28  80  93   6 189
  90  31 213  13 124  50 110  47  45  66  21   1  88  79 332  80  32  19
  17   2  38  62 121 136 175  81 115  82  35 136  49 810 302  31 147 207
  80 125  33  53  32  98 189   4 766  72  68  10  23 233  14  21  61 362
 179  56  13  55   2  48  41  54  39 279]

Generates random numbers from a geometric distribution: the number of trials needed to get the first success when a trial with success probability p is repeated until it succeeds.

The graph below corresponds to repeating a trial with a 1% success probability until it succeeds and recording the number of trials that took. It is a histogram of the data from doing this 1000 times.

x = rd.geometric(p=0.01, size=1000)
plt.hist(x, 30)

geo-compressor.png
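On average, a trial with success probability p takes 1 / p attempts to succeed, so p = 0.01 should give about 100. A rough numerical check, with a seed and sample size chosen by me:

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

p = 0.01
attempts = np.random.geometric(p=p, size=100000)

mean_attempts = attempts.mean()  # close to 1 / p = 100
print(mean_attempts)
print(attempts.min())            # the first success can come on trial 1 at the earliest
```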

normal(loc=0.0, scale=1.0, size=None): Normal distribution

x = np.random.normal(5, 2, 20)
print(x)

result


[-0.28713217  2.07791879  2.48991635  5.36918301  4.32797397  1.40568929
  6.36821312  3.22562844  4.16203214  3.91913171  6.26830012  4.74572788
  4.78666884  6.76617469  5.05386902  3.20053316  9.04530241  5.71373444
  5.95406987  2.61879994]

Generates random numbers from the normal distribution, the most fundamental of the probability distributions. loc is the mean and scale is the standard deviation. Below is the histogram.

x = np.random.normal(5, 2, 10000)
plt.hist(x,20)

norm-compressor.png

Incidentally, random numbers following the chi-square distribution introduced below can be built by squaring random numbers drawn from this normal distribution and adding them up.

# Draw random numbers from a normal distribution with mean 0 and standard deviation 1, then square them
x1 = np.random.normal(0, 1, 10000)**2
x2 = np.random.normal(0, 1, 10000)**2
x3 = np.random.normal(0, 1, 10000)**2
x4 = np.random.normal(0, 1, 10000)**2
x5 = np.random.normal(0, 1, 10000)**2
x6 = np.random.normal(0, 1, 10000)**2

# The sum of two squared standard normal samples follows a chi-square distribution with 2 degrees of freedom (blue graph)
plt.hist(x1+x2, 20, color='b')
plt.show()
# Summing three gives a chi-square distribution with 3 degrees of freedom (green graph)
plt.hist(x1+x2+x3, 20, color='g')
plt.show()
# Summing all six gives a chi-square distribution with 6 degrees of freedom (red graph)
plt.hist(x1+x2+x3+x4+x5+x6, 20, color='r')
plt.show()

sum_norm1-compressor.png sum_norm2-compressor.png sum_norm3-compressor.png

chisquare(df, size=None): Chi-square distribution

x = rd.chisquare(3, 20)
print(x)

result


[ 0.69372667  0.94576453  3.7221214   6.25174061  3.07001732  1.14520278
  0.92011307  0.46210561  4.16801678  5.89167331  2.57532324  2.07169671
  3.91118545  3.12737954  1.02127029  0.69982098  1.27009033  2.25570581
  4.66501179  2.06312544]

Returns random numbers drawn from a chi-square distribution with df degrees of freedom. As mentioned in the section above, the chi-square distribution is the distribution followed by the sum of the squares of df independent random numbers drawn from the standard normal distribution.

# Histograms of random numbers drawn from chi-square distributions with 2, 5, and 20 degrees of freedom
for df, c in zip([2,5,20], "bgr"):
    x = rd.chisquare(df, 1000)
    plt.hist(x, 20, color=c)
    plt.show()

chisq1-compressor.png chisq2.png chisq3.png
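The link between squared standard normal samples and chisquare() can be checked numerically. A chi-square distribution with k degrees of freedom has mean k and variance 2k, so the sketch below compares both constructions against those values. The seed, the sample size, and the stats dictionary are my own illustrative choices.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

n = 100000
stats = {}
for k in (2, 3, 6):
    # Sum of k squared standard normal samples, one sum per column
    z = np.random.normal(0, 1, (k, n))
    sum_sq = (z ** 2).sum(axis=0)
    # Samples drawn directly from the chi-square distribution
    direct = np.random.chisquare(k, n)
    stats[k] = (sum_sq.mean(), direct.mean(), sum_sq.var(), direct.var())
    print(k, stats[k])
```

For each k, the two means should be near k and the two variances near 2k.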

f(dfnum, dfden, size=None): F distribution

x = rd.f(6, 28, 30)
print(x)

result


[ 0.54770358  0.90513244  1.32533065  0.75125196  1.000936    1.00622822
  1.18431869  0.73399399  0.6237275   1.51806607  1.12040041  1.67777055
  0.40309609  0.29640278  0.49408306  1.97680072  0.51474868  0.28782202
  0.90206995  0.30968917  1.29931934  1.19406178  1.28635087  2.73510067
  0.41310779  1.36155992  0.2887777   0.78830371  0.25557871  0.96761269]

Returns random numbers drawn from an F distribution with the two degrees of freedom dfnum and dfden. The F distribution is the distribution of the ratio of two independent chi-square random variables, one in the numerator and one in the denominator, each divided by its degrees of freedom. Because a chi-square variable is a sum of squares of standardized values, it behaves like a variance, which is why the F distribution is used to test whether two variances are equal.

The graphs below are histograms of random numbers drawn from F distributions with degrees of freedom (1,4), (5,7), (10,10), and (40,50), respectively.

for df, c in zip([(1,4), (5,7), (10,10), (40,50)], "bgry"):
    x = rd.f(df[0], df[1], 1000)
    plt.hist(x, 100, color=c)
    plt.show()

f1-compressor.png f2-compressor.png f3-compressor.png f4-compressor.png
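The ratio-of-chi-squares description can be tried out directly. The sketch below builds F-distributed samples by hand from chisquare() and compares them with f(); for dfden > 2 the mean of the F distribution is dfden / (dfden - 2). The seed, sample size, and variable names are assumptions of mine.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

dfnum, dfden, n = 6, 28, 200000

# Two independent chi-square samples, each divided by its degrees of freedom
chi_num = np.random.chisquare(dfnum, n) / dfnum
chi_den = np.random.chisquare(dfden, n) / dfden
f_manual = chi_num / chi_den

f_direct = np.random.f(dfnum, dfden, n)

expected_mean = dfden / (dfden - 2.0)
print(f_manual.mean(), f_direct.mean(), expected_mean)
```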

exponential(scale=1.0, size=None): Exponential distribution

lam = 0.1   # occurs 0.1 times per minute on average
x = rd.exponential(1./lam, size=20)
print(x)

result


[ 11.2642272   41.01507264  11.5756986   27.10318556  10.7079342
   0.17961819  24.49974467   6.46388826   9.69390641   2.85354527
   0.55508868   4.04772073  24.60029857  23.10866     19.83649067
  12.12219301  10.24395203   0.16056754   8.9401544    8.86083473]

Returns random numbers drawn from an exponential distribution with parameter lam, where lam is the average number of occurrences per unit time. The scale argument of exponential is the reciprocal of lam. The returned value is the waiting time, measured in unit times, until the next occurrence of an event that happens lam times per unit time on average.

In other words, for an event that occurs 0.1 times per minute on average, a returned value of 3 means the next occurrence came 3 minutes later. The graph below is a histogram of random numbers drawn from the exponential distribution with lam = 0.1.

lam = 0.1  # occurs 0.1 times per minute on average
x = rd.exponential(1./lam, size=10000)
plt.hist(x, 100)

exp-compressor.png
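Two quick checks of the waiting-time interpretation: the average wait should be 1 / lam = 10 minutes, and the fraction of waits longer than t minutes should be about exp(-lam * t). The seed, the sample size, and the threshold of 10 minutes are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)  # fixed seed for a repeatable run (illustrative choice)

lam = 0.1  # occurs 0.1 times per minute on average
waits = np.random.exponential(1.0 / lam, size=100000)

mean_wait = waits.mean()             # close to 1 / lam = 10
frac_over_10 = (waits > 10).mean()   # close to exp(-lam * 10) = exp(-1)
print(mean_wait)
print(frac_over_10, np.exp(-lam * 10))
```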

Referenced site

numpy reference site http://docs.scipy.org/doc/numpy/reference/routines.random.html
