_rvs_sampling.py 7.1 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172
  1. # -*- coding: utf-8 -*-
  2. import numpy as np
  3. from scipy._lib._util import check_random_state
  4. def rvs_ratio_uniforms(pdf, umax, vmin, vmax, size=1, c=0, random_state=None):
  5. """
  6. Generate random samples from a probability density function using the
  7. ratio-of-uniforms method.
  8. Parameters
  9. ----------
  10. pdf : callable
  11. A function with signature `pdf(x)` that is proportional to the
  12. probability density function of the distribution.
  13. umax : float
  14. The upper bound of the bounding rectangle in the u-direction.
  15. vmin : float
  16. The lower bound of the bounding rectangle in the v-direction.
  17. vmax : float
  18. The upper bound of the bounding rectangle in the v-direction.
  19. size : int or tuple of ints, optional
  20. Defining number of random variates (default is 1).
  21. c : float, optional.
  22. Shift parameter of ratio-of-uniforms method, see Notes. Default is 0.
  23. random_state : {None, int, `numpy.random.Generator`,
  24. `numpy.random.RandomState`}, optional
  25. If `seed` is None (or `np.random`), the `numpy.random.RandomState`
  26. singleton is used.
  27. If `seed` is an int, a new ``RandomState`` instance is used,
  28. seeded with `seed`.
  29. If `seed` is already a ``Generator`` or ``RandomState`` instance then
  30. that instance is used.
  31. Returns
  32. -------
  33. rvs : ndarray
  34. The random variates distributed according to the probability
  35. distribution defined by the pdf.
  36. Notes
  37. -----
  38. Given a univariate probability density function `pdf` and a constant `c`,
  39. define the set ``A = {(u, v) : 0 < u <= sqrt(pdf(v/u + c))}``.
  40. If `(U, V)` is a random vector uniformly distributed over `A`,
  41. then `V/U + c` follows a distribution according to `pdf`.
  42. The above result (see [1]_, [2]_) can be used to sample random variables
  43. using only the pdf, i.e. no inversion of the cdf is required. Typical
  44. choices of `c` are zero or the mode of `pdf`. The set `A` is a subset of
  45. the rectangle ``R = [0, umax] x [vmin, vmax]`` where
  46. - ``umax = sup sqrt(pdf(x))``
  47. - ``vmin = inf (x - c) sqrt(pdf(x))``
  48. - ``vmax = sup (x - c) sqrt(pdf(x))``
  49. In particular, these values are finite if `pdf` is bounded and
  50. ``x**2 * pdf(x)`` is bounded (i.e. subquadratic tails).
  51. One can generate `(U, V)` uniformly on `R` and return
  52. `V/U + c` if `(U, V)` are also in `A` which can be directly
  53. verified.
  54. The algorithm is not changed if one replaces `pdf` by k * `pdf` for any
  55. constant k > 0. Thus, it is often convenient to work with a function
  56. that is proportional to the probability density function by dropping
  57. unneccessary normalization factors.
  58. Intuitively, the method works well if `A` fills up most of the
  59. enclosing rectangle such that the probability is high that `(U, V)`
  60. lies in `A` whenever it lies in `R` as the number of required
  61. iterations becomes too large otherwise. To be more precise, note that
  62. the expected number of iterations to draw `(U, V)` uniformly
  63. distributed on `R` such that `(U, V)` is also in `A` is given by
  64. the ratio ``area(R) / area(A) = 2 * umax * (vmax - vmin) / area(pdf)``,
  65. where `area(pdf)` is the integral of `pdf` (which is equal to one if the
  66. probability density function is used but can take on other values if a
  67. function proportional to the density is used). The equality holds since
  68. the area of `A` is equal to 0.5 * area(pdf) (Theorem 7.1 in [1]_).
  69. If the sampling fails to generate a single random variate after 50000
  70. iterations (i.e. not a single draw is in `A`), an exception is raised.
  71. If the bounding rectangle is not correctly specified (i.e. if it does not
  72. contain `A`), the algorithm samples from a distribution different from
  73. the one given by `pdf`. It is therefore recommended to perform a
  74. test such as `~scipy.stats.kstest` as a check.
  75. References
  76. ----------
  77. .. [1] L. Devroye, "Non-Uniform Random Variate Generation",
  78. Springer-Verlag, 1986.
  79. .. [2] W. Hoermann and J. Leydold, "Generating generalized inverse Gaussian
  80. random variates", Statistics and Computing, 24(4), p. 547--557, 2014.
  81. .. [3] A.J. Kinderman and J.F. Monahan, "Computer Generation of Random
  82. Variables Using the Ratio of Uniform Deviates",
  83. ACM Transactions on Mathematical Software, 3(3), p. 257--260, 1977.
  84. Examples
  85. --------
  86. >>> import numpy as np
  87. >>> from scipy import stats
  88. >>> rng = np.random.default_rng()
  89. Simulate normally distributed random variables. It is easy to compute the
  90. bounding rectangle explicitly in that case. For simplicity, we drop the
  91. normalization factor of the density.
  92. >>> f = lambda x: np.exp(-x**2 / 2)
  93. >>> v_bound = np.sqrt(f(np.sqrt(2))) * np.sqrt(2)
  94. >>> umax, vmin, vmax = np.sqrt(f(0)), -v_bound, v_bound
  95. >>> rvs = stats.rvs_ratio_uniforms(f, umax, vmin, vmax, size=2500,
  96. ... random_state=rng)
  97. The K-S test confirms that the random variates are indeed normally
  98. distributed (normality is not rejected at 5% significance level):
  99. >>> stats.kstest(rvs, 'norm')[1]
  100. 0.250634764150542
  101. The exponential distribution provides another example where the bounding
  102. rectangle can be determined explicitly.
  103. >>> rvs = stats.rvs_ratio_uniforms(lambda x: np.exp(-x), umax=1,
  104. ... vmin=0, vmax=2*np.exp(-1), size=1000,
  105. ... random_state=rng)
  106. >>> stats.kstest(rvs, 'expon')[1]
  107. 0.21121052054580314
  108. """
  109. if vmin >= vmax:
  110. raise ValueError("vmin must be smaller than vmax.")
  111. if umax <= 0:
  112. raise ValueError("umax must be positive.")
  113. size1d = tuple(np.atleast_1d(size))
  114. N = np.prod(size1d) # number of rvs needed, reshape upon return
  115. # start sampling using ratio of uniforms method
  116. rng = check_random_state(random_state)
  117. x = np.zeros(N)
  118. simulated, i = 0, 1
  119. # loop until N rvs have been generated: expected runtime is finite.
  120. # to avoid infinite loop, raise exception if not a single rv has been
  121. # generated after 50000 tries. even if the expected numer of iterations
  122. # is 1000, the probability of this event is (1-1/1000)**50000
  123. # which is of order 10e-22
  124. while simulated < N:
  125. k = N - simulated
  126. # simulate uniform rvs on [0, umax] and [vmin, vmax]
  127. u1 = umax * rng.uniform(size=k)
  128. v1 = rng.uniform(vmin, vmax, size=k)
  129. # apply rejection method
  130. rvs = v1 / u1 + c
  131. accept = (u1**2 <= pdf(rvs))
  132. num_accept = np.sum(accept)
  133. if num_accept > 0:
  134. x[simulated:(simulated + num_accept)] = rvs[accept]
  135. simulated += num_accept
  136. if (simulated == 0) and (i*N >= 50000):
  137. msg = ("Not a single random variate could be generated in {} "
  138. "attempts. The ratio of uniforms method does not appear "
  139. "to work for the provided parameters. Please check the "
  140. "pdf and the bounds.".format(i*N))
  141. raise RuntimeError(msg)
  142. i += 1
  143. return np.reshape(x, size1d)