Random problems with R

Philip B. Stark

Abstract

R (Version 3.5.1 patched) has an issue with its random sampling functionality. R generates random integers between 1 and m by multiplying random floats by m, taking the floor, and adding 1 to the result. Well-known quantization effects in this approach result in a non-uniform distribution on {1, …, m}. The difference, which depends on m, can be substantial. Because the sample function in R relies on generating random integers, random sampling in R is biased. There is an easy fix: construct random integers directly from random bits, rather than multiplying a random float by m. That is the strategy taken in Python's numpy.random.randint() function, among others. Example source code in Python is available at https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py (see functions getrandbits() and randbelowfromrandbits()).
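
The quantization effect and the proposed fix can be illustrated with a minimal Python sketch. It is not R's source code and not the cryptorandom implementation. The function names float_method_counts and randint_from_bits, and the toy word size w, are assumptions introduced here for illustration. The first function enumerates every float of the form k / 2**w and counts how often floor(m * u) + 1 lands on each value. The second builds the integer directly from random bits and rejects out-of-range draws.

```python
import secrets
from collections import Counter


def float_method_counts(m, w):
    """Count outcomes of floor(m * u) + 1 over every u = k / 2**w.

    w is a hypothetical word size, kept tiny so that all 2**w equally
    likely floats can be enumerated exactly with integer arithmetic.
    """
    return Counter((m * k) // 2**w + 1 for k in range(2**w))


def randint_from_bits(m, getrandbits=secrets.randbits):
    """Return an integer uniform on {1, ..., m}, built from random bits.

    Draw just enough bits to cover 0 .. m - 1 and discard any draw
    that falls outside that range (rejection sampling).
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    nbits = m.bit_length()
    r = getrandbits(nbits)
    while r >= m:  # out of range: reject and draw fresh bits
        r = getrandbits(nbits)
    return r + 1


if __name__ == "__main__":
    # Toy case: 3-bit floats and m = 3 cannot split 8 outcomes evenly.
    print(dict(float_method_counts(3, 3)))
    print([randint_from_bits(3) for _ in range(10)])
```

With w = 3 and m = 3, the eight equally likely floats map to the values 1, 2, and 3 with counts 3, 3, and 2, so the value 3 is underrepresented. The rejection approach avoids this because every accepted bit pattern corresponds to exactly one integer in the target range.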
