argiopetech/base

Add MT warmup

Closed this issue · 5 comments

Per Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications (http://www0.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf), the Mersenne Twister's randomness properties can suffer if seeded with a simple seed (from the binary viewpoint, the more zeroes there are starting from the MSB and moving toward the LSB, the simpler the value).

Proposed remedy

Instead of using user-provided seeds directly, hash values prior to seeding. Additionally, generate and discard between several hundred and several thousand values to "warm up" the PRNG.
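A minimal sketch of what this remedy might look like with the standard library's `std::mt19937`. The issue names neither a hash function nor an exact warmup count, so everything specific here is an assumption for illustration: the SplitMix64-style mixer, the `mix_seed` and `make_warmed_rng` names, and the `kWarmupDraws` value of 1000 (picked from the "several hundred to several thousand" range).

```cpp
#include <cstdint>
#include <random>

// Hypothetical warmup length; the issue only gives a rough range.
constexpr unsigned long long kWarmupDraws = 1000;

// SplitMix64-style bit mixer, used as a stand-in integer hash (an assumption;
// the issue does not say which hash to use). It spreads the few set bits of a
// simple seed such as 1 or 2 across the whole 64-bit word.
std::uint64_t mix_seed(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Hash the user-provided seed, seed the generator, then discard the warmup draws.
std::mt19937 make_warmed_rng(std::uint32_t user_seed) {
    const std::uint64_t h = mix_seed(user_seed);
    // Fold the 64-bit hash down to the 32-bit value mt19937 accepts as a seed.
    std::mt19937 rng(static_cast<std::uint32_t>(h ^ (h >> 32)));
    rng.discard(kWarmupDraws);
    return rng;
}
```

Because the mixer and the warmup count are both fixed constants, calling `make_warmed_rng` twice with the same user seed yields the same sequence.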

Good thought. Would there still be a way to get the same result twice
if one wanted to? This could be useful for testing purposes.

Ted von Hippel

Department of Physical Sciences
Embry-Riddle Aeronautical University
600 S. Clyde Morris Boulevard
Daytona Beach, FL 32114-3900
386-226-7751

The proposed method should still give deterministic results as long as we
don't change hash functions or change the number of warmup iterations. I
intend these both to be hard-coded.

OK. And is there a way to start a run off differently in case one wants
to do that?

The current --seed CLI flag and the seed: YAML field will remain as they
are; they'll just be mapped to a (hopefully) more complex number internally.


Elliot Robinson
Email: elliot.robinson@argiopetech.com
Phone: (321) 252-9660

ah, gotcha.
