NVlabs/timeloop

Setting random number generator seed

suyashbakshi opened this issue · 10 comments

Hello. Since mapping search spaces are usually very large, I'm trying to increase the "spread" of the random search algorithm across the search space so that it finds a large number of valid mappings, while also reducing the time it takes to collect them.
To that end, I intend to submit several Slurm jobs, each with a different random number generator seed for the random search algorithm, so that each job hopefully traverses a different, well-separated region of the search space. Is it possible to set the seed for the random number generator in Timeloop to achieve this?

That's a great idea. I think the cleanest way to do this would be to add a "random-seed" key to the mapper YAML config and read it in the RandomPruned constructor, falling back to a default seed if the user doesn't specify one. The drawback is that the seed-reading code has to be replicated in every search heuristic that needs this logic.

Alternatively, you could add it to the search module factory here. In fact, I am leaning towards this option, even though it breaks abstraction by setting the seed for heuristics that do not have a random element.

Thank you. So I'm guessing this would also involve passing the seed to "RandomGenerator128" here, and then using it in "Next()" here?

You would pass the seed in to the RandomGenerator128 constructor, and pass it through to the engine_ object's constructor (the existing code just uses the default constructor for that object). Once you do that, I don't think you need to change anything in Next() because the engine_ is already used for each RNG call, which should reflect the seed you passed in during construction.
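A minimal sketch of the pattern described above, assuming a generator that owns its engine as a member. SeededGenerator, kDefaultSeed, and the use of std::mt19937_64 are illustrative stand-ins and not Timeloop's actual RandomGenerator128, which may use a different engine and integer type. The point is only that the seed goes to the engine's constructor and Next() stays as it was.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical stand-in for a generator such as RandomGenerator128.
// This is NOT Timeloop's class; names and types are assumptions.
class SeededGenerator
{
 public:
  // Fallback seed so runs stay reproducible when no seed is supplied.
  static constexpr std::uint64_t kDefaultSeed = 0;

  // Draws values in [0, bound). The seed is forwarded to the engine's
  // constructor instead of default-constructing the engine.
  explicit SeededGenerator(std::uint64_t bound,
                           std::uint64_t seed = kDefaultSeed)
      : engine_(seed), dist_(0, bound - 1)
  {
    assert(bound > 0);
  }

  // No seed handling needed here: every draw already goes through engine_,
  // so the sequence reflects the seed given at construction.
  std::uint64_t Next() { return dist_(engine_); }

 private:
  std::mt19937_64 engine_;
  std::uniform_int_distribution<std::uint64_t> dist_;
};
```

Two generators built with the same seed produce the same sequence, while different seeds give different sequences, which is what lets each job explore a different part of the space.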

Thanks a lot! I was able to achieve the intended effect. Now I just need to check the overlap between mappings found across different seeds. Thank you for the quick responses.

Please let us know what you see. If this ends up being useful for you, please consider contributing the feature to the project. I am sure others will benefit.

I have a quick question about identifying mappings uniquely for this task. What properties other than the following 4 are needed to uniquely identify a mapping? I think I'm missing something, since I found a few mappings that have the same index factorization but different loop permutations, yet all 4 properties were identical for them.

mapping_id[int(mapspace::Dimension::IndexFactorization)];
mapping_id[int(mapspace::Dimension::LoopPermutation)];
mapping_id[int(mapspace::Dimension::Spatial)];
mapping_id[int(mapspace::Dimension::DatatypeBypass)];

That's it -- those 4 integers should uniquely identify a mapping.

I'm observing a weird output in that case. For example the following two mappings:

[  7] | mapping_id = Index_fact_id: 4200856093, perm_id: 4435339500, spatial_id: 9, bypass_id: 0 | Utilization = 0.16 | pJ/Compute =   63.765 | L2[WIO] C7 F14 S7 T7 K8 Q7 P7 R7 - L1[WIO] Q2 P4 K4X C4X F4X Q2X - L0[WIO] K2 Q2 P2
[  3] | mapping_id = Index_fact_id: 4200856093, perm_id: 4435339500, spatial_id: 9, bypass_id: 0 | Utilization = 0.16 | pJ/Compute =   20.136 | L2[WIO] C7 F14 T7 K8 Q7 P7 R7 - L1[WIO] S7 Q2 P4 K4X C4X F4X Q2X - L0[WIO] K2 Q2 P2

The only difference between these 2 mappings is the location of "S7", but they seem to have identical IDs. I'm running the mapper with multiple threads. Does that have any effect on how mapping IDs are constructed?

Ah yes, sorry. When you create a mapper with multiple threads, the IndexFactorization mapspace gets split and divided up between the threads. The mapping ID that each thread sees is within the local mapspace partition it is assigned. So my prior response was actually incorrect -- in addition to those 4 integers, you also need the thread-id to uniquely identify a mapping.

Got it. So a tuple containing the thread-id and the 4 integers will be unique to each mapping.
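As a rough sketch of that tuple, the key below prepends a thread id to the four mapping_id components. MappingKey, MakeKey, and CountUnique are hypothetical helpers, the components are assumed to arrive in the order listed earlier (IndexFactorization, LoopPermutation, Spatial, DatatypeBypass), and std::uint64_t is an assumption that may be narrower than the ID type Timeloop actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// (thread_id, index_factorization, loop_permutation, spatial, datatype_bypass)
using MappingKey = std::tuple<unsigned, std::uint64_t, std::uint64_t,
                              std::uint64_t, std::uint64_t>;

// Hypothetical helper: builds a key from a thread id and the four
// per-dimension IDs, assumed to be in the order given in the comment above.
MappingKey MakeKey(unsigned thread_id,
                   const std::vector<std::uint64_t>& mapping_id)
{
  assert(mapping_id.size() == 4);
  return MappingKey(thread_id, mapping_id[0], mapping_id[1],
                    mapping_id[2], mapping_id[3]);
}

// Hypothetical helper: counts distinct mappings among the collected keys.
std::size_t CountUnique(const std::vector<MappingKey>& keys)
{
  return std::set<MappingKey>(keys.begin(), keys.end()).size();
}
```

With this key, two mappings that share all four IDs but were found by different threads compare as different, while a repeat from the same thread collapses to one entry.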