microsoft/MInference

[Question]: Why is every head config saved with "vertical_and_slash"?

fmmoret opened this issue · 1 comments

Describe the issue

Regardless of the pattern observed, the config saves it as "vertical_and_slash" when using the search_patterns function.

for ty, fc in [("stream_llm", stream_llm), ("vertical_and_slash", vertical_and_slash), ("block_sparse", block_sparse)]:
    if ty == "stream_llm":
        vs_list = [(100, 800)]
    elif ty == "vertical_and_slash":
        vs_list = [(30, 800), (100, 750), (500, 700), (3500, 100)]
    else:
        vs_list = [(8, 1)]
    for v_size, s_size in vs_list:
        score = fc(v_size, s_size)
        score = score.item()
        all_info.append([ty, v_size, s_size, score])
        if score > best_score:
            best_score = score
            best_s, best_v = s_size, v_size
            best_ty = ty
if best_ty == "stream_llm":
    best_ty = "vertical_and_slash"
if best_ty == "block_sparse":
    best_ty, best_v, best_s = "vertical_and_slash", 1000, 6096

The configs saved in the repo appear to contain only this pattern type ("vertical_and_slash").

Specifically, these lines:

 if best_ty == "stream_llm": 
     best_ty = "vertical_and_slash" 
 if best_ty == "block_sparse": 
     best_ty, best_v, best_s = "vertical_and_slash", 1000, 6096 

I think this means the forward pass never routes to anything other than the vertical_and_slash implementation and kernels.
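To make the effect of those lines concrete, here is a minimal sketch of the selection step followed by the same remapping. The function name pick_pattern and the scores are hypothetical, made up for illustration; only the two if statements mirror the quoted code.

```python
# Hypothetical stand-in for the selection logic quoted above.
# The candidate scores are invented numbers, not measurements.
def pick_pattern(candidates):
    """candidates: list of (pattern_type, v_size, s_size, score) tuples."""
    best_ty, best_v, best_s, best_score = None, 0, 0, float("-inf")
    for ty, v_size, s_size, score in candidates:
        if score > best_score:
            best_ty, best_v, best_s, best_score = ty, v_size, s_size, score
    searched_ty = best_ty  # the type that actually scored best

    # Same remapping as in the quoted lines.
    if best_ty == "stream_llm":
        best_ty = "vertical_and_slash"
    if best_ty == "block_sparse":
        best_ty, best_v, best_s = "vertical_and_slash", 1000, 6096
    return searched_ty, (best_ty, best_v, best_s)


cases = {
    "stream_llm wins": [("stream_llm", 100, 800, 0.9),
                        ("vertical_and_slash", 30, 800, 0.5),
                        ("block_sparse", 8, 1, 0.1)],
    "vertical_and_slash wins": [("stream_llm", 100, 800, 0.2),
                                ("vertical_and_slash", 500, 700, 0.8),
                                ("block_sparse", 8, 1, 0.1)],
    "block_sparse wins": [("stream_llm", 100, 800, 0.2),
                          ("vertical_and_slash", 30, 800, 0.3),
                          ("block_sparse", 8, 1, 0.7)],
}

results = {name: pick_pattern(cands) for name, cands in cases.items()}
for name, (searched, saved) in results.items():
    print(f"{name}: searched={searched}, saved={saved}")
```

In this sketch the saved type is "vertical_and_slash" in all three cases. A stream_llm winner keeps its searched sizes under the new name, while a block_sparse winner is replaced by the fixed (1000, 6096) setting.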

Is this a bug or intended? The experiment docs cite the use of this search patterns function.


On the other hand:

Search pattern v2

def search_pattern_v2(q, k, v, head):

does appear to save the pattern type under specific names, which are used to route to the different pattern implementations.


Do we need to use search patterns v2 to replicate the results of the paper? Or are the vertical_and_slash settings actually enough to pull off needle-in-a-haystack for long sequences?

Hi @fmmoret, thanks for your feedback.

The configs provided in the repo can reproduce the results from the paper. This means that the vertical_and_slash settings are sufficient to pass the Needle In A Haystack test for long sequences.

The search_pattern function reroutes to vertical_and_slash because our tests have shown that this setting offers better generalization and efficiency across different context windows and tasks.
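For anyone who wants to confirm which pattern types a given config actually contains, a minimal sketch follows. It assumes the config is a JSON list with one dict per layer, mapping a head index to [pattern_type, v_size, s_size, score]; that layout is an assumption made for illustration, so adjust the indexing to match the real files. The helper name count_pattern_types and the example values are hypothetical.

```python
import json
from collections import Counter


def count_pattern_types(config):
    """Count how often each pattern type appears across all heads.

    Assumed layout (not confirmed here): a list with one dict per layer,
    each mapping head index -> [pattern_type, v_size, s_size, score].
    """
    counts = Counter()
    for layer in config:
        for head_cfg in layer.values():
            counts[head_cfg[0]] += 1
    return counts


# Made-up example in the assumed layout; replace with json.load() on a real file.
example = json.loads(
    '[{"0": ["vertical_and_slash", 1000, 6096, 0.33],'
    '  "1": ["vertical_and_slash", 30, 800, 0.95]},'
    ' {"0": ["vertical_and_slash", 100, 750, 0.88]}]'
)

counts = count_pattern_types(example)
print(dict(counts))
```

If the rerouting described above applies to every head, a config produced this way should report a single key, "vertical_and_slash".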