Issues
torchscale 0.3.0 requires fairscale==0.4.0, but you have fairscale 0.4.13 which is incompatible.
#110 opened by pandayummy - 0
Question about LongNet attention map overlap
#108 opened by RmZeta2718 - 0
How to test the model
#106 opened by ReloJeffrey - 0
pip error
#105 opened by wanghaoran-ucas - 0
can't use longvit
#103 opened by abebe9849 - 0
How to use retention in RetNet for cross-attention?
#101 opened by yxchng - 0
Checkpoint for RetNet
#99 opened by macsz - 1
What WSI level was used for pretraining LongVit?
#98 opened by jpfeil - 0
about attention mask
#97 opened by hichoe95 - 2
about the longnet's ppl
#95 opened by robotzheng - 2
Training RetNet on A100 GPUs
#83 opened by Antoine-Bergerault - 1
Wrong Rnm Normalization.
#86 opened by pdradx - 2
Introducing padding_mask to RetNet
#85 opened by xtwigs - 2
Question about the normalization in attention
#81 opened by Cranial-XIX - 2
Question about RetNetRelPos
#80 opened by hyunwoongko - 3
initialization of qkv
#68 opened by XintianHan - 1
[Minor issue] Discrepancy inside arxiv paper
#82 opened by radarFudan - 2
about gamma/decay in RetNet
#79 opened by rouniuyizu - 7
Chunk recurrent representation incorrect results
#77 opened by N0r9st - 4
embed_tokens
#59 opened by CodeMiningCZW - 1
Compatibility with torchsummary
#71 opened by lzqlzzq - 2
About training memory
#75 opened by HoraceXIaoyiBao - 3
RuntimeError: The size of tensor a (5) must match the size of tensor b (2) at non-singleton dimension 0
#72 opened by codinglover0111 - 1
retnet training config
#64 opened by hanlinxuy - 3
There's a confusion in torchscale
#65 opened by lovekang3344 - 2
pip package does not contain RetNet
#67 opened by fabienGenhealth - 3
AttributeError: 'EncoderDecoderConfig' object has no attribute 'normalize_output'
#73 opened by Yuki2L0ve - 4
BEiT3 Vision-Language Expert question
#74 opened by andreapdr - 9
RetNet: relative position
#49 opened by fkodom - 1
Could you please explain the reason behind defining TEMPERATURE_FOR_L_UAX in the code without actually using it?
#63 opened by Ruiyuan-Zhang - 4
`get_moe_group` 's return is None, when building `class MOELayer(Base)` , using one gpu
#60 opened by Ruiyuan-Zhang - 2
Training & Inference examples for RetNet
#52 opened by jhl-Det - 2
Retnet training is slow
#55 opened by Zth9730 - 2
Question about is_first_step and Retnet
#58 opened by tdomhan - 2
Retnet parameter dimension
#57 opened by allanj - 2
scale.sqrt() in the recurrent_forward function of the multiscale_retention module
#47 opened by wangmengzhi