Pinned Repositories
-StartTransformers
🌱StartTransformer_1 is a new transformer architecture built with time-wise normalization and a new way of allocating FFN parameters, so that a transformer-like model with far fewer parameters can be trained stably; its basic idea can also be used in developing many other architectures.
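As a rough illustration of the idea, here is a minimal PyTorch sketch of what "time-wise normalization" could mean: normalizing each feature channel across the sequence (time) axis rather than across the hidden dimension as standard LayerNorm does. The class name and exact statistics are assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class TimeWiseNorm(nn.Module):
    """Hypothetical sketch: normalize each channel over the sequence (time)
    axis instead of over the hidden dimension as in standard LayerNorm."""
    def __init__(self, d_model, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(d_model))
        self.beta = nn.Parameter(torch.zeros(d_model))
        self.eps = eps

    def forward(self, x):  # x: (batch, time, d_model)
        mean = x.mean(dim=1, keepdim=True)                 # statistics over the time axis
        var = x.var(dim=1, keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps) * self.gamma + self.beta
```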
ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
Cut-Shortcut
Cut-Shortcut: I suspected that GNNs and other models may rely on spurious shortcuts rather than learning a genuinely good representation, so I add randomly projected target information to a projection matrix inside the model and minimize the similarity between its output and the output of the model run without the target information. My experiments show this works and that the approach can be adopted in practice.
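A minimal sketch of how such an anti-shortcut penalty could be wired up, assuming the target is injected as a random projection and the two outputs are compared with cosine similarity; the function names below are illustrative, not the repository's API.

```python
import torch
import torch.nn.functional as F

def random_project_target(y, proj_dim=64):
    """Randomly project the target before injecting it into the model's
    projection matrix (assumed reading of the description above)."""
    proj = torch.randn(y.shape[-1], proj_dim, device=y.device)
    return y.float() @ proj

def cut_shortcut_penalty(out_with_target, out_without_target):
    """Penalize agreement between the output computed with leaked target
    information and the one computed without it, so the model gains
    nothing from the shortcut (drive cosine similarity toward zero)."""
    return F.cosine_similarity(out_with_target, out_without_target, dim=-1).abs().mean()
```

In training, this penalty would be added to the task loss so the representation stays useful while the shortcut is suppressed.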
DeepSpeed-Compress-comm
DeepSpeed-Compress-comm uses an inverse FFT and a new kind of diffusion training to compress the tensors exchanged in all_reduce during multi-GPU inference, accelerating communication.
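To make the compression idea concrete, here is a hedged PyTorch sketch that keeps only the lowest-frequency FFT coefficients of a tensor before communication and reconstructs an approximation with the inverse FFT; the diffusion-trained part is not reproduced, and the function names are assumptions.

```python
import torch

def compress(t, keep_ratio=0.25):
    """Keep only the lowest-frequency rFFT coefficients of the flattened tensor
    (stand-in for the repo's frequency-domain compressor)."""
    spec = torch.fft.rfft(t.flatten())
    k = max(1, int(spec.numel() * keep_ratio))
    return spec[:k], t.shape

def decompress(spec, shape):
    """Zero-pad the kept coefficients and apply the inverse FFT to get an
    approximation of the original tensor."""
    n = int(torch.tensor(list(shape)).prod())
    full = torch.zeros(n // 2 + 1, dtype=spec.dtype)
    full[:spec.numel()] = spec
    return torch.fft.irfft(full, n=n).reshape(shape)
```

In an actual all_reduce path, `compress` would run before the collective and `decompress` after it, trading reconstruction error for communication bandwidth.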
GFNet-Pytorch
A general framework for efficient CNN inference. Reduces the inference latency of MobileNet-V3 by 20% on an iPhone XS Max without sacrificing accuracy.
Kevin-shihello-world
Config files for my GitHub profile.
StartBert
StartTransformer_0
🌱StartTransformer is a new transformer architecture built with time-wise normalization and a new way of allocating FFN parameters, so that a transformer-like model with far fewer parameters can be trained stably; its basic idea can also be used in developing many other architectures.
Up-DownFormer
Up-DownFormer: this transformer architecture is essentially a new GNN designed in this work. I have tested the GNN component on standard GNN benchmarks, where it achieved superior results, and the whole new transformer architecture on an NLP task, where it matched conventional all-self-attention transformers with much lower computation.
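One possible reading of "the transformer is mostly a new GNN" is that dense self-attention is replaced by message passing over a token graph; the PyTorch sketch below shows that generic substitution and is an assumption, not the actual Up-Down layer.

```python
import torch
import torch.nn as nn

class MessagePassingBlock(nn.Module):
    """Hypothetical stand-in for an attention block: aggregate messages from
    neighbors given by a fixed adjacency matrix, then update each token."""
    def __init__(self, d_model):
        super().__init__()
        self.msg = nn.Linear(d_model, d_model)
        self.update = nn.Linear(2 * d_model, d_model)

    def forward(self, x, adj):  # x: (batch, tokens, d_model); adj: (tokens, tokens)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        agg = (adj / deg) @ self.msg(x)               # mean-aggregate neighbor messages
        return x + self.update(torch.cat([x, agg], dim=-1))
```

With a sparse adjacency, the aggregation touches only graph edges rather than all token pairs, which is where the lower computation would come from.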
vllm-compress-comm
vllm-compress-comm uses an inverse FFT and a new training strategy for a new kind of diffusion model to compress the tensors transferred between GPUs, accelerating multi-GPU inference.