kenshinn (Zhenglin Cheng)

replied to dbv's post over 1 year ago

It was probably a false match.

reacted to their post with ❤️ over 1 year ago

Sparse MoE (SMoE) has an inherent drawback: its performance depends heavily on hyper-parameter choices, such as the number of experts activated per token (top-k) and the total number of experts.

Moreover, identifying the optimal hyper-parameters is difficult without extensive ablation studies. As models continue to grow, this limitation can waste significant computational resources and, in turn, hinder the efficient training of MoE-based models in practice.

Our DynMoE addresses these challenges! 🙌 DynMoE incorporates:
(1) A novel gating method that lets each token automatically determine the number of experts to activate.
(2) An adaptive process that automatically adjusts the number of experts during training.

Extensive numerical results across Vision, Language, and Vision-Language tasks show that our approach achieves performance competitive with GMoE on vision and language tasks, and with MoE-LLaVA on vision-language tasks, while maintaining efficiency by activating fewer parameters.
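To make points (1) and (2) more concrete, here is a minimal sketch of how a token might pick its own number of experts and how routing statistics could hint at when to grow or shrink the expert pool. This is an illustration only, not the code from the DynMoE repository: the cosine-similarity scoring, the per-expert thresholds, the top-1 fallback, and the names `cosine`, `threshold_gate`, and `adaptation_signal` are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_gate(token, expert_keys, thresholds):
    """Activate every expert whose score exceeds its own threshold.

    Hypothetical gating rule: the number of active experts is not fixed,
    it is whatever number of experts clears its threshold for this token.
    Returns (active_expert_indices, fell_back).
    """
    scores = [cosine(token, key) for key in expert_keys]
    active = [i for i, (s, t) in enumerate(zip(scores, thresholds)) if s > t]
    fell_back = not active
    if fell_back:
        # Assumed fallback: if no expert clears its threshold, use the best-scoring one.
        active = [max(range(len(scores)), key=lambda i: scores[i])]
    return active, fell_back


def adaptation_signal(tokens, expert_keys, thresholds):
    """Collect routing statistics that an adaptive process could act on.

    Returns (idle_experts, unmatched_tokens): experts that no token selected
    via its threshold (candidates for removal), and the number of tokens that
    needed the fallback (a hint that more experts may be useful).
    """
    used = set()
    unmatched = 0
    for token in tokens:
        active, fell_back = threshold_gate(token, expert_keys, thresholds)
        if fell_back:
            unmatched += 1
        else:
            used.update(active)
    idle = sorted(set(range(len(expert_keys))) - used)
    return idle, unmatched
```

With three toy expert keys `[[1, 0], [0, 1], [1, 1]]` and a threshold of 0.5 for each, the token `[1, 0]` activates two experts while `[1, 1]` activates all three, so the per-token expert count varies without a fixed top-k. In a real model the keys and thresholds would be learned; the actual method is in the repository linked in the post.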

Our code is available at https://github.com/LINs-lab/DynMoE. See also the checkpoints at LINs-lab/dynmoe-family-665ed5a331a7e84463cab01a

reacted to their post with 🚀 almost 2 years ago

posted an update almost 2 years ago
