Introducing DECO: A Sparse Mixture-of-Experts Model Achieving Dense Performance for Edge Devices
Researchers have developed DECO, a sparse Mixture-of-Experts (MoE) architecture that delivers performance comparable to dense Transformers while easing the storage and memory-access constraints of end-side (on-device) deployment. DECO uses ReLU-based routing combined with learnable expert-wise scaling to adjust how much each expert contributes. It also introduces NormSiLU, a new activation function that stabilizes expert activation ratios and promotes higher intrinsic sparsity. In experiments, DECO activates only 20% of its experts yet outperforms traditional MoE models, and it achieves a 3.00× speedup over dense inference on real hardware. The team plans to release the code and model checkpoints publicly.
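The summary does not give DECO's equations, so the following is only a minimal pure-Python sketch of the general idea of ReLU routing with learnable expert-wise scaling, under stated assumptions: each expert is a single linear map standing in for a feed-forward block, the gate for expert i is the ReLU of a router dot product, and the layer output is the sum of scale_i × gate_i × expert_i(x) over experts whose gate is nonzero. The class `ReLURoutedMoE`, the `scale` list, and the Gaussian initialization are hypothetical; NormSiLU is not modeled because its definition is not given here.

```python
import math
import random


def relu(values):
    """Element-wise ReLU over a list of floats."""
    return [max(0.0, v) for v in values]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ReLURoutedMoE:
    """Hypothetical toy MoE layer: ReLU router plus learnable per-expert scales."""

    def __init__(self, d_model, n_experts, seed=0):
        rng = random.Random(seed)
        # Router: one weight row per expert, producing one raw score per expert.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(d_model)]
                       for _ in range(n_experts)]
        # Each "expert" is a single d_model x d_model linear map (stand-in for an FFN).
        std = 1.0 / math.sqrt(d_model)
        self.experts = [[[rng.gauss(0.0, std) for _ in range(d_model)]
                         for _ in range(d_model)]
                        for _ in range(n_experts)]
        # Expert-wise scaling factors (would be learned; initialized to 1.0 here).
        self.scale = [1.0] * n_experts

    def forward(self, x):
        """Return (output vector, indices of experts with nonzero gates)."""
        gates = relu(matvec(self.router, x))
        out = [0.0] * len(x)
        active = []
        for i, gate in enumerate(gates):
            if gate == 0.0:
                continue  # ReLU zeroed this expert, so its computation is skipped
            active.append(i)
            y = matvec(self.experts[i], x)
            for j, yj in enumerate(y):
                out[j] += self.scale[i] * gate * yj
        return out, active


if __name__ == "__main__":
    layer = ReLURoutedMoE(d_model=8, n_experts=10, seed=42)
    rng = random.Random(1)
    token = [rng.gauss(0.0, 1.0) for _ in range(8)]
    output, active = layer.forward(token)
    print(f"active experts: {active} ({len(active) / 10:.0%} of 10)")
```

Because ReLU gates are exactly zero for some experts, the number of active experts in a scheme like this varies per token instead of being a fixed top-k. An activation ratio such as DECO's reported 20% would be an average of `len(active) / n_experts` over tokens; this untrained toy router makes no attempt to reproduce that figure.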
HarmoWAM Introduces Adaptive World Action Models for Enhanced Robotic Manipulation
HarmoWAM is a new approach to robot control built on World Action Models (WAMs) that addresses a fundamental trade-off between two existing paradigms, 'Imagine-then-Execute' and 'Joint Modeling'. The researchers show that the former generalizes well but lacks precision, while the latter produces fine-grained actions but is constrained by limited exploration of its training distribution. HarmoWAM unifies the two by using a world model to support both predictive and reactive control, with a Process-Adaptive Gating Mechanism coordinating them. In evaluations on six real-world robotic tasks, HarmoWAM shows strong zero-shot generalization, outperforming previous models by 33% and 29% across diverse testing environments.
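The summary does not say how the Process-Adaptive Gating Mechanism is computed. As a purely illustrative sketch, the following assumes the simplest form of gating between two action proposals: a scalar gate g in (0, 1), driven by a hypothetical task-progress signal, that blends a predictive proposal (in the spirit of 'Imagine-then-Execute') with a reactive one (in the spirit of 'Joint Modeling'). The function `gated_action`, the `progress` input, and the `sharpness` and `midpoint` parameters are all hypothetical, and favoring the predictive proposal early and the reactive one late is an assumption, not HarmoWAM's published design.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gated_action(a_predictive, a_reactive, progress, sharpness=10.0, midpoint=0.5):
    """Blend two action proposals with a hypothetical progress-dependent gate.

    Assumption for illustration: g is the weight on the reactive proposal and
    rises with task progress, so the predictive proposal dominates early and
    the reactive proposal dominates late. Returns (blended action, gate value).
    """
    if len(a_predictive) != len(a_reactive):
        raise ValueError("action proposals must have the same dimension")
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    g = sigmoid(sharpness * (progress - midpoint))
    blended = [(1.0 - g) * p + g * r for p, r in zip(a_predictive, a_reactive)]
    return blended, g


if __name__ == "__main__":
    predictive = [0.10, 0.00, 0.30]  # illustrative end-effector deltas
    reactive = [0.12, -0.02, 0.25]
    for t in (0.0, 0.5, 1.0):
        action, g = gated_action(predictive, reactive, progress=t)
        print(f"progress={t:.1f} gate={g:.3f} action={[round(a, 3) for a in action]}")
```

A soft gate like this shifts weight between the two proposals smoothly rather than switching abruptly. In a learned system the gate would more plausibly come from a network conditioned on observations than from a fixed progress schedule.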
