AntAngelMed is a medical-domain language model with 103 billion total parameters, but it does not activate all of them during inference. Instead, it uses a Mixture-of-Experts (MoE) architecture: the feed-forward layers are divided into many specialized expert sub-networks, and a lightweight router selects only a small number of experts for each token, so the per-token compute cost is a fraction of what the total parameter count suggests.
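To make the routing idea concrete, here is a minimal PyTorch sketch of generic top-k expert routing. The class name `TopKMoE`, the expert count, `k`, and the layer dimensions are all illustrative assumptions, and this is a sketch of the general technique, not AntAngelMed's actual implementation.

```python
# Minimal sketch of top-k Mixture-of-Experts routing.
# All sizes below (d_model, d_ff, num_experts, k) are hypothetical,
# chosen only to keep the example small and runnable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, d_model), one token per row
        logits = self.router(x)                     # (batch, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)        # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(4, 512)
print(moe(tokens).shape)  # torch.Size([4, 512]); only 2 of 8 experts ran per token
```

In this sketch every token touches just 2 of the 8 experts, which is why an MoE model's active parameter count per token can be far smaller than its total parameter count.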
Modern large language models are no longer trained only on raw internet text. Increasingly, companies use powerful “teacher” models to help train smaller or more efficient “student” models, a technique known as knowledge distillation: rather than learning from raw text alone, the student learns to reproduce the teacher's outputs.
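As a concrete illustration of the teacher/student idea, here is a minimal PyTorch sketch of the classic soft-label distillation loss. The temperature `T` and mixing weight `alpha` are illustrative defaults, not values from any particular training run.

```python
# Minimal sketch of a soft-label knowledge-distillation loss.
# T and alpha are hypothetical hyperparameters for illustration.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: pull the student's distribution toward the teacher's,
    # both softened by temperature T. The T*T factor keeps gradient
    # magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 100, requires_grad=True)
teacher_logits = torch.randn(8, 100)   # from the frozen teacher, no grad needed
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```

The teacher's softened probabilities carry more information per example than a one-hot label (for instance, which wrong answers are almost right), which is what lets a smaller student recover much of a larger teacher's behavior.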