Introduction

APUS-xDAN-4.0-MOE is a transformer-based, decoder-only language model trained on a large corpus of data for robust performance.

It is an enhanced MoE (Mixture of Experts) model built on a LLaMA architecture that has been strengthened through continued pre-training, and further optimized with human-enhanced feedback algorithms to improve its reasoning, mathematical, and logical capabilities at inference time.

For more information, see our blog post and GitHub repository: https://github.com/shootime2021/APUS-xDAN-4.0-moe

Model Details

APUS-xDAN-4.0-MOE uses a Mixture of Experts (MoE) architecture that incorporates components from dense language models; specifically, it inherits its capabilities from the xDAN-L2 Series. Of its 136 billion total parameters, 30 billion are activated at runtime, so each token is processed by only a fraction of the full model. With quantization, our open-source version occupies 42GB, making it compatible with consumer-grade GPUs such as the 4090 and 3090. The key specifications are as follows:

Parameters: 136B
Architecture: Mixture of 4 Experts (MoE)
Experts Utilization: 2 experts used per token
Layers: 60
Attention Heads: 56 for queries, 8 for keys/values
Embedding Size: 7,168
Additional Features: Rotary embeddings (RoPE); supports activation sharding and 1.5-bit to 4-bit quantization
Maximum Sequence Length (context): 32,768 tokens
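The "2 experts used per token" out of 4 can be illustrated with a small routing sketch. This is not the model's actual router, which the card does not describe. It assumes the common scheme of a softmax over router logits, selection of the top 2 experts, and renormalization of their weights. The names `route_token`, `moe_output`, and `toy_experts` are hypothetical and exist only for this illustration.

```python
import math

NUM_EXPERTS = 4  # "Mixture of 4 Experts" from the specification list
TOP_K = 2        # "2 experts used per token" from the specification list


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and renormalize their weights.

    Returns (expert_index, weight) pairs, highest weight first; weights sum to 1.
    Assumed scheme: softmax, then top-k, then renormalization.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_output(token, router_logits, experts):
    """Weighted sum of the selected experts' outputs for one token vector."""
    out = [0.0] * len(token)
    for idx, weight in route_token(router_logits):
        expert_out = experts[idx](token)  # only the selected experts are evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy experts (purely illustrative): each scales its input by a fixed factor.
toy_experts = [lambda x, k=k: [k * v for v in x] for k in (1.0, 2.0, 3.0, 4.0)]

logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for a single token
print(route_token(logits))
print(moe_output([1.0, 1.0], logits, toy_experts))
```

Because only the two selected experts run for each token, the parameters touched per token are a subset of the total, which is consistent with the card's distinction between total and activated parameters.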

Safetensors model size: 114B params
Tensor type: BF16
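As a rough sanity check on the storage figures, the sketch below converts a parameter count and an average bit width into a weight footprint, and inverts the calculation for the quoted 42GB. It counts weights only, assumes 1 GB = 10^9 bytes, and ignores file metadata, activations, and the KV cache. The card gives two parameter counts (136B in the specification list, 114B in the Safetensors metadata); both are evaluated without assuming which applies to the quantized release. The function names are hypothetical.

```python
def footprint_gb(num_params, bits_per_weight):
    """Weight storage in GB (10**9 bytes) at a given average bit width."""
    return num_params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(size_gb, num_params):
    """Average bits per weight implied by a given weight file size."""
    return size_gb * 1e9 * 8 / num_params


# Both parameter counts that appear on the card.
PARAM_COUNTS = {
    "136B (specification list)": 136e9,
    "114B (Safetensors metadata)": 114e9,
}

for label, n in PARAM_COUNTS.items():
    print(label)
    # 16 bits corresponds to BF16; 4, 2, and 1.5 span the quoted quantization range.
    for bits in (16, 4, 2, 1.5):
        print(f"  {bits:>4} bits/weight: {footprint_gb(n, bits):7.1f} GB")
    print(f"  42 GB implies about {implied_bits_per_weight(42, n):.2f} bits/weight")
```

Under these assumptions, a 42GB file corresponds to an average of roughly 2.5 to 3 bits per weight, which falls inside the 1.5-bit to 4-bit range the card lists.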