SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems Paper • 2401.03945 • Published Jan 8, 2024
SpeechAlign: Aligning Speech Generation to Human Preferences Paper • 2404.05600 • Published Apr 8, 2024 • 1
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model Paper • 2408.02503 • Published Aug 5, 2024
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models Paper • 2411.09691 • Published Nov 14, 2024
QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models Paper • 2405.13014 • Published May 14, 2024
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities Paper • 2305.11000 • Published May 18, 2023 • 4
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models Paper • 2308.16692 • Published Aug 31, 2023 • 1
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators Paper • 2402.06894 • Published Feb 10, 2024
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance Paper • 2401.11206 • Published Jan 20, 2024 • 1
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation Paper • 2401.13527 • Published Jan 24, 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19, 2024 • 43