2023-03-29 11:16:58
LLaVA-Mini 👏 is an efficient large multimodal model (LMM) for image and video understanding using one vision token, offering: (1) ⏩ lower latency (as fast as 40 ms per image) and (2) 🖥️ lower VRAM usage (supports 3-hour video understanding on a 24 GB GPU). Paper: Code & Demo: