Jina AI introduced JINA-VLM, a 2.4-billion-parameter vision-language model that sets a new benchmark for multilingual visual question answering among open models of similar size. The model also performs strongly on general English VQA tasks and uses an attention-pooling connector that reduces the number of visual tokens by 4x, improving efficiency.
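The summary does not detail how the connector works; a minimal sketch of one common way to achieve a 4x token reduction, pooling each window of four visual tokens into one via attention against a learned query, is shown below. All function names, the windowing scheme, and the single-head formulation are illustrative assumptions, not the model's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(tokens, query, w_k, w_v, window=4):
    """Hypothetical attention-pooling connector: each group of
    `window` visual tokens is compressed into one output token by
    attending a learned query over the group's keys/values."""
    n, d = tokens.shape
    assert n % window == 0, "token count must divide evenly into windows"
    groups = tokens.reshape(n // window, window, d)
    k = groups @ w_k                       # keys:   (n/window, window, d)
    v = groups @ w_v                       # values: (n/window, window, d)
    scores = (k @ query) / np.sqrt(d)      # (n/window, window)
    weights = softmax(scores, axis=-1)     # attention over each window
    return np.einsum('gw,gwd->gd', weights, v)  # (n/window, d)

# Example: 256 visual tokens of dim 64 pooled down to 64 tokens.
rng = np.random.default_rng(0)
d = 64
tokens = rng.standard_normal((256, d))
query = rng.standard_normal(d)
w_k = rng.standard_normal((d, d)) / np.sqrt(d)
w_v = rng.standard_normal((d, d)) / np.sqrt(d)
pooled = attention_pool(tokens, query, w_k, w_v)
print(pooled.shape)  # (64, 64)
```

With `window=4`, the connector emits one token per four inputs, which is the 4x reduction the summary describes; the learned query lets the pooling weight informative patches more heavily than a plain average would.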