Jina AI's `jina-embeddings-v4` is a 3.8-billion-parameter model designed for universal multimodal and multilingual retrieval, featuring a unified VLM backbone that minimizes the modality gap between text and images. The model achieves state-of-the-art results on the Jina-VDR and ViDoRe benchmarks for visually rich document retrieval and shows competitive performance across diverse text and multimodal tasks.
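As a minimal illustration of what retrieval in a single text-image embedding space looks like downstream, the sketch below ranks candidate image embeddings against a text-query embedding by cosine similarity. The random vectors are placeholders standing in for jina-embeddings-v4 outputs; the model's own encoding API is not reproduced here.

```python
import numpy as np

def cosine_rank(query_vec: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Return candidate indices sorted by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return np.argsort(-(c @ q))

# Placeholder vectors: in practice these would be a text-query embedding and
# image-page embeddings produced by jina-embeddings-v4 in the same space.
rng = np.random.default_rng(0)
dim = 2048  # illustrative width only, not the model's actual dimension
text_query = rng.normal(size=dim)
image_pages = rng.normal(size=(5, dim))

print(cosine_rank(text_query, image_pages))  # index order of best-matching pages
```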
Jina AI's jina-reranker-v3 model implements a "last but not late interaction" (LBNL) mechanism, enabling a Qwen3-0.6B transformer to process a query and multiple documents concurrently for listwise reranking. This method achieved an nDCG@10 of 61.85 on the BEIR benchmark, exceeding the performance of significantly larger rerankers, and showed strong capabilities across diverse languages and specialized domains.
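To make the listwise idea concrete, here is a small sketch of packing a query and several candidates into one context so the model can attend across all candidates in a single forward pass, in contrast to scoring (query, document) pairs independently. The actual jina-reranker-v3 prompt template, special tokens, and score read-out are not reproduced here; this only illustrates the input packing that listwise reranking implies.

```python
def build_listwise_input(query: str, docs: list[str]) -> str:
    """Pack a query and all candidate documents into one context string so a
    listwise reranker can compare candidates against each other directly,
    rather than scoring each (query, document) pair in isolation."""
    parts = [f"Query: {query}"]
    for i, doc in enumerate(docs):
        parts.append(f"[Document {i}] {doc}")
    return "\n".join(parts)

print(build_listwise_input(
    "what is listwise reranking?",
    ["Cross-encoders score one query-document pair per forward pass.",
     "Listwise rerankers see every candidate at once before scoring."],
))
```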
The research from Jina AI and Weaviate introduces "Late Chunking," a method that generates contextual chunk embeddings by first encoding an entire document with a long-context embedding model and then applying chunking and pooling. This approach consistently improves retrieval accuracy (nDCG@10 gains of 2.70-3.63% on BEIR benchmarks) for neural search and RAG systems while avoiding the computational costs of LLM-based contextualization.
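The sketch below shows the core of the idea under stated assumptions: the whole document is encoded once so every token embedding carries document-wide context, and chunk vectors are obtained by mean-pooling token spans afterwards. It assumes the jinaai/jina-embeddings-v2-base-en checkpoint (or any long-context encoder) returns last_hidden_state, and it splits on fixed token spans for brevity where the paper aligns chunks with sentence boundaries.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Any long-context encoder works; this checkpoint ships custom modeling code,
# hence trust_remote_code=True.
MODEL_ID = "jinaai/jina-embeddings-v2-base-en"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)

def late_chunk(document: str, chunk_tokens: int = 64) -> torch.Tensor:
    """Encode the full document first, then mean-pool token embeddings per
    chunk, so each chunk vector is conditioned on the whole document."""
    inputs = tokenizer(document, return_tensors="pt",
                       truncation=True, max_length=8192)
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state[0]  # (seq, dim)

    pooled = [token_embeddings[start:start + chunk_tokens].mean(dim=0)
              for start in range(0, token_embeddings.shape[0], chunk_tokens)]
    return torch.stack(pooled)  # (num_chunks, dim)

chunk_vectors = late_chunk("Berlin is the capital of Germany. " * 200)
print(chunk_vectors.shape)
```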
Jina AI introduces jina-embeddings-v3, a 570-million-parameter multilingual text embedding model that leverages Task LoRA adapters for efficient adaptation to diverse tasks. It achieves superior performance on MTEB English and multilingual benchmarks, especially in long-context retrieval up to 8192 tokens, surpassing larger LLM-based embedding models.
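A hedged usage sketch follows, showing how a task-specific adapter might be selected at encode time. The encode(..., task=...) helper and the task names are taken loosely from the public model card for jinaai/jina-embeddings-v3 and should be treated as assumptions to verify against the checkpoint.

```python
from transformers import AutoModel

# trust_remote_code is required because the checkpoint ships its own
# modeling code that wires the Task LoRA adapters into the backbone.
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3",
                                  trust_remote_code=True)

# Assumed task names: different LoRA adapters specialize the same backbone
# for asymmetric retrieval (query vs. passage), matching, and so on.
query_vecs = model.encode(["What is Late Chunking?"], task="retrieval.query")
doc_vecs = model.encode(["Late Chunking pools token embeddings per chunk."],
                        task="retrieval.passage")
print(query_vecs.shape, doc_vecs.shape)
```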
Jina AI researchers introduce ReaderLM-v2, a compact 1.5B-parameter language model that outperforms GPT-4 by 15-20% on HTML-to-Markdown/JSON conversion tasks while handling documents of up to 512K tokens, demonstrating that specialized smaller models can excel at structured content extraction through an innovative three-stage training process.
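As a hedged sketch of how such a model is typically driven, the snippet below loads ReaderLM-v2 as a plain causal LM and asks it to convert a small HTML snippet to Markdown via the tokenizer's chat template. The instruction wording is an assumption; the model card documents the exact prompt format the model was trained with.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "jinaai/ReaderLM-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

html = "<html><body><h1>Hello</h1><p>ReaderLM-v2 turns HTML into Markdown.</p></body></html>"
# Assumed instruction wording; check the model card for the trained prompt.
messages = [{"role": "user",
             "content": f"Convert the following HTML to Markdown:\n{html}"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```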
Jina AI introduced Jina Embeddings v2, the first open-source general-purpose text embedding model designed to process documents of up to 8192 tokens, a 16x increase over typical limits. The model achieves competitive performance on standard benchmarks and demonstrates superior results on long-document tasks, outperforming proprietary models like OpenAI's `text-embedding-ada-002` on the LoCo benchmark.
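A minimal sketch of the long-document use case, assuming the sentence-transformers integration of the checkpoint: the key step is raising max_seq_length so a single embedding covers up to 8192 tokens.

```python
from sentence_transformers import SentenceTransformer

# trust_remote_code is needed because the checkpoint ships custom
# ALiBi-based modeling code that enables the 8192-token context.
model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en",
                            trust_remote_code=True)
model.max_seq_length = 8192  # ensure the full window is used; defaults may truncate earlier

long_document = "Quarterly report section. " * 2000  # far beyond typical 512-token limits
embedding = model.encode(long_document)
print(embedding.shape)  # one fixed-size vector for the whole document
```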