PaddlePaddle Team
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

PaddleOCR-VL introduces a 0.9B ultra-compact vision-language model (VLM) that achieves state-of-the-art multilingual document parsing by decoupling layout analysis from element-level recognition. The model supports 109 languages and achieves an overall score of 92.86 on OmniDocBench v1.5, while also delivering 53.1% higher page throughput than leading baselines.
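
To make the decoupling concrete, the following is a minimal sketch of a two-stage parse: a layout stage proposes regions, and a separate recognition stage reads each region independently. It is not the PaddleOCR-VL API. Every name here (`Region`, `ParsedElement`, `parse_page`, the stub functions) is hypothetical, and the assumption that the layout stage also supplies a reading order is ours, not something stated in the summary above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


@dataclass
class Region:
    """One layout element proposed by the (hypothetical) layout stage."""
    box: Box
    kind: str   # e.g. "text", "table", "formula"
    order: int  # assumed reading-order index supplied by the layout stage


@dataclass
class ParsedElement:
    region: Region
    content: str


def parse_page(
    page: str,
    detect_layout: Callable[[str], List[Region]],
    recognize: Callable[[str, Region], str],
) -> List[ParsedElement]:
    """Stage 1 finds regions; stage 2 recognizes each region on its own."""
    regions = sorted(detect_layout(page), key=lambda r: r.order)
    return [ParsedElement(r, recognize(page, r)) for r in regions]


# Stand-in stages so the sketch runs without any model (illustrative only).
def stub_layout(page: str) -> List[Region]:
    return [
        Region((0, 40, 100, 80), "table", order=1),
        Region((0, 0, 100, 30), "text", order=0),
    ]


def stub_recognize(page: str, region: Region) -> str:
    return f"<{region.kind} from {page}>"


result = parse_page("page-1", stub_layout, stub_recognize)
```

Because the two stages only meet at the `Region` interface, either one can be swapped or batched separately, which is one plausible reason a decoupled design can keep the recognition model small.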
