PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Overview

Name With Owner	PaddlePaddle/PaddleOCR
Primary Language	Python
Program language	Python (Language Count: 13)
Platform
License	Apache License 2.0
Release Count	26
Last Release Name	v3.3.1
First Release Name	v1.1.0
Created At	2020-05-08 18:38:16
Pushed At	2025-11-03 21:50:18
Last Commit At
Stargazers Count	62457
Watchers Count	495
Fork Count	9220
Commits Count	6753
Has Issues Enabled
Issues Count	10172
Issue Open Count	199
Pull Requests Count	3012
Pull Requests Open Count	35
Pull Requests Closed Count	698
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private