SmolDocling-256M-preview is a multimodal Image-Text-to-Text model for efficient document conversion, supporting the recognition and conversion of various document elements. | AINavgine| ai tool website navigation, ai latest products

SmolDocling-256M-preview

SmolDocling-256M-preview is a multimodal image-to-text model designed for efficient document conversion. It retains the main features of Docling and is fully compatible with Docling, achieved through seamless support for DoclingDocuments. Key features include:

DocTags: Uses DocTags labels, an efficient and minimal document representation method, fully compatible with DoclingDocuments, clearly separating text and document structure.
OCR: Accurately extracts text from images.
Layout and Positioning: Preserves document structure and element bounding boxes.
Code Recognition: Detects and formats code blocks, including indentation.
Formula Recognition: Identifies and processes mathematical expressions.
Chart Recognition: Extracts and interprets chart data.
Table Recognition: Supports column and row headers for structured table extraction.
Image Classification: Distinguishes graphic elements.
Title Correspondence: Links titles to related images and graphics.
List Grouping: Correctly organizes and structures list elements.
Full Page Conversion: Processes entire pages, including all page elements (code, formulas, tables, charts, etc.).
OCR with Bounding Boxes: Uses bounding boxes for OCR region identification.
General Document Processing: Trained on scientific and non-scientific documents.
Seamless Docling Integration: Imports Docling and exports in multiple formats (MD, HTML, etc.).
Fast Inference: Uses VLLM, averaging 0.35 seconds per page on an A100 GPU.

This model is fine-tuned based on Idefics3, using DocTags for efficient tokenization, and will provide enhanced chart recognition, multi-page inference support, and chemical recognition functions. The developers also provide code examples for inference using transformers or vllm, and converting results into multiple output formats using Docling.

SmolDocling-256M-preview

Introduction:

SmolDocling-256M-preview