Updates and Roadmap

See our latest features, improvements, and future plans

Coming Next

We are working on powerful new features to make PDF conversion even better

Expected release: between December 28 and January 3
📱

Desktop and iOS Applications

Introducing dedicated desktop and iOS applications to provide more local functionality and improve the mobile user experience

📚

Support for More eBook Formats

Support for several new eBook formats, including AZW3, MOBI, and more, to meet different reader and device requirements

∑

Continued Optimization of Formula Display

Further improvements to the rendering of mathematical formulas and chemical equations, and better recognition accuracy for complex formulas

Update History

European Locale Expansion (fr-FR / de-DE / it-IT / es-ES)

Release Date: 2026-03-04

New Features

  • Added four new locales to the i18n resource set:
    • fr-FR
    • de-DE
    • it-IT
    • es-ES
  • Language list is now data-driven from src/i18n/common/*.json locale files.
  • Added base-language to regional-language resolution:
    • fr / fr-CA -> fr-FR
    • de / de-AT -> de-DE
    • it -> it-IT
    • es / es-MX -> es-ES
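
The resolution rules above can be sketched as a small lookup. This is a minimal illustration in Python, not the site's actual implementation: `REGIONAL_DEFAULTS`, `FALLBACK`, and `resolve_locale` are hypothetical names, and the sketch assumes that any regional variant of a supported base language resolves to that language's default region, with English as the fallback.

```python
# Hypothetical mapping from a base language to its default regional locale.
REGIONAL_DEFAULTS = {
    "fr": "fr-FR",
    "de": "de-DE",
    "it": "it-IT",
    "es": "es-ES",
}
FALLBACK = "en"  # English remains the fallback language


def resolve_locale(requested: str) -> str:
    """Map a requested tag such as 'fr' or 'fr-CA' to a supported locale."""
    # Normalise separators and case so 'fr_CA' and 'FR-ca' behave alike.
    tag = requested.strip().replace("_", "-").lower()
    base = tag.split("-", 1)[0]
    return REGIONAL_DEFAULTS.get(base, FALLBACK)


for tag in ("fr", "fr-CA", "de-AT", "it", "es-MX", "pt-BR"):
    print(tag, "->", resolve_locale(tag))
```

Unsupported languages such as `pt-BR` fall through to `en` in this sketch.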

Improvements

  • Added i18n glossary for term consistency:
    • docs/i18n-glossary.md
  • Added a reusable language acceptance checklist:
    • docs/i18n-acceptance-checklist.md

Notes / Caveats

  • English (en) remains the fallback language.
  • Third-party locale packs currently support zh-CN and en; when locale-specific packs are missing, components will fall back to English.

Release v1.0.11

This release adds support for GitHub-Flavored Markdown (GFM) table rendering, enhances error handling for interruption scenarios, and includes important bug fixes and dependency updates.

What's Changed

Features

  • GFM Table Support: Added intelligent conversion of HTML tables to GitHub-Flavored Markdown format in https://github.com/oomol-lab/pdf-craft/pull/345

    • Simple tables are automatically converted to clean GFM pipe table syntax
    • Complex tables (with colspan, rowspan, or multiple tbody sections) gracefully fall back to HTML format to preserve structure
    • Prevents data loss from unsupported table features in GFM format
    • Added comprehensive test coverage for various table scenarios
    • New dependency: markdownify library for table conversion
  • Enhanced InterruptedError API: Added public properties to InterruptedError for better error introspection in https://github.com/oomol-lab/pdf-craft/pull/346

    • New kind property exposes the interruption type (abort or token limit exceeded)
    • New metering property provides direct access to OCR token usage data
    • OCRTokensMetering is now exported from the public API for convenience
    • Enables users to programmatically handle different interruption scenarios and track resource consumption

Bug Fixes

  • Fixed Error Propagation: Corrected handling of critical error types during page extraction in https://github.com/oomol-lab/pdf-craft/pull/343
    • AbortError and TokenLimitError now propagate correctly instead of being wrapped in OCRError
    • Ensures interruption signals are properly received and handled by calling code
    • Prevents masking of user-initiated abort operations and token limit violations
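
One way to get this behaviour is to re-raise interruption errors before a generic handler wraps everything else. The sketch below is only an illustration of that pattern: the exception classes are local stand-ins that borrow their names from these notes, and `extract_page` and `recognize` are hypothetical, not pdf-craft internals.

```python
class OCRError(Exception):
    """Stand-in for the wrapper raised on ordinary OCR failures."""


class AbortError(Exception):
    """Stand-in for a user-initiated abort."""


class TokenLimitError(Exception):
    """Stand-in for exceeding the OCR token budget."""


def extract_page(recognize, page):
    """Run OCR on one page, wrapping only non-interruption failures."""
    try:
        return recognize(page)
    except (AbortError, TokenLimitError):
        # Interruption signals must reach the caller unchanged.
        raise
    except Exception as error:
        # Everything else is reported as an OCR failure for this page.
        raise OCRError(f"OCR failed on page {page}") from error
```

The order of the `except` clauses matters: the interruption types are matched first, so they never reach the generic wrapper.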

Dependencies

Migration Notes

InterruptedError Changes

If you're catching InterruptedError exceptions, you can now access detailed information about the interruption:

from pdf_craft import transform_markdown, InterruptedError, InterruptedKind

try:
    transform_markdown(
        pdf_path="input.pdf",
        markdown_path="output.md",
    )
except InterruptedError as error:
    # New in v1.0.11: Access interruption details
    if error.kind == InterruptedKind.ABORT:
        print("User aborted the operation")
    elif error.kind == InterruptedKind.TOKEN_LIMIT_EXCEEDED:
        print(f"Token limit exceeded: {error.metering.input_tokens} input tokens used")

    # Access token usage statistics
    print(f"Total tokens: {error.metering.input_tokens + error.metering.output_tokens}")

Table Rendering

Tables in your PDF documents will now be converted to GFM format when possible, making them more readable in markdown viewers. Complex tables will automatically fall back to HTML to preserve their structure.
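
To illustrate the fallback rule, here is a minimal sketch that uses only Python's standard `html.parser` rather than the markdownify dependency this release adds. `table_to_markdown` and `_TableScanner` are hypothetical names, and the sketch conservatively treats any `colspan` or `rowspan` attribute, a second `tbody`, or uneven row lengths as "complex".

```python
from html.parser import HTMLParser


class _TableScanner(HTMLParser):
    """Collects cell text and notes features GFM pipe tables cannot express."""

    def __init__(self):
        super().__init__()
        self.rows = []        # list of rows, each a list of cell strings
        self.complex = False  # True once an unsupported feature is seen
        self._tbody_count = 0
        self._cell = None     # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._tbody_count += 1
            self.complex = self.complex or self._tbody_count > 1
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if any(name in ("colspan", "rowspan") for name, _ in attrs):
                self.complex = True
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())
            if not self.rows:
                self.rows.append([])
            # A literal pipe would break the row, so escape it.
            self.rows[-1].append(text.replace("|", "\\|"))
            self._cell = None


def table_to_markdown(html: str) -> str:
    """Return a GFM pipe table, or the original HTML for complex tables."""
    scanner = _TableScanner()
    scanner.feed(html)
    scanner.close()
    rows = [row for row in scanner.rows if row]
    if scanner.complex or not rows or len({len(row) for row in rows}) != 1:
        return html  # fall back to HTML to preserve structure
    header, *body = rows  # assumption: the first row is the header
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

Returning the input unchanged for complex tables keeps the conversion lossless, at the cost of mixing HTML into the Markdown output.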

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.10...v1.0.11

Release v1.0.10

This release simplifies the table of contents (TOC) extraction API by replacing enum-based modes with a boolean flag, while adding LLM-powered chapter title analysis capabilities for improved TOC hierarchy detection.

What's Changed

Breaking Changes

  • Simplified TOC API: Replaced TocExtractionMode enum with a simpler toc_assumed boolean parameter in https://github.com/oomol-lab/pdf-craft/pull/341
    • Removed toc_mode parameter from transform_markdown() and transform_epub() functions
    • Removed TocExtractionMode from public API exports
    • Introduced toc_assumed boolean flag to control TOC detection behavior

Features

  • LLM-Powered Chapter Title Analysis: Added support for LLM-based analysis of chapter titles to enhance TOC extraction accuracy in https://github.com/oomol-lab/pdf-craft/pull/341
    • Automatically analyzes chapter title hierarchies when toc_llm is configured
    • Provides more accurate chapter level detection for complex book structures
    • Intelligently falls back to standard analysis when LLM is unavailable or encounters errors

Improvements

  • Enhanced Error Handling: Added robust error handling for LLM-based analysis with automatic recovery mechanisms in https://github.com/oomol-lab/pdf-craft/pull/341
    • Better error diagnostics for LLM analysis failures
    • Graceful degradation when LLM analysis fails, ensuring conversion continues successfully
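
The graceful-degradation behaviour described above follows a common "try the optional path, fall back on failure" pattern. The sketch below shows that pattern in isolation. `analyze_title_levels`, `standard`, and `llm` are hypothetical callables introduced for illustration, and the length check is an assumed sanity test, not something these notes specify.

```python
import logging
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger(__name__)

Analyzer = Callable[[Sequence[str]], List[int]]


def analyze_title_levels(
    titles: Sequence[str],
    standard: Analyzer,
    llm: Optional[Analyzer] = None,
) -> List[int]:
    """Return one hierarchy level per title, preferring the LLM if present."""
    if llm is not None:
        try:
            levels = llm(titles)
            # Assumed sanity check: one level per title.
            if len(levels) == len(titles):
                return levels
            logger.warning(
                "LLM returned %d levels for %d titles", len(levels), len(titles)
            )
        except Exception:
            # Record diagnostics, then continue with the standard analysis.
            logger.exception("LLM title analysis failed")
    return standard(titles)
```

Because every failure path ends in `standard(titles)`, the conversion continues whether the LLM is missing, raises, or returns unusable output.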

Migration Guide

If you were using toc_mode in previous versions, update your code as follows:

Previous API (v1.0.9 and earlier)

from pdf_craft import transform_markdown, transform_epub, TocExtractionMode

# For Markdown conversion
transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    toc_mode=TocExtractionMode.NO_TOC_PAGE,  # Old parameter
)

# For EPUB conversion
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_mode=TocExtractionMode.AUTO_DETECT,  # Old parameter
)

New API (v1.0.10)

from pdf_craft import transform_markdown, transform_epub

# For Markdown conversion (assumes no TOC pages by default)
transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    toc_assumed=False,  # New boolean parameter (default: False)
)

# For EPUB conversion (assumes TOC pages exist)
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_assumed=True,  # New boolean parameter
)

Migration Mapping

Old toc_mode Value | New toc_assumed Value
TocExtractionMode.NO_TOC_PAGE | False
TocExtractionMode.AUTO_DETECT | True
TocExtractionMode.LLM_ENHANCED | True (with toc_llm configured)
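
For codebases with many call sites, the mapping can be captured in a small helper during migration. This is a hypothetical convenience, not part of pdf-craft: `migrate_toc_kwargs` takes the old enum member name as a plain string so that it does not depend on the removed `TocExtractionMode` export.

```python
from typing import Any, Dict, Optional


def migrate_toc_kwargs(old_mode: str, toc_llm: Optional[Any] = None) -> Dict[str, Any]:
    """Translate an old toc_mode member name into v1.0.10 keyword arguments."""
    if old_mode == "NO_TOC_PAGE":
        return {"toc_assumed": False}
    if old_mode == "AUTO_DETECT":
        return {"toc_assumed": True}
    if old_mode == "LLM_ENHANCED":
        if toc_llm is None:
            raise ValueError("LLM_ENHANCED requires a configured toc_llm")
        return {"toc_assumed": True, "toc_llm": toc_llm}
    raise ValueError(f"unknown toc_mode: {old_mode!r}")


print(migrate_toc_kwargs("AUTO_DETECT"))
```

The returned dictionary can be unpacked into a call, for example `transform_epub(..., **migrate_toc_kwargs("AUTO_DETECT"))`.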

LLM-Enhanced TOC Extraction

To use LLM-powered chapter title analysis:

from pdf_craft import transform_epub, BookMeta, LLM

# Configure LLM for TOC enhancement
toc_llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="cl100k_base",
)

transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_assumed=True,  # Enable TOC detection
    toc_llm=toc_llm,   # Enable LLM-powered analysis
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)

Notes

  • The toc_assumed parameter defaults to False for Markdown conversion and True for EPUB conversion (maintaining backward-compatible behavior)
  • LLM-powered chapter title analysis is optional and automatically falls back to standard analysis if not configured or if errors occur
  • The new API is simpler and more intuitive, reducing the cognitive load of choosing between multiple enum values

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.9...v1.0.10

Release v1.0.9

This release introduces enhanced table of contents (TOC) extraction capabilities using LLM-powered analysis, enabling more accurate chapter structure detection and hierarchy recognition.

What's Changed

Features

Refactoring

Background

Previously, pdf-craft used statistical analysis to detect TOC pages and extract chapter structure. While effective for basic cases, this approach had limitations in accurately determining chapter hierarchies and handling complex TOC layouts. This release introduces LLM-powered analysis to better understand TOC structure and extract hierarchical information.

How It Works

The new TOC extraction process:

  1. Identify TOC Pages: Uses statistical analysis to detect which pages contain table of contents
  2. Collect All TOC Pages: Gathers all identified TOC pages for comprehensive analysis
  3. LLM Analysis: Passes all TOC pages to an LLM to extract chapter titles and their hierarchical levels
  4. Structure Generation: Uses the extracted hierarchy information to build accurate EPUB navigation structure

This approach combines the efficiency of statistical detection with the semantic understanding capabilities of LLMs, resulting in more accurate chapter organization in the final output.
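
The four steps can be sketched as a small pipeline. The names here (`extract_toc`, `build_navigation`, `is_toc_page`, `ask_llm`) are hypothetical, and the detector and the LLM call are passed in as plain callables so the sketch stays independent of pdf-craft's real internals. It assumes the LLM step yields a flat list of `(title, level)` pairs with levels starting at 1.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Entry = Tuple[str, int]  # (chapter title, hierarchy level starting at 1)


def build_navigation(entries: Sequence[Entry]) -> List[Dict]:
    """Step 4: nest a flat (title, level) list into a navigation tree."""
    root: Dict = {"title": None, "children": []}
    stack = [(0, root)]  # (level, node) pairs along the current branch
    for title, level in entries:
        level = max(1, level)
        node = {"title": title, "children": []}
        # Climb back up until the top of the stack is a shallower level.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


def extract_toc(
    pages: Sequence[str],
    is_toc_page: Callable[[str], bool],
    ask_llm: Callable[[str], Sequence[Entry]],
) -> List[Dict]:
    # Steps 1 and 2: detect and collect every TOC page.
    toc_pages = [page for page in pages if is_toc_page(page)]
    if not toc_pages:
        return []
    # Step 3: hand all TOC pages to the LLM in one request.
    entries = ask_llm("\n".join(toc_pages))
    return build_navigation(entries)
```

Sending all TOC pages together, rather than one page at a time, lets the model see the full hierarchy before assigning levels.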

Usage

The TOC extraction improvements are automatically applied when using the appropriate toc_mode:

from pdf_craft import transform_epub, BookMeta, TocExtractionMode

# Use AUTO_DETECT for statistical analysis (default for EPUB)
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_mode=TocExtractionMode.AUTO_DETECT,
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)

# Use LLM_ENHANCED for LLM-powered extraction (requires toc_llm configuration)
from pdf_craft import LLM

toc_llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="cl100k_base",
)

transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_mode=TocExtractionMode.LLM_ENHANCED,
    toc_llm=toc_llm,
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)

Notes

  • Important: When using TocExtractionMode.LLM_ENHANCED, the toc_llm parameter must be configured. The conversion will fail if toc_llm is not provided.
  • This feature is most beneficial for books with complex chapter hierarchies
  • The statistical TOC page detection remains as the first step, with LLM analysis enhancing the extraction quality

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.8...v1.0.9

Release v1.0.8

This release brings enhanced error handling flexibility, improved OCR text quality, and important security fixes.

What's Changed

Features

Security

Other

Example Usage

Custom Error Handling with Functions

from pdf_craft import transform_markdown, OCRError

def should_ignore_ocr_error(error: OCRError) -> bool:
    # Only ignore specific types of OCR errors
    return error.kind == "recognition_failed"

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    ignore_ocr_errors=should_ignore_ocr_error,  # Pass custom function
)

Traditional Boolean Error Handling (Still Supported)

from pdf_craft import transform_markdown

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    ignore_ocr_errors=True,  # Simple boolean flag
)

API Changes

The following parameters have been enhanced to accept both boolean values and callable functions:

  • ignore_pdf_errors: bool | Callable[[PDFError], bool]
  • ignore_ocr_errors: bool | Callable[[OCRError], bool]

This change is fully backward compatible - existing code using boolean values will continue to work without modifications.
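
A parameter typed as `bool | Callable[[E], bool]` is typically normalised into a single predicate before use. The sketch below shows one way that could work. `as_predicate` and `run_pages` are hypothetical names for illustration, and pdf-craft's real implementation may differ.

```python
from typing import Callable, Iterable, List, Union

ErrorFilter = Union[bool, Callable[[Exception], bool]]


def as_predicate(option: ErrorFilter) -> Callable[[Exception], bool]:
    """Turn a bool or a callable into a uniform error predicate."""
    if callable(option):
        return option
    flag = bool(option)
    return lambda _error: flag


def run_pages(
    pages: Iterable[int],
    recognize: Callable[[int], int],
    ignore_errors: ErrorFilter = False,
) -> List[int]:
    """Process pages, skipping failures the predicate agrees to ignore."""
    should_ignore = as_predicate(ignore_errors)
    results = []
    for page in pages:
        try:
            results.append(recognize(page))
        except Exception as error:
            if not should_ignore(error):
                raise
    return results
```

Normalising once at the top keeps the processing loop identical for both forms, which is what makes the change backward compatible for boolean callers.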

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.7...v1.0.8

Release v1.0.7

This release adds support for including cover images in both Markdown and EPUB conversions, enhancing the output format options.

What's Changed

Features

  • Cover Image Support: Added includes_cover parameter to both transform_markdown and transform_epub functions, allowing you to include the PDF's cover page as an image in the output in https://github.com/oomol-lab/pdf-craft/pull/319
    • For Markdown conversion: The cover image is saved to the images folder and can be referenced in your document
    • For EPUB conversion: The cover image is properly embedded in the EPUB file structure
    • Default value is False for Markdown (to maintain backward compatibility) and True for EPUB

Example Usage

Markdown with Cover

from pdf_craft import transform_markdown

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    markdown_assets_path="images",
    includes_cover=True,  # Include cover image
)

EPUB with Cover

from pdf_craft import transform_epub, BookMeta

transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    includes_cover=True,  # Include cover image (default)
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.6...v1.0.7

Release v1.0.6

This release brings significant improvements to PDF rendering control, text quality, and error handling capabilities.

What's Changed

Features

  • Flexible DPI Control: Added dpi parameter to control PDF page rendering resolution (default: 300 DPI), allowing you to balance between image quality and file size in https://github.com/oomol-lab/pdf-craft/pull/315

  • Automatic Image Size Optimization: Introduced max_page_image_file_size parameter that automatically adjusts DPI when generated images exceed specified size limits, preventing overly large output files in https://github.com/oomol-lab/pdf-craft/pull/315

  • Resilient OCR Processing: Added ignore_ocr_errors parameter to continue processing when OCR recognition fails on individual pages, instead of stopping the entire conversion in https://github.com/oomol-lab/pdf-craft/pull/314

  • Improved Text Quality: Automatically removes Unicode surrogate characters from OCR-extracted text and PDF metadata (title, authors, publisher, etc.), ensuring cleaner output and better compatibility with downstream tools in https://github.com/oomol-lab/pdf-craft/pull/316
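
To show why surrogate removal matters: lone surrogate code points (U+D800 to U+DFFF) cannot be encoded as UTF-8, so a string containing one fails as soon as it is written to a file. The following is a minimal sketch of such a cleanup step. `strip_surrogates` is a hypothetical helper, not the function pdf-craft uses.

```python
import re

# Matches any code point in the UTF-16 surrogate range.
_SURROGATES = re.compile(r"[\ud800-\udfff]")


def strip_surrogates(text: str) -> str:
    """Remove lone surrogate code points that cannot be encoded as UTF-8."""
    return _SURROGATES.sub("", text)


dirty = "Title\ud83d"  # a lone high surrogate, as broken OCR output might contain
clean = strip_surrogates(dirty)
clean.encode("utf-8")  # succeeds; dirty.encode("utf-8") would raise UnicodeEncodeError
```

Properly formed characters outside the Basic Multilingual Plane are single code points in a Python `str`, so they are not affected by this filter.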

Documentation

Dependencies

  • Updated epub-generator to 0.1.6

Example Usage

from pdf_craft import transform_markdown

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    dpi=300,  # Control rendering resolution
    max_page_image_file_size=5 * 1024 * 1024,  # 5 MiB limit per page image, in bytes
    ignore_ocr_errors=True,  # Continue on OCR failures
)
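
The interaction between `dpi` and `max_page_image_file_size` can be pictured as a loop that re-renders at a lower resolution until the image fits. This is a sketch under stated assumptions, not the library's actual algorithm: `render_within_limit`, `min_dpi`, and `step` are hypothetical, and `render` stands in for whatever produces the page image bytes.

```python
from typing import Callable, Tuple


def render_within_limit(
    render: Callable[[int], bytes],
    dpi: int = 300,
    max_bytes: int = 5 * 1024 * 1024,
    min_dpi: int = 72,
    step: float = 0.8,
) -> Tuple[int, bytes]:
    """Lower the DPI until the rendered image fits within max_bytes."""
    while True:
        image = render(dpi)
        # Stop when the image fits, or when the DPI floor is reached.
        if len(image) <= max_bytes or dpi <= min_dpi:
            return dpi, image
        dpi = max(min_dpi, int(dpi * step))


# Fake renderer whose output size grows with the square of the DPI.
used_dpi, data = render_within_limit(lambda d: b"x" * (d * d), max_bytes=40_000)
print(used_dpi, len(data))
```

The floor on the DPI guarantees termination even when a page can never fit within the limit.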

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.5...v1.0.6

Release v1.0.5

What's Changed

Bug Fixes

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.4...v1.0.5

Release v1.0.4

What's New

🎯 Table of Contents Detection and Smart Removal

pdf-craft now automatically detects and removes table of contents pages from the final output, preventing duplicate TOC content in generated EPUB files. The system uses statistical analysis to identify TOC pages by matching chapter titles against page content, then intelligently excludes these pages while preserving the navigation structure.

Related: https://github.com/oomol-lab/pdf-craft/issues/268

Key features:

  • Automatic TOC page detection using Aho-Corasick substring matching
  • Hierarchical TOC level analysis for improved chapter organization
  • XML-based TOC storage for better performance and flexibility
  • New toc_assumed parameter to control TOC detection behavior (default: True for EPUB, False for Markdown)
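
The detection idea, matching many chapter titles against each page in a single pass, can be illustrated with a compact Aho-Corasick automaton. This is a self-contained sketch rather than pdf-craft's code: `build_automaton`, `find_patterns`, `detect_toc_pages`, and the `min_ratio` threshold are all hypothetical, and the "share of titles found on a page" score is an assumed stand-in for the library's statistical analysis.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for Aho-Corasick matching."""
    goto, fail, out = [{}], [0], [set()]
    for index, pattern in enumerate(patterns):
        if not pattern:
            continue  # an empty pattern would match everywhere
        state = 0
        for char in pattern:
            if char not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        out[state].add(index)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for char, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and char not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(char, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches ending at the suffix
    return goto, fail, out


def find_patterns(text, automaton):
    """Return the indexes of all patterns that occur in text."""
    goto, fail, out = automaton
    state, found = 0, set()
    for char in text:
        while state and char not in goto[state]:
            state = fail[state]
        state = goto[state].get(char, 0)
        found |= out[state]
    return found


def detect_toc_pages(pages, chapter_titles, min_ratio=0.5):
    """Flag pages on which at least min_ratio of the titles appear."""
    if not chapter_titles:
        return []
    automaton = build_automaton(chapter_titles)
    return [
        number
        for number, text in enumerate(pages)
        if len(find_patterns(text, automaton)) / len(chapter_titles) >= min_ratio
    ]
```

A real TOC page mentions most chapter titles at once, while a body page usually contains only its own, which is why a simple ratio already separates the two in easy cases.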

Implementation PRs:

📝 Raw HTML Tag Support in Markdown

Full support for CommonMark-compliant raw HTML tags in Markdown output. DeepSeek OCR often generates HTML tags (like <sup> for superscripts) when processing scanned books - these are now properly preserved and rendered in both Markdown and EPUB formats.

Related: https://github.com/oomol-lab/pdf-craft/issues/283

Supported tags include:

  • Inline tags: <sup>, <sub>, <mark>, <u>, <kbd>
  • Block-level tags: <div>, <center>, <details>, <summary>
  • Automatic safety filtering and attribute validation
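
Safety filtering of this kind is usually done with an allowlist. The sketch below uses Python's standard `html.parser` to keep only the tags listed above and to drop every attribute that is not explicitly permitted. `sanitize_html`, `ALLOWED_TAGS`, and `ALLOWED_ATTRS` are hypothetical, and the single permitted attribute (`open` on `<details>`) is an assumption chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"sup", "sub", "mark", "u", "kbd", "div", "center", "details", "summary"}
ALLOWED_ATTRS = {"details": {"open"}}  # assumed attribute allowlist


class _AllowlistFilter(HTMLParser):
    """Re-serialises a fragment, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; its text content is still emitted, escaped
        allowed = ALLOWED_ATTRS.get(tag, set())
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
            if name in allowed
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Escape text so stray angle brackets cannot form new tags.
        self.parts.append(escape(data, quote=False))


def sanitize_html(fragment: str) -> str:
    parser = _AllowlistFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(sanitize_html('H<sub>2</sub>O <u onclick="x()">hi</u>'))
```

An allowlist fails closed: a tag or attribute that nobody anticipated is removed by default instead of slipping through.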

Implementation PRs:

📊 Enhanced Table Rendering

Tables are now rendered in native HTML format for both Markdown and EPUB outputs, providing better structure and readability. Asset metadata now supports structured titles and captions for equations, images, and tables.

https://github.com/oomol-lab/pdf-craft/pull/306

📖 PDF Metadata Extraction

Automatically extracts book metadata (title, authors, publisher, ISBN, etc.) from PDF files and uses it to populate EPUB metadata. No need to manually specify book information when the PDF already contains it.

https://github.com/oomol-lab/pdf-craft/pull/284

📰 Multi-Column Layout Detection

Improved handling of multi-column layouts (common in academic papers and magazines) through histogram valley detection and coefficient-of-variation splitting. Layouts are now correctly grouped by column segments before processing.
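
As a rough illustration of histogram valley detection, the sketch below projects the horizontal extents of layout blocks onto the x-axis and looks for empty vertical strips between them. It is a simplified stand-in and does not cover the coefficient-of-variation splitting mentioned above: `find_column_gaps`, `group_by_column`, and the `min_gap` threshold are hypothetical, and a "valley" here means a strip with zero coverage.

```python
from typing import List, Sequence, Tuple

Extent = Tuple[int, int]  # (x0, x1) horizontal extent of a layout block


def find_column_gaps(boxes: Sequence[Extent], page_width: int, min_gap: int = 20) -> List[int]:
    """Return x positions that split the page into columns."""
    # Histogram: how many blocks cover each x coordinate.
    coverage = [0] * page_width
    for x0, x1 in boxes:
        for x in range(max(0, x0), min(page_width, x1)):
            coverage[x] += 1
    gaps, start = [], None
    for x, count in enumerate(coverage):
        if count == 0 and start is None:
            start = x  # an empty strip begins
        elif count != 0 and start is not None:
            # Skip the left margin (start == 0) and strips that are too narrow.
            if start > 0 and x - start >= min_gap:
                gaps.append((start + x) // 2)
            start = None
    # A strip still open at the end is the right margin and is ignored.
    return gaps


def group_by_column(boxes: Sequence[Extent], gaps: Sequence[int]) -> List[List[Extent]]:
    """Assign each block to a column by comparing its centre with the gaps."""
    columns: List[List[Extent]] = [[] for _ in range(len(gaps) + 1)]
    for box in boxes:
        center = (box[0] + box[1]) / 2
        columns[sum(center > gap for gap in gaps)].append(box)
    return columns
```

Grouping blocks by column before ordering them prevents text from the left and right columns from being interleaved line by line.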

https://github.com/oomol-lab/pdf-craft/pull/286

🐛 Bug Fixes

🔧 Improvements

📚 Documentation

🔄 API Changes

New Parameters

  • toc_assumed parameter in transform_markdown() and transform_epub():
    • When True: Attempts to locate and extract TOC from PDF to build document structure
    • When False: Generates TOC based on document headings only
    • Default: True for EPUB, False for Markdown

New Exports

  • PDFDocumentMetadata: Dataclass for PDF metadata extraction

🙏 Contributors

Thanks to everyone who contributed to this release!

📦 Installation

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install pdf-craft==1.0.4

For detailed installation instructions, see the Installation Guide.


Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.3...v1.0.4

Release v1.0.3

What's Changed

License Improvements

  • Removed PyMuPDF (fitz) Dependency: Replaced PyMuPDF (AGPL-3.0) with Poppler for PDF parsing and rendering, maintaining pdf-craft's MIT license compatibility
    • pdf-craft now uses Poppler via pdf2image (MIT) for all PDF operations
    • This change ensures the entire project remains under the permissive MIT license

New Features

  • Custom PDF Handler Support: Added pdf_handler parameter to predownload_models(), transform_markdown(), and transform_epub() functions, allowing users to customize PDF rendering implementation
  • Poppler Integration: Migrated to Poppler (via pdf2image) for PDF parsing and rendering, providing better compatibility and control
  • New Public APIs: Exported PDFHandler, PDFDocument, DefaultPDFHandler, and DefaultPDFDocument for advanced customization
  • RENDERED Event: Added OCREventKind.RENDERED event to track PDF page rendering progress

Breaking Changes

⚠️ Parameter Renamed: ignore_fitz_errors → ignore_pdf_errors

  • Update your code: transform_markdown(..., ignore_pdf_errors=True) instead of ignore_fitz_errors=True
  • Update your code: transform_epub(..., ignore_pdf_errors=True) instead of ignore_fitz_errors=True

⚠️ Exception Renamed: FitzError → PDFError

  • Update your exception handling code accordingly

Dependencies

Bug Fixes

  • Upgraded doc-page-extractor to fix bugs (#280)

Migration Guide

If you're upgrading from v1.0.2, please:

  1. Install Poppler following the Installation Guide
  2. Update parameter names in your code:
    # Before (v1.0.2)
    transform_markdown(..., ignore_fitz_errors=True)
    
    # After (v1.0.3)
    transform_markdown(..., ignore_pdf_errors=True)
    
  3. Update exception handling if you catch FitzError:
    # Before (v1.0.2)
    from pdf_craft import FitzError
    
    # After (v1.0.3)
    from pdf_craft import PDFError
    

Full Changelog

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.2...v1.0.3

Release v1.0.2

This release brings improvements to EPUB generation, inline LaTeX support, and enhanced handling of footnotes and tables.

What's Changed

New Features

Improvements

Bug Fixes

Breaking Changes

⚠️ API Parameter Changes: The model parameter has been renamed to ocr_size in transform_markdown() and transform_epub() functions. Additionally, the type DeepSeekOCRModel has been renamed to DeepSeekOCRSize.

Migration:

from pdf_craft import transform_epub

# Old (v1.0.1)
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    model="gundam"
)

# New (v1.0.2)
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    ocr_size="gundam"
)

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.1...v1.0.2

What's New in v1.0.1

  • Enhanced Error Handling: Added structured error types (FitzError, OCRError, InterruptedError) with detailed page and step information for better debugging
  • Improved Stability: Fixed crashes when encountering single-page PyMuPDF errors - now handles page-level failures gracefully
  • Online Demo: Try PDF Craft directly in your browser, without any installation

What's Changed

Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.0...v1.0.1

🎉 PDF Craft v1.0.0 Official Release

PDF Craft v1.0.0 is now officially released. This version includes major architectural changes and brings significant performance improvements.

🚀 Core Changes: Fully Embracing DeepSeek OCR

The biggest change in v1.0.0 is the complete rewrite based on DeepSeek OCR, eliminating the dependency on LLM for text correction.

DeepSeek OCR is a powerful open-source OCR engine that supports complex content recognition (tables, formulas, images, footnotes, etc.) with excellent document structure understanding capabilities. Thanks to DeepSeek OCR, pdf-craft now offers:

  • Fully Local Processing: The entire conversion process runs completely locally without any network requests. No need to configure LLM APIs, and no risk of conversion failures due to network issues or API outages. In the old version, a single LLM request failure would halt the entire conversion process.
  • Faster Speed: Compared to v0.2.8 which required multiple LLM calls for text correction, the new version uses direct OCR recognition with significantly improved speed.
  • Higher Accuracy: DeepSeek OCR excels at document structure analysis, table recognition, and formula extraction, delivering high-quality results without secondary correction.
  • Simpler API: Removed complex LLM configuration and multi-step processing workflows. Now conversion can be completed with a single function call.

Additionally, v1.0.0 has fully migrated to DeepSeek OCR (MIT License), removing the previous AGPL-3.0 dependency. The entire project now uses the more permissive MIT License, making it easier for commercial use and integration!

⚠️ Important Change: CUDA Environment Required

The new version requires a CUDA environment to run. This is because DeepSeek OCR depends on CUDA acceleration for efficient document recognition. The old version (v0.2.8) could work in pure CPU environments using LLM, but the new version cannot run without a GPU.

If your environment doesn't support CUDA, do not upgrade to v1.0.0. Continue using v0.2.8:

pip install pdf-craft==0.2.8

For specific CUDA environment installation instructions, please refer to the Installation Guide.
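
Before upgrading, it can help to check whether the current environment exposes CUDA at all. The sketch below relies on PyTorch's `torch.cuda.is_available()` and treats a missing PyTorch installation as "no CUDA". `cuda_available` and `recommended_version` are hypothetical helper names, and the version choice simply mirrors the guidance in this section.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # without PyTorch there is nothing to query
    import torch

    return bool(torch.cuda.is_available())


def recommended_version() -> str:
    """Pick the pdf-craft version suggested by these release notes."""
    return "1.0.0" if cuda_available() else "0.2.8"


print(f"pip install pdf-craft=={recommended_version()}")
```

A `False` result does not necessarily mean the machine has no GPU. A CPU-only PyTorch build also reports no CUDA, so the Installation Guide remains the reference for setting up the environment.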

🚫 When NOT to Upgrade

Continue using v0.2.8 in the following situations:

  1. No GPU or CUDA Environment: The new version requires CUDA and cannot run without GPU
  2. Need LLM Text Correction: The new version has removed LLM correction functionality. If your use case requires secondary correction of OCR results, continue using the old version or use it in combination with epub-translator

🙏 Acknowledgments

Thanks to DeepSeek OCR for being open source, and to all community members who have contributed code and feedback to pdf-craft!


If you have a CUDA environment, upgrade to v1.0.0 now and experience faster, more stable, and simpler PDF conversion! 🚀