Updates and Roadmap
Coming Next
We are working on powerful new features to make PDF conversion even better
Desktop and iOS Applications
Dedicated desktop and iOS applications will provide more local functionality and improve the mobile user experience
Support for More eBook Formats
Support for several new eBook formats, including AZW3, MOBI, and more, to meet different reader and device requirements
Continued Optimization of Formula Display
Further improvements to the rendering of mathematical formulas and chemical equations, along with better recognition accuracy for complex formulas
Update History
European Locale Expansion (fr-FR / de-DE / it-IT / es-ES)
Release Date: 2026-03-04
New Features
- Added four new locales to the i18n resource set: fr-FR, de-DE, it-IT, es-ES
- Language list is now data-driven from src/i18n/common/*.json locale files.
- Added base-language to regional-language resolution:
  - fr / fr-CA -> fr-FR
  - de / de-AT -> de-DE
  - it -> it-IT
  - es / es-MX -> es-ES
Improvements
- Added i18n glossary for term consistency: docs/i18n-glossary.md
- Added a reusable language acceptance checklist: docs/i18n-acceptance-checklist.md
Notes / Caveats
- English (en) remains the fallback language.
- Third-party locale packs currently support zh-CN and en; when locale-specific packs are missing, components will fall back to English.
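The resolution and fallback rules above can be sketched in a few lines. This is only an illustration: the names `REGIONAL_DEFAULTS`, `FALLBACK`, and `resolve_locale` are hypothetical, and the set of available locales is passed in by the caller rather than taken from the real resource files.

```python
# Hypothetical sketch of base-language -> regional-locale resolution.
# The mapping mirrors the release note; all names are illustrative only.
REGIONAL_DEFAULTS = {"fr": "fr-FR", "de": "de-DE", "it": "it-IT", "es": "es-ES"}
FALLBACK = "en"


def resolve_locale(requested: str, available: set[str]) -> str:
    """Return an exact match, else the regional default of the base language, else English."""
    if requested in available:
        return requested
    base = requested.split("-")[0].lower()
    candidate = REGIONAL_DEFAULTS.get(base)
    if candidate in available:
        return candidate
    return FALLBACK


available = {"en", "fr-FR", "de-DE", "it-IT", "es-ES"}
print(resolve_locale("fr-CA", available))
print(resolve_locale("ja-JP", available))
```

An unknown language such as ja-JP ends up on the English fallback, which matches the caveat above.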
Release v1.0.11
This release adds support for GitHub-Flavored Markdown (GFM) table rendering, enhances error handling for interruption scenarios, and includes important bug fixes and dependency updates.
What's Changed
Features
- GFM Table Support: Added intelligent conversion of HTML tables to GitHub-Flavored Markdown format in https://github.com/oomol-lab/pdf-craft/pull/345
  - Simple tables are automatically converted to clean GFM pipe table syntax
  - Complex tables (with colspan, rowspan, or multiple tbody sections) gracefully fall back to HTML format to preserve structure
  - Prevents data loss from unsupported table features in GFM format
  - Added comprehensive test coverage for various table scenarios
  - New dependency: markdownify library for table conversion
- Enhanced InterruptedError API: Added public properties to InterruptedError for better error introspection in https://github.com/oomol-lab/pdf-craft/pull/346
  - New kind property exposes the interruption type (abort or token limit exceeded)
  - New metering property provides direct access to OCR token usage data
  - OCRTokensMetering is now exported from the public API for convenience
  - Enables users to programmatically handle different interruption scenarios and track resource consumption
Bug Fixes
- Fixed Error Propagation: Corrected handling of critical error types during page extraction in https://github.com/oomol-lab/pdf-craft/pull/343
  - AbortError and TokenLimitError now propagate correctly instead of being wrapped in OCRError
  - Ensures interruption signals are properly received and handled by calling code
  - Prevents masking of user-initiated abort operations and token limit violations
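The propagation fix can be pictured with a small sketch. The exception classes below are local stand-ins defined only for this example (they are not imported from pdf-craft), and `extract_page` with its `recognize` callback is a hypothetical helper, not the library's internal function.

```python
# Local stand-in exception types, defined only so the sketch is self-contained.
class AbortError(Exception):
    pass


class TokenLimitError(Exception):
    pass


class OCRError(Exception):
    def __init__(self, message: str, page_index: int):
        super().__init__(message)
        self.page_index = page_index


def extract_page(page_index: int, recognize):
    """Run a hypothetical OCR callback, wrapping only genuine OCR failures."""
    try:
        return recognize(page_index)
    except (AbortError, TokenLimitError):
        # Interruption signals pass through untouched so callers can react to them.
        raise
    except Exception as error:
        # Everything else is reported as an OCR failure for this page.
        raise OCRError(f"OCR failed on page {page_index}", page_index) from error
```

Catching the interruption types first is what keeps them from being masked by the generic wrapper.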
Dependencies
- EPUB Generator Update: Upgraded epub-generator dependency to fix MathML property declaration bug in https://github.com/oomol-lab/pdf-craft/pull/344
  - Fixes https://github.com/Moskize91/epub-generator/issues/22: OPF files incorrectly declared mathml property when LaTeX-to-MathML conversion failed, causing EPUBCheck validation failures
  - EPUB files now pass validation by only declaring MathML properties when actual MathML content exists
Migration Notes
InterruptedError Changes
If you're catching InterruptedError exceptions, you can now access detailed information about the interruption:
from pdf_craft import transform_markdown, InterruptedError, InterruptedKind

try:
    transform_markdown(
        pdf_path="input.pdf",
        markdown_path="output.md",
    )
except InterruptedError as error:
    # New in v1.0.11: Access interruption details
    if error.kind == InterruptedKind.ABORT:
        print("User aborted the operation")
    elif error.kind == InterruptedKind.TOKEN_LIMIT_EXCEEDED:
        print(f"Token limit exceeded: {error.metering.input_tokens} input tokens used")

    # Access token usage statistics
    print(f"Total tokens: {error.metering.input_tokens + error.metering.output_tokens}")
Table Rendering
Tables in your PDF documents will now be converted to GFM format when possible, making them more readable in markdown viewers. Complex tables will automatically fall back to HTML to preserve their structure.
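To make the simple-versus-complex decision concrete, here is a minimal sketch. The release notes say the library relies on markdownify; this example deliberately uses only the standard-library `html.parser` as a stand-in, and the names `_TableScanner` and `table_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser


class _TableScanner(HTMLParser):
    """Collects cell text and notes features that GFM pipe tables cannot express."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.tbody_count = 0
        self.has_spans = False
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.tbody_count += 1
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            # Any colspan/rowspan attribute marks the table as complex in this sketch.
            if any(name in ("colspan", "rowspan") for name, _ in attrs):
                self.has_spans = True
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())
            if self.rows:
                self.rows[-1].append(text.replace("|", "\\|"))
            self._cell = None


def table_to_markdown(html: str) -> str:
    """Return a GFM pipe table for simple tables, or the original HTML otherwise."""
    scanner = _TableScanner()
    scanner.feed(html)
    scanner.close()
    rows = [row for row in scanner.rows if row]
    widths = {len(row) for row in rows}
    if scanner.has_spans or scanner.tbody_count > 1 or len(widths) != 1:
        return html  # fall back to HTML to preserve structure
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    lines.extend("| " + " | ".join(row) + " |" for row in body)
    return "\n".join(lines)
```

Ragged tables (rows with differing cell counts) are also left as HTML here; that is an extra assumption of the sketch, not something the release notes state.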
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.10...v1.0.11
Release v1.0.10
This release simplifies the table of contents (TOC) extraction API by replacing enum-based modes with a boolean flag, while adding LLM-powered chapter title analysis capabilities for improved TOC hierarchy detection.
What's Changed
Breaking Changes
- Simplified TOC API: Replaced TocExtractionMode enum with a simpler toc_assumed boolean parameter in https://github.com/oomol-lab/pdf-craft/pull/341
  - Removed toc_mode parameter from transform_markdown() and transform_epub() functions
  - Removed TocExtractionMode from public API exports
  - Introduced toc_assumed boolean flag to control TOC detection behavior
Features
- LLM-Powered Chapter Title Analysis: Added support for LLM-based analysis of chapter titles to enhance TOC extraction accuracy in https://github.com/oomol-lab/pdf-craft/pull/341
  - Automatically analyzes chapter title hierarchies when toc_llm is configured
  - Provides more accurate chapter level detection for complex book structures
  - Intelligently falls back to standard analysis when LLM is unavailable or encounters errors
Improvements
- Enhanced Error Handling: Added robust error handling for LLM-based analysis with automatic recovery mechanisms in https://github.com/oomol-lab/pdf-craft/pull/341
  - Better error diagnostics for LLM analysis failures
  - Graceful degradation when LLM analysis fails, ensuring conversion continues successfully
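The graceful-degradation behaviour described above follows a common pattern, sketched below. Both analyser callbacks and the function name `analyse_levels` are hypothetical; the sketch only illustrates "try the LLM, log the failure, continue with the standard analysis".

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)

# Hypothetical signature: an analyser maps chapter titles to hierarchy levels.
Analyser = Callable[[Sequence[str]], list]


def analyse_levels(
    titles: Sequence[str],
    llm_analyse: Optional[Analyser],
    standard_analyse: Analyser,
) -> list:
    """Prefer the LLM analyser when configured; fall back if it is missing or fails."""
    if llm_analyse is not None:
        try:
            return llm_analyse(titles)
        except Exception as error:
            # Keep a diagnostic, then continue so the conversion still completes.
            logger.warning("LLM analysis failed, using standard analysis: %s", error)
    return standard_analyse(titles)
```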
Migration Guide
If you were using toc_mode in previous versions, update your code as follows:
Previous API (v1.0.9 and earlier)
from pdf_craft import transform_markdown, transform_epub, TocExtractionMode

# For Markdown conversion
transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    toc_mode=TocExtractionMode.NO_TOC_PAGE,  # Old parameter
)

# For EPUB conversion
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_mode=TocExtractionMode.AUTO_DETECT,  # Old parameter
)
New API (v1.0.10)
from pdf_craft import transform_markdown, transform_epub

# For Markdown conversion (assumes no TOC pages by default)
transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    toc_assumed=False,  # New boolean parameter (default: False)
)

# For EPUB conversion (assumes TOC pages exist)
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_assumed=True,  # New boolean parameter
)
Migration Mapping
| Old toc_mode Value | New toc_assumed Value |
|---|---|
| TocExtractionMode.NO_TOC_PAGE | False |
| TocExtractionMode.AUTO_DETECT | True |
| TocExtractionMode.LLM_ENHANCED | True (with toc_llm configured) |
LLM-Enhanced TOC Extraction
To use LLM-powered chapter title analysis:
from pdf_craft import transform_epub, BookMeta, LLM

# Configure LLM for TOC enhancement
toc_llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="cl100k_base",
)

transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_assumed=True,  # Enable TOC detection
    toc_llm=toc_llm,  # Enable LLM-powered analysis
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)
Notes
- The toc_assumed parameter defaults to False for Markdown conversion and True for EPUB conversion (maintaining backward-compatible behavior)
- LLM-powered chapter title analysis is optional and automatically falls back to standard analysis if not configured or if errors occur
- The new API is simpler and more intuitive, reducing the cognitive load of choosing between multiple enum values
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.9...v1.0.10
Release v1.0.9
This release introduces enhanced table of contents (TOC) extraction capabilities using LLM-powered analysis, enabling more accurate chapter structure detection and hierarchy recognition.
What's Changed
Features
- LLM-Powered TOC Level Extraction: Implemented LLM-based analysis to automatically extract and recognize hierarchical levels in table of contents, improving chapter structure accuracy in https://github.com/oomol-lab/pdf-craft/pull/336
- Enhanced TOC Page Processing: Modified the TOC detection algorithm to pass all identified TOC pages to the LLM for comprehensive analysis, rather than processing them individually in https://github.com/oomol-lab/pdf-craft/pull/338
  - Improves the accuracy of chapter hierarchy detection
  - Provides better context for LLM analysis by including all TOC pages
Refactoring
- LLM Analyzer Refactoring: Refactored llm_analyser.py to improve code maintainability and extensibility in https://github.com/oomol-lab/pdf-craft/pull/339
Background
Previously, pdf-craft used statistical analysis to detect TOC pages and extract chapter structure. While effective for basic cases, this approach had limitations in accurately determining chapter hierarchies and handling complex TOC layouts. This release introduces LLM-powered analysis to better understand TOC structure and extract hierarchical information.
How It Works
The new TOC extraction process:
- Identify TOC Pages: Uses statistical analysis to detect which pages contain table of contents
- Collect All TOC Pages: Gathers all identified TOC pages for comprehensive analysis
- LLM Analysis: Passes all TOC pages to an LLM to extract chapter titles and their hierarchical levels
- Structure Generation: Uses the extracted hierarchy information to build accurate EPUB navigation structure
This approach combines the efficiency of statistical detection with the semantic understanding capabilities of LLMs, resulting in more accurate chapter organization in the final output.
Usage
The TOC extraction improvements are automatically applied when using the appropriate toc_mode:
from pdf_craft import transform_epub, BookMeta, TocExtractionMode

# Use AUTO_DETECT for statistical analysis (default for EPUB)
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_mode=TocExtractionMode.AUTO_DETECT,
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)

# Use LLM_ENHANCED for LLM-powered extraction (requires toc_llm configuration)
from pdf_craft import LLM

toc_llm = LLM(
    key="your-api-key",
    url="https://api.openai.com/v1",
    model="gpt-4",
    token_encoding="cl100k_base",
)

transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    toc_mode=TocExtractionMode.LLM_ENHANCED,
    toc_llm=toc_llm,
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)
Notes
- Important: When using TocExtractionMode.LLM_ENHANCED, the toc_llm parameter must be configured. The conversion will fail if toc_llm is not provided.
- This feature is most beneficial for books with complex chapter hierarchies
- The statistical TOC page detection remains as the first step, with LLM analysis enhancing the extraction quality
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.8...v1.0.9
Release v1.0.8
This release brings enhanced error handling flexibility, improved OCR text quality, and important security fixes.
What's Changed
Features
- Enhanced Error Handling: The ignore_pdf_errors and ignore_ocr_errors parameters now accept custom checker functions in addition to boolean flags, enabling more granular control over error suppression in https://github.com/oomol-lab/pdf-craft/pull/323
- Improved OCR Text Quality: Implemented n-gram detection to automatically filter out repetitive character sequences that indicate neural text degradation in https://github.com/oomol-lab/pdf-craft/pull/330
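As a rough illustration of n-gram based repetition detection, the sketch below measures how much of a text is dominated by a single character n-gram. The function names, the n-gram length, and the threshold are all assumptions chosen for the example; the release notes do not describe the library's actual parameters.

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 4) -> float:
    """Share of all character n-grams taken up by the single most frequent one."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 0.0
    most_common_count = Counter(grams).most_common(1)[0][1]
    return most_common_count / len(grams)


def looks_degenerate(text: str, n: int = 4, threshold: float = 0.3) -> bool:
    """Flag text whose n-gram distribution is dominated by one repeated sequence."""
    return repetition_ratio(text, n) >= threshold


print(looks_degenerate("ab" * 50))
print(looks_degenerate("The quick brown fox jumps over the lazy dog."))
```

Ordinary prose spreads its n-grams widely, while a model stuck in a loop produces a handful of n-grams over and over, which is what the ratio picks up.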
Security
- Security Fix: Upgraded pypdf from ^6.4.1 to ^6.6.0 to address CVE-2026-22691 vulnerability in https://github.com/oomol-lab/pdf-craft/pull/329
  - Fixes issue where malicious PDFs could cause long-running processes when processing invalid startxref entries
  - Resolves https://github.com/oomol-lab/pdf-craft/issues/328
Other
- Code formatting improvements in https://github.com/oomol-lab/pdf-craft/pull/331
- README image link update by @alwaysmavs in https://github.com/oomol-lab/pdf-craft/pull/324
Example Usage
Custom Error Handling with Functions
from pdf_craft import transform_markdown, OCRError

def should_ignore_ocr_error(error: OCRError) -> bool:
    # Only ignore specific types of OCR errors
    return error.kind == "recognition_failed"

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    ignore_ocr_errors=should_ignore_ocr_error,  # Pass custom function
)
Traditional Boolean Error Handling (Still Supported)
from pdf_craft import transform_markdown

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    ignore_ocr_errors=True,  # Simple boolean flag
)
API Changes
The following parameters have been enhanced to accept both boolean values and callable functions:
- ignore_pdf_errors: bool | Callable[[PDFError], bool]
- ignore_ocr_errors: bool | Callable[[OCRError], bool]
This change is fully backward compatible - existing code using boolean values will continue to work without modifications.
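One way a parameter typed as `bool | Callable[..., bool]` can be evaluated is sketched below. The helper `should_ignore` and the `ErrorPolicy` alias are hypothetical and only show why existing boolean arguments keep working alongside checker functions.

```python
from typing import Callable, Union

# Hypothetical alias: either a plain flag or a function that inspects the error.
ErrorPolicy = Union[bool, Callable[[Exception], bool]]


def should_ignore(policy: ErrorPolicy, error: Exception) -> bool:
    """Evaluate a policy that is either a boolean flag or a checker function."""
    if callable(policy):
        return bool(policy(error))
    return bool(policy)


error = ValueError("page 7 could not be recognised")
print(should_ignore(True, error))
print(should_ignore(lambda e: "page 7" in str(e), error))
```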
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.7...v1.0.8
Release v1.0.7
This release adds support for including cover images in both Markdown and EPUB conversions, enhancing the output format options.
What's Changed
Features
- Cover Image Support: Added includes_cover parameter to both transform_markdown and transform_epub functions, allowing you to include the PDF's cover page as an image in the output in https://github.com/oomol-lab/pdf-craft/pull/319
  - For Markdown conversion: The cover image is saved to the images folder and can be referenced in your document
  - For EPUB conversion: The cover image is properly embedded in the EPUB file structure
  - Default value is False for Markdown (to maintain backward compatibility) and True for EPUB
Example Usage
Markdown with Cover
from pdf_craft import transform_markdown

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    markdown_assets_path="images",
    includes_cover=True,  # Include cover image
)
EPUB with Cover
from pdf_craft import transform_epub, BookMeta

transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    includes_cover=True,  # Include cover image (default)
    book_meta=BookMeta(
        title="Book Title",
        authors=["Author"],
    ),
)
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.6...v1.0.7
Release v1.0.6
This release brings significant improvements to PDF rendering control, text quality, and error handling capabilities.
What's Changed
Features
- Flexible DPI Control: Added dpi parameter to control PDF page rendering resolution (default: 300 DPI), allowing you to balance between image quality and file size in https://github.com/oomol-lab/pdf-craft/pull/315
- Automatic Image Size Optimization: Introduced max_page_image_file_size parameter that automatically adjusts DPI when generated images exceed specified size limits, preventing overly large output files in https://github.com/oomol-lab/pdf-craft/pull/315
- Resilient OCR Processing: Added ignore_ocr_errors parameter to continue processing when OCR recognition fails on individual pages, instead of stopping the entire conversion in https://github.com/oomol-lab/pdf-craft/pull/314
- Improved Text Quality: Automatically removes Unicode surrogate characters from OCR-extracted text and PDF metadata (title, authors, publisher, etc.), ensuring cleaner output and better compatibility with downstream tools in https://github.com/oomol-lab/pdf-craft/pull/316
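Why surrogate removal matters is easy to show: a lone surrogate code point cannot be encoded as UTF-8, so any downstream tool that writes the text to disk fails. The sketch below is a minimal illustration; `strip_surrogates` is a hypothetical helper name, not the library's function.

```python
import re

# Code points U+D800..U+DFFF are surrogates and are invalid on their own in UTF-8.
_SURROGATES = re.compile(r"[\ud800-\udfff]")


def strip_surrogates(text: str) -> str:
    """Remove lone surrogate code points so the text can be encoded as UTF-8."""
    return _SURROGATES.sub("", text)


dirty = "Chapter\ud83d One"
clean = strip_surrogates(dirty)
clean.encode("utf-8")  # succeeds; encoding `dirty` would raise UnicodeEncodeError
```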
Documentation
- DeepWiki Integration: Added DeepWiki badge for auto-refreshing documentation by @YogeLiu in https://github.com/oomol-lab/pdf-craft/pull/285
Dependencies
- Updated epub-generator to 0.1.6
Example Usage
from pdf_craft import transform_markdown

transform_markdown(
    pdf_path="input.pdf",
    markdown_path="output.md",
    dpi=300,  # Control rendering resolution
    max_page_image_file_size=5242880,  # 5MB limit per page
    ignore_ocr_errors=True,  # Continue on OCR failures
)
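How a size limit might translate into a lower DPI can be sketched as follows. This is an assumed strategy, not the library's documented algorithm: it supposes that image size grows roughly with the square of the DPI, and `fit_dpi`, its `render` callback, and `min_dpi` are hypothetical names.

```python
import math
from typing import Callable


def fit_dpi(
    render: Callable[[int], bytes],
    dpi: int,
    max_bytes: int,
    min_dpi: int = 72,
) -> tuple[int, bytes]:
    """Lower the DPI until the rendered page fits within max_bytes (or min_dpi is hit)."""
    image = render(dpi)
    while len(image) > max_bytes and dpi > min_dpi:
        # Pixel count, and roughly file size, scales with dpi squared.
        scale = math.sqrt(max_bytes / len(image))
        # Always step down by at least one so the loop terminates.
        dpi = max(min_dpi, min(dpi - 1, int(dpi * scale)))
        image = render(dpi)
    return dpi, image
```

The callback keeps the sketch independent of any particular PDF renderer; in practice it would wrap whatever produces the page image.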
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.5...v1.0.6
Release v1.0.5
What's Changed
Bug Fixes
- GPU memory overflow: Fix out-of-memory errors on RTX 3060 (12GB VRAM) by upgrading doc-page-extractor dependency to optimize model loading sequence (https://github.com/oomol-lab/pdf-craft/pull/309, fixes https://github.com/oomol-lab/pdf-craft/issues/305)
- TOC detection: Improve table of contents detection accuracy by ensuring page indexes are consecutive sequences within the first 17% of document and adding _TOC_SCORE_MIN_RATIO limitation (https://github.com/oomol-lab/pdf-craft/pull/311, https://github.com/oomol-lab/pdf-craft/pull/313)
- Content processing: Fix content override issue (https://github.com/oomol-lab/pdf-craft/pull/312)
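The consecutive-pages constraint from the TOC detection fix can be illustrated with a short sketch. Everything here is an assumption for illustration (0-based page indexes, the helper name `toc_page_run`, choosing the longest run); only the 17% figure comes from the note above.

```python
def toc_page_run(candidates: list[int], total_pages: int, max_ratio: float = 0.17) -> list[int]:
    """Return the longest run of consecutive candidate pages within the leading part of the book."""
    limit = int(total_pages * max_ratio)
    pages = sorted(p for p in set(candidates) if p < limit)
    best: list[int] = []
    current: list[int] = []
    for page in pages:
        if current and page == current[-1] + 1:
            current.append(page)
        else:
            current = [page]
        if len(current) > len(best):
            best = list(current)
    return best


print(toc_page_run([2, 3, 4, 9, 40], total_pages=100))
```

Isolated hits such as page 9 and anything deep in the book such as page 40 are discarded, which is the kind of false positive the fix targets.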
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.4...v1.0.5
Release v1.0.4
What's New
Table of Contents Detection and Smart Removal
pdf-craft now automatically detects and removes table of contents pages from the final output, preventing duplicate TOC content in generated EPUB files. The system uses statistical analysis to identify TOC pages by matching chapter titles against page content, then intelligently excludes these pages while preserving the navigation structure.
Related: https://github.com/oomol-lab/pdf-craft/issues/268
Key features:
- Automatic TOC page detection using Aho-Corasick substring matching
- Hierarchical TOC level analysis for improved chapter organization
- XML-based TOC storage for better performance and flexibility
- New toc_assumed parameter to control TOC detection behavior (default: True for EPUB, False for Markdown)
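The idea of matching chapter titles against page content can be sketched with plain substring tests. Note the swap: the library is described as using Aho-Corasick (via pyahocorasick), which finds all titles in a single pass; the stand-in below uses Python's `in` operator for clarity, and `score_pages` is a hypothetical name.

```python
def score_pages(page_texts: list[str], titles: list[str]) -> list[int]:
    """Count, for each page, how many chapter titles occur in its text."""
    needles = [title.strip().lower() for title in titles if title.strip()]
    return [
        sum(1 for needle in needles if needle in text.lower())
        for text in page_texts
    ]


pages = [
    "Contents\nIntroduction 1\nMethods 15\nResults 42",
    "Introduction\nThis book is about...",
]
print(score_pages(pages, ["Introduction", "Methods", "Results"]))
```

A page that mentions most chapter titles at once scores far higher than a regular chapter page, which makes it a TOC candidate.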
Implementation PRs:
- https://github.com/oomol-lab/pdf-craft/pull/297
- https://github.com/oomol-lab/pdf-craft/pull/298
- https://github.com/oomol-lab/pdf-craft/pull/299
- https://github.com/oomol-lab/pdf-craft/pull/300
- https://github.com/oomol-lab/pdf-craft/pull/301
- https://github.com/oomol-lab/pdf-craft/pull/302
- https://github.com/oomol-lab/pdf-craft/pull/303
Raw HTML Tag Support in Markdown
Full support for CommonMark-compliant raw HTML tags in Markdown output. DeepSeek OCR often generates HTML tags (like <sup> for superscripts) when processing scanned books - these are now properly preserved and rendered in both Markdown and EPUB formats.
Related: https://github.com/oomol-lab/pdf-craft/issues/283
Supported tags include:
- Inline tags: <sup>, <sub>, <mark>, <u>, <kbd>
- Block-level tags: <div>, <center>, <details>, <summary>
- Automatic safety filtering and attribute validation
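A minimal allow-list filter gives a feel for the safety filtering mentioned above. This sketch is deliberately stricter than what the notes describe: it drops all attributes instead of validating them, and escapes every tag outside the listed set. `ALLOWED_TAGS` and `filter_tags` are hypothetical names.

```python
import re
from html import escape

# The tag names come from the list above; the filtering policy is an assumption.
ALLOWED_TAGS = {"sup", "sub", "mark", "u", "kbd", "div", "center", "details", "summary"}
_TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def filter_tags(text: str) -> str:
    """Keep allow-listed tags (without attributes) and escape every other tag."""

    def replace(match: re.Match) -> str:
        name = match.group(1).lower()
        if name not in ALLOWED_TAGS:
            return escape(match.group(0))
        closing = match.group(0).startswith("</")
        return f"</{name}>" if closing else f"<{name}>"

    return _TAG.sub(replace, text)


print(filter_tags('E = mc<sup onclick="x()">2</sup>'))
```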
Implementation PRs:
- https://github.com/oomol-lab/pdf-craft/pull/290
- https://github.com/oomol-lab/pdf-craft/pull/291
- https://github.com/oomol-lab/pdf-craft/pull/292
- https://github.com/oomol-lab/pdf-craft/pull/294
Enhanced Table Rendering
Tables are now rendered in native HTML format for both Markdown and EPUB outputs, providing better structure and readability. Asset metadata now supports structured titles and captions for equations, images, and tables.
https://github.com/oomol-lab/pdf-craft/pull/306
PDF Metadata Extraction
Automatically extracts book metadata (title, authors, publisher, ISBN, etc.) from PDF files and uses it to populate EPUB metadata. No need to manually specify book information when the PDF already contains it.
https://github.com/oomol-lab/pdf-craft/pull/284
Multi-Column Layout Detection
Improved handling of multi-column layouts (common in academic papers and magazines) through histogram valley detection and coefficient-of-variation splitting. Layouts are now correctly grouped by column segments before processing.
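Histogram valley detection can be illustrated with a simplified sketch: project the horizontal extents of layout boxes onto bins and look for a near-empty gap between occupied regions. The function name, bin count, and 20% valley threshold are assumptions, and the coefficient-of-variation step mentioned above is not covered.

```python
from typing import Optional


def find_column_split(
    boxes: list[tuple[float, float]],
    page_width: float,
    bins: int = 100,
) -> Optional[float]:
    """Return the x position of a column gutter, or None for a single column.

    boxes holds the (x0, x1) horizontal extent of each layout block.
    """
    histogram = [0] * bins
    for x0, x1 in boxes:
        first = max(0, int(x0 / page_width * bins))
        last = min(bins - 1, int(x1 / page_width * bins))
        for i in range(first, last + 1):
            histogram[i] += 1
    occupied = [i for i, count in enumerate(histogram) if count > 0]
    if not occupied:
        return None
    interior = range(occupied[0] + 1, occupied[-1])
    if not interior:
        return None
    valley = min(interior, key=lambda i: histogram[i])
    # Assumed threshold: the valley must be much emptier than the busiest bin.
    if histogram[valley] > 0.2 * max(histogram):
        return None
    return (valley + 0.5) / bins * page_width
```

For a two-column page the returned position falls inside the gutter, so blocks can be grouped by which side of it they sit on.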
https://github.com/oomol-lab/pdf-craft/pull/286
Bug Fixes
- Fixed PIL crash on invalid bounding boxes: Added validation and normalization for layout bounding boxes to prevent crashes when cropping images with invalid coordinates (https://github.com/oomol-lab/pdf-craft/pull/295)
- Fixed DeepSeek OCR center tag handling: Ignored alignment tags (<center>, <left>, <right>) generated by DeepSeek OCR that aren't needed in the output (https://github.com/oomol-lab/pdf-craft/pull/307)
Improvements
- Refined layout joining logic: Improved paragraph merging across page boundaries with better handling of override assets and line continuation (https://github.com/oomol-lab/pdf-craft/pull/287, https://github.com/oomol-lab/pdf-craft/pull/288)
- Updated dependencies:
  - Upgraded doc-page-extractor from 1.0.10 to 1.0.11 (https://github.com/oomol-lab/pdf-craft/pull/289)
  - Upgraded epub-generator from 0.1.2 to 0.1.5
  - Added pyahocorasick 2.2.0 for efficient substring matching
- CI/CD enhancements: Added merge-build workflow for automated builds on main branch pushes (https://github.com/oomol-lab/pdf-craft/pull/289)
Documentation
- Updated README with new toc_assumed parameter documentation (https://github.com/oomol-lab/pdf-craft/pull/304)
- Refreshed documentation images with hosted assets
API Changes
New Parameters
- toc_assumed parameter in transform_markdown() and transform_epub():
  - When True: Attempts to locate and extract TOC from PDF to build document structure
  - When False: Generates TOC based on document headings only
  - Default: True for EPUB, False for Markdown
New Exports
- PDFDocumentMetadata: Dataclass for PDF metadata extraction
Contributors
Thanks to everyone who contributed to this release!
Installation
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install pdf-craft==1.0.4
For detailed installation instructions, see the Installation Guide.
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.3...v1.0.4
Release v1.0.3
What's Changed
License Improvements
- Removed PyMuPDF (fitz) Dependency: Replaced PyMuPDF (AGPL-3.0) with Poppler for PDF parsing and rendering, maintaining pdf-craft's MIT license compatibility
  - pdf-craft now uses Poppler via pdf2image (MIT) for all PDF operations
  - This change ensures the entire project remains under the permissive MIT license
New Features
- Custom PDF Handler Support: Added pdf_handler parameter to predownload_models(), transform_markdown(), and transform_epub() functions, allowing users to customize PDF rendering implementation
- Poppler Integration: Migrated to Poppler (via pdf2image) for PDF parsing and rendering, providing better compatibility and control
- New Public APIs: Exported PDFHandler, PDFDocument, DefaultPDFHandler, and DefaultPDFDocument for advanced customization
- RENDERED Event: Added OCREventKind.RENDERED event to track PDF page rendering progress
Breaking Changes
⚠️ Parameter Renamed: ignore_fitz_errors → ignore_pdf_errors
- Update your code: transform_markdown(..., ignore_pdf_errors=True) instead of ignore_fitz_errors=True
- Update your code: transform_epub(..., ignore_pdf_errors=True) instead of ignore_fitz_errors=True
⚠️ Exception Renamed: FitzError → PDFError
- Update your exception handling code accordingly
Dependencies
- New Requirement: Poppler must be installed separately for PDF parsing
  - Ubuntu/Debian: sudo apt-get install poppler-utils
  - macOS: brew install poppler
  - Windows: Download from oschwartz10612/poppler-windows
  - See Installation Guide for details
Bug Fixes
- Upgraded doc-page-extractor to fix bugs (#280)
Migration Guide
If you're upgrading from v1.0.2, please:
- Install Poppler following the Installation Guide
- Update parameter names in your code:
  # Before (v1.0.2)
  transform_markdown(..., ignore_fitz_errors=True)
  # After (v1.0.3)
  transform_markdown(..., ignore_pdf_errors=True)
- Update exception handling if you catch FitzError:
  # Before (v1.0.2)
  from pdf_craft import FitzError
  # After (v1.0.3)
  from pdf_craft import PDFError
Full Changelog
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.2...v1.0.3
Release v1.0.2
This release brings improvements to EPUB generation, inline LaTeX support, and enhanced handling of footnotes and tables.
What's Changed
New Features
- Inline LaTeX Expression Support - Added support for preserving inline LaTeX mathematical expressions in both Markdown and EPUB outputs. A new inline_latex parameter (default: True) allows you to control this behavior for EPUB conversion
- Assets in Footnotes - Footnotes can now contain images and other assets, which are properly preserved during conversion
Improvements
- Enhanced Table of Contents Generation - Improved EPUB table of contents generation to analyze hierarchical structure based on font sizes, replacing the previous flat list format
- Parameter Naming - Renamed parameters for better clarity:
  - model → ocr_size
  - Type: DeepSeekOCRModel → DeepSeekOCRSize
  - https://github.com/oomol-lab/pdf-craft/pull/265
  This change better reflects that the parameter controls the OCR model size rather than being a generic "model" reference.
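For the font-size based table of contents listed under Improvements above, one simple way to derive a hierarchy is to rank the distinct heading sizes, largest first. This is an illustrative assumption about the approach, and `levels_from_font_sizes` is a hypothetical helper.

```python
def levels_from_font_sizes(headings: list[tuple[str, float]]) -> list[tuple[str, int]]:
    """Assign level 1 to the largest heading font size, level 2 to the next, and so on."""
    sizes = sorted({round(size, 1) for _, size in headings}, reverse=True)
    level_of = {size: level for level, size in enumerate(sizes, start=1)}
    return [(title, level_of[round(size, 1)]) for title, size in headings]


headings = [
    ("Part I", 24.0),
    ("Chapter 1", 18.0),
    ("Section 1.1", 14.0),
    ("Chapter 2", 18.0),
]
print(levels_from_font_sizes(headings))
```

Rounding the sizes groups headings whose fonts differ only by rendering noise, so they land on the same level.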
Bug Fixes
- Fixed table rendering issues - https://github.com/oomol-lab/pdf-craft/pull/272
- Improved LaTeX escape handling - https://github.com/oomol-lab/pdf-craft/pull/270
Breaking Changes
⚠️ API Parameter Changes: The model parameter has been renamed to ocr_size in transform_markdown() and transform_epub() functions. Additionally, the type DeepSeekOCRModel has been renamed to DeepSeekOCRSize.
Migration:
# Old (v1.0.1)
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    model="gundam"
)

# New (v1.0.2)
transform_epub(
    pdf_path="input.pdf",
    epub_path="output.epub",
    ocr_size="gundam"
)
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.1...v1.0.2
What's New in v1.0.1
- Enhanced Error Handling: Added structured error types (FitzError, OCRError, InterruptedError) with detailed page and step information for better debugging
- Improved Stability: Fixed crashes when encountering single-page PyMuPDF errors; page-level failures are now handled gracefully
- Online Demo: Try the PDF Craft online demo directly in your browser without any installation
What's Changed
- docs(project): add online demo links by @Moskize91 in https://github.com/oomol-lab/pdf-craft/pull/260
- feat: add new errors by @Moskize91 in https://github.com/oomol-lab/pdf-craft/pull/262
- feat: don't crash when find just a page of fitz error by @Moskize91 in https://github.com/oomol-lab/pdf-craft/pull/263
- doc(project): sync README.md by @Moskize91 in https://github.com/oomol-lab/pdf-craft/pull/264
Full Changelog: https://github.com/oomol-lab/pdf-craft/compare/v1.0.0...v1.0.1
PDF Craft v1.0.0 Official Release
PDF Craft v1.0.0 is now officially released. This version includes major architectural changes and brings significant performance improvements.
Core Changes: Fully Embracing DeepSeek OCR
The biggest change in v1.0.0 is the complete rewrite based on DeepSeek OCR, eliminating the dependency on LLM for text correction.
DeepSeek OCR is a powerful open-source OCR engine that supports complex content recognition (tables, formulas, images, footnotes, etc.) with excellent document structure understanding capabilities. Thanks to DeepSeek OCR, pdf-craft now offers:
- Fully Local Processing: The entire conversion process runs completely locally without any network requests. No need to configure LLM APIs, and no risk of conversion failures due to network issues or API outages. In the old version, a single LLM request failure would halt the entire conversion process.
- Faster Speed: Compared to v0.2.8 which required multiple LLM calls for text correction, the new version uses direct OCR recognition with significantly improved speed.
- Higher Accuracy: DeepSeek OCR excels at document structure analysis, table recognition, and formula extraction, delivering high-quality results without secondary correction.
- Simpler API: Removed complex LLM configuration and multi-step processing workflows. Now conversion can be completed with a single function call.
Additionally, v1.0.0 has fully migrated to DeepSeek OCR (MIT License), removing the previous AGPL-3.0 dependency. The entire project now uses the more permissive MIT License, making it easier for commercial use and integration!
⚠️ Important Change: CUDA Environment Required
The new version requires a CUDA environment to run. This is because DeepSeek OCR depends on CUDA acceleration for efficient document recognition. The old version (v0.2.8) could work in pure CPU environments using LLM, but the new version cannot run without a GPU.
If your environment doesn't support CUDA, do not upgrade to v1.0.0. Continue using v0.2.8:
pip install pdf-craft==0.2.8
For specific CUDA environment installation instructions, please refer to the Installation Guide.
When NOT to Upgrade
Continue using v0.2.8 in the following situations:
- No GPU or CUDA Environment: The new version requires CUDA and cannot run without GPU
- Need LLM Text Correction: The new version has removed LLM correction functionality. If your use case requires secondary correction of OCR results, continue using the old version or use it in combination with epub-translator
Acknowledgments
Thanks to DeepSeek OCR for being open source, and to all community members who have contributed code and feedback to pdf-craft!
If you have a CUDA environment, upgrade to v1.0.0 now and experience faster, more stable, and simpler PDF conversion!
