[Improvement] PDF parsing fails when file exceeds page limit - add automatic splitting support #1230

# Background
PDF parsing fails for large documents that exceed the Mineru parser's page limit. The parser returns the error "Number of pages exceeds limit, please split the file and try again", which causes the entire parsing task to fail. As a result, users cannot process legitimate large documents such as technical books.

**Error occurs in:**
- File: `/app/aperag/index/document_parser.py` line 267
- Method: `process_document_parsing()`
- Parser: Mineru parsing engine

**Example failure:**
```
Exception: Document parsing failed for /tmp/Designing Data-Intensive Applications...pdf: 
Mineru parsing failed: Number of pages exceeds limit, please split the file and try again
```

# Proposal
Add automatic PDF splitting functionality to handle large documents:

1. **Pre-processing check**: Detect page count before parsing
2. **Auto-split**: Automatically divide large PDFs into chunks within page limits
3. **Batch processing**: Process chunks sequentially and merge results
4. **Progress tracking**: Show splitting and parsing progress to users
5. **Configurable limits**: Allow administrators to adjust page limits per environment

This would eliminate manual file preparation while maintaining parsing reliability for large documents.
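
To make the proposal concrete, here is a minimal sketch of the range planning and sequential merge steps in plain Python. Everything named here is hypothetical and not taken from the aperag codebase: `plan_page_ranges`, `parse_in_chunks`, the `parse_range` callback (a stand-in for whatever splits out a page range and sends it to Mineru), the `PDF_MAX_PAGES_PER_CHUNK` setting, and its placeholder default of 200. The actual Mineru limit is not stated in this issue.

```python
import os
from typing import Callable, List, Optional, Tuple


def plan_page_ranges(total_pages: int, max_pages: int) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) ranges of at most max_pages pages."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    return [
        (start, min(start + max_pages, total_pages))
        for start in range(0, total_pages, max_pages)
    ]


def parse_in_chunks(
    total_pages: int,
    max_pages: int,
    parse_range: Callable[[int, int], List[str]],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[str]:
    """Parse each page range in order and concatenate the per-chunk results."""
    ranges = plan_page_ranges(total_pages, max_pages)
    merged: List[str] = []
    for index, (start, end) in enumerate(ranges, start=1):
        # parse_range is a hypothetical hook: split out pages [start, end)
        # and hand that chunk to the parser.
        merged.extend(parse_range(start, end))
        if on_progress is not None:
            on_progress(index, len(ranges))  # chunks done, chunks total
    return merged


def page_limit_from_env(default: int = 200) -> int:
    """Read the per-chunk page limit from a hypothetical environment setting.

    The default of 200 is a placeholder, not the real Mineru limit.
    """
    return int(os.environ.get("PDF_MAX_PAGES_PER_CHUNK", default))
```

For example, `plan_page_ranges(10, 4)` yields `[(0, 4), (4, 8), (8, 10)]`, so a document that fits within the limit produces a single range and takes the same path as today. A real implementation would still need a PDF library to write each range to a temporary file, and would need to offset page numbers in the merged output so that references point at pages of the original document.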



Background

Proposal

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Improvement]PDF parsing fails when file exceeds page limit - add automatic splitting support #1230

Description

Background

Proposal

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions