This Python script combines all PDF files in a specified input directory into a single PDF document. It is designed to be straightforward and to handle common problems, such as an empty input folder or a corrupted file, gracefully.
- Batch Processing: Merges all PDF files found in a designated input folder.
- Ordered Merging: Combines PDFs in alphabetical order of their filenames, ensuring consistent output.
- Dedicated Input/Output: Reads PDFs from a data/input directory and saves the merged PDF to a data/output directory.
- Robust Error Handling:
  - Gracefully handles cases where no PDFs are found.
  - Ignores non-PDF files in the input directory.
  - Provides informative messages for issues like corrupted PDF files or problems during the merge process.
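The file-selection behavior listed above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name find_pdfs is hypothetical, and it assumes that "PDF file" means a regular file with a .pdf extension (matched case-insensitively) and that "alphabetical" means plain string ordering of filenames.

```python
from pathlib import Path


def find_pdfs(input_dir):
    """Return the PDF files in input_dir, sorted by filename.

    Hypothetical helper: the real script may select and order files differently.
    """
    folder = Path(input_dir)
    if not folder.is_dir():
        # A missing input folder is treated the same as an empty one.
        return []
    # Keep regular files ending in .pdf (any letter case); skip everything else.
    pdfs = [p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"]
    # Plain string ordering of the filenames.
    return sorted(pdfs, key=lambda p: p.name)
```

An empty list signals that there is nothing to merge, which lets the caller print a message and exit instead of writing an empty output file.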
Before you can use this script, make sure you have:
- Python 3.x installed on your system.
- pip (Python's package installer), which usually comes bundled with Python.
The script relies on the PyPDF2 library. You can install it using pip:
pip install PyPDF2
Follow these simple steps to merge your PDF files:
- Save the Script: Save the provided Python code as merge_pdfs.py (or any other .py filename you prefer) in a directory on your computer.
- Create Directory Structure: In the same directory where you saved the script, create the following folder structure:
.
├── data/
│ ├── input/
│ └── output/
└── merge_pdfs.py
- Place Your PDFs: Put all the PDF files you want to combine into the data/input/ directory. Name the files so that alphabetical order matches the order in which you want them merged (e.g., 1_document.pdf, 2_report.pdf, 3_appendix.pdf will merge in that order).
- Run the Script: Open your terminal or command prompt, navigate to the directory where you saved merge_pdfs.py, and run the script using the following command:
python merge_pdfs.py
- Find Your Merged PDF: After the script finishes running, the combined PDF, named combined_document.pdf, will be saved in the data/output/ directory.
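Because the merge order follows alphabetical filename order, numeric prefixes can behave unexpectedly once you have ten or more files. The short sketch below assumes the script uses plain string sorting (the filenames are illustrative) and shows why zero-padding the prefixes is a safe habit.

```python
# Plain string sorting compares character by character, so "10" sorts before "2".
names = ["2_report.pdf", "10_appendix.pdf", "1_document.pdf"]
plain_order = sorted(names)
print(plain_order)
# ['10_appendix.pdf', '1_document.pdf', '2_report.pdf']

# Zero-padded prefixes sort in the intended numeric order.
padded = ["02_report.pdf", "10_appendix.pdf", "01_document.pdf"]
padded_order = sorted(padded)
print(padded_order)
# ['01_document.pdf', '02_report.pdf', '10_appendix.pdf']
```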
.
├── data/
│ ├── input/
│ │ ├── document_a.pdf
│ │ ├── document_b.pdf
│ │ └── document_c.pdf
│ └── output/
│ └── combined_document.pdf <-- Your merged PDF is here
└── merge_pdfs.py
The script includes basic error handling to provide feedback if something goes wrong. If you encounter issues, check the terminal output for messages indicating problems like missing files or corrupted PDFs.
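As a rough picture of what such error handling might look like, here is a minimal sketch. It assumes PyPDF2 2.x or 3.x, where PdfMerger and PyPDF2.errors.PdfReadError are available. The function name merge_pdfs and the choice to skip unreadable files instead of aborting are assumptions; this README does not say how the actual script reacts to a corrupted file.

```python
from pathlib import Path


def merge_pdfs(pdf_paths, output_path):
    """Merge pdf_paths into output_path and return the number of files merged.

    Hypothetical sketch: unreadable files are reported and skipped.
    """
    if not pdf_paths:
        print("No PDF files found; nothing to merge.")
        return 0

    # Imported here so the empty-input check above runs even without PyPDF2.
    from PyPDF2 import PdfMerger
    from PyPDF2.errors import PdfReadError

    merger = PdfMerger()
    merged = 0
    try:
        for path in pdf_paths:
            try:
                merger.append(str(path))
                merged += 1
            except (PdfReadError, OSError) as exc:
                # Report the problem file and continue with the rest.
                print(f"Skipping {path}: {exc}")
        if merged == 0:
            print("No readable PDF files; no output written.")
            return 0
        # Create the output folder if it does not exist yet.
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        with open(output_path, "wb") as out_file:
            merger.write(out_file)
    finally:
        merger.close()
    return merged
```

Returning the count of merged files gives the caller a simple way to tell a full merge from a partial one.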
Feel free to modify the input_directory, output_directory, or output_filename variables within the merge_pdfs.py script if you need a different setup.
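For orientation, the three variables might look something like the sketch below. The variable names come from this README, but the values, the use of pathlib, and the derived output_path name are assumptions; check the script itself for how they are actually defined.

```python
from pathlib import Path

# Assumed defaults matching the folder layout described in this README.
input_directory = Path("data") / "input"
output_directory = Path("data") / "output"
output_filename = "combined_document.pdf"

# Hypothetical derived value: the full path of the merged PDF.
output_path = output_directory / output_filename
print(output_path.as_posix())  # data/output/combined_document.pdf
```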