Python: how to merge PDF files

In this article we will see how to merge a series of PDF files with Python.

Merging PDF files simply means generating a single final PDF file containing all the contents of the PDF files present in the initial list.

The PyPDF2 module provides this capability among its various PDF file management features.

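import PyPDF2

def merge_pdf_files(pdf_files=None, target_pdf_file='merged.pdf'):
    # Nothing to merge if no list of files was given.
    if pdf_files is None:
        return False
    # Note: newer PyPDF2 releases deprecate PdfFileMerger in favour of PdfMerger.
    merger = PyPDF2.PdfFileMerger()
    for pdf_file in pdf_files:
        try:
            # append() accepts a file path and opens the file itself.
            merger.append(pdf_file)
        except Exception:
            # Missing file, malformed PDF, and so on: release resources and abort.
            merger.close()
            return False
    # Write the merged document, then free the resources held by the merger.
    merger.write(target_pdf_file)
    merger.close()
    return True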

Our function creates an instance of the PdfFileMerger class, then loops through the list of input PDF files and passes each one to that instance's append() method. Each append() call can raise an error, so it is wrapped in a try/except block that makes the function return False on failure.

The most common errors concern the structure of the PDF document being read, its size, and whether each entry in the list of PDF files actually corresponds to an existing file in the file system.
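
Two of these problems, a missing file and an obviously wrong file structure, can be caught before any merging starts. The following is only a minimal sketch using the standard library, not part of PyPDF2: the helper name split_usable_pdfs is hypothetical, and the check assumes that a PDF file begins with the %PDF- marker, which is a cheap sanity test rather than a proof that the document is valid.

```python
from pathlib import Path

# Marker that normally opens a PDF file (assumption: it sits at byte 0).
PDF_MAGIC = b"%PDF-"


def split_usable_pdfs(paths):
    """Return (usable, rejected); rejected holds (path, reason) pairs.

    Hypothetical helper: it only checks that each path is an existing
    file and that the file starts with the %PDF- marker.
    """
    usable, rejected = [], []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            rejected.append((str(path), "not an existing file"))
            continue
        # Read just enough bytes to compare against the marker.
        with path.open("rb") as handle:
            header = handle.read(len(PDF_MAGIC))
        if header != PDF_MAGIC:
            rejected.append((str(path), "missing %PDF- header"))
            continue
        usable.append(str(path))
    return usable, rejected
```

The usable list could then be handed to the merge function, while the rejected list tells the caller which entries were skipped and why.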

Once the loop ends, we write the merged content to the destination file and release the resources in use with the close() method. The key feature of PyPDF2 to keep in mind is that the library also validates the files it reads. For example, if a file contains character encoding errors, the merge will fail. The best results are obtained with PDF files that follow the standard format and are not excessively large.
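
Since file size is one of the factors mentioned, a caller may want to know in advance which inputs are unusually large. This is a rough sketch using only the standard library. The oversized_files helper and the 50 MB threshold are hypothetical choices for illustration, not limits defined by PyPDF2.

```python
import os

# Hypothetical threshold chosen for illustration; PyPDF2 defines no such limit.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def oversized_files(paths, limit=SIZE_LIMIT_BYTES):
    """Return (path, size_in_bytes) for every existing file larger than limit."""
    flagged = []
    for path in paths:
        # Skip entries that are not files; they are a separate problem.
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > limit:
                flagged.append((path, size))
    return flagged
```

An empty result means no input exceeds the chosen threshold. What to do with flagged files (skip them, warn, or merge anyway) is left to the caller.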