Our data is invariably in table format. Typically we need to extract the following 3 tables from each PDF document:
Canopy Extract is designed to extract any table (not just the 3 tables above) from any PDF document. If you need to extract charts or images from a PDF document, Canopy Extract is not for you.
To work, the PDF Extract needs two files:
The Extract needs an Excel Configuration File (which describes the table to be extracted).
Multilayer headers and nesting are the key issues when extracting data from a PDF table.
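To illustrate why multilayer headers are awkward, here is a minimal Python sketch that collapses stacked header rows into one name per column. It is not Canopy Extract's own logic. The function `flatten_headers`, the separator, and the sample header text are all hypothetical, and the sketch assumes that a blank cell in an upper header row continues the spanning header to its left.

```python
from typing import List


def flatten_headers(header_rows: List[List[str]], sep: str = " / ") -> List[str]:
    """Collapse stacked header rows into one name per column (hypothetical helper)."""
    if not header_rows:
        return []
    n_cols = max(len(row) for row in header_rows)
    filled = []
    for depth, row in enumerate(header_rows):
        # Trim whitespace and pad short rows so every row has n_cols cells.
        cells = [c.strip() for c in row] + [""] * (n_cols - len(row))
        is_last = depth == len(header_rows) - 1
        if not is_last:
            # Assumption: a blank cell in an upper row continues the
            # spanning header immediately to its left.
            for i in range(1, n_cols):
                if not cells[i]:
                    cells[i] = cells[i - 1]
        filled.append(cells)
    names = []
    for col in range(n_cols):
        # Join the non-empty layers for this column, top to bottom.
        parts = [row[col] for row in filled if row[col]]
        names.append(sep.join(parts))
    return names


# Made-up two-layer header, as it might come out of a PDF table.
header_rows = [
    ["Security", "Market Value", "", "Unrealised P&L", ""],
    ["", "Local", "USD", "Local", "USD"],
]
columns = flatten_headers(header_rows)
for name in columns:
    print(name)
```

A left-to-right fill like this mislabels a column whose top cell is genuinely empty, which is one reason a per-table description of the header layout is needed instead of a fixed rule.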
The Excel Configuration File for the above table is given below. Further details are on the page Parts of a Config File.
Some sample PDF documents and their corresponding Excel Configuration Files are given below: