Exploring PDFs

When looking at government services, one thing that comes up over and over again is PDF forms. There are thousands of them in the Government of Canada. That’s not unexpected: we need to gather information from people, and PDF forms fill that need.

That said, it can be worthwhile to sit down and determine whether PDF forms are serving all populations of users, or whether we can make other options available alongside them. For the purposes of this exploration, we were only concerned with folks filling out PDF forms on a computer. We did not investigate the use case of printed forms that are filled out by hand and then scanned, faxed or mailed.

Background

PDF forms that are filled out on a computer come in two distinct types: AcroForms and XFA (XML Forms Architecture) forms.

Of the two, AcroForms are much simpler. They have been around longer and are supported by many PDF viewing and editing applications. The form functionality they offer is more limited, but it still covers most of what government forms need. XFA forms, on the other hand, are much more complicated. They can add and remove pages in the PDF, generate barcodes in the document and do many, many other things. That extra power comes with a downside: they can mostly only be filled out in Adobe Acrobat and Acrobat Reader. Even then, XFA is deprecated; it was deprecated in the PDF 2.0 specification.

Scope

For the purposes of this exploration we worked under a few constraints:

Changing business processes was out of scope. While that is the best solution, it is also much more difficult. What could we do with minimal disruption to the department but maximum benefit?
XFA forms were out of scope. They are much more complicated to work with: they require special processing and can run JavaScript, create pages and generate barcodes. Replacing an XFA form is a much larger undertaking, so we focused on AcroForms.

Implementation

With that in mind, what can we do? How can we make this simpler while staying, essentially, invisible from the business perspective? It turns out, quite a lot. Many open source tools are available for working with PDF files. We can extract information about PDF forms, generate new PDF files and overlay multiple PDF files on top of one another. In short, we can create PDF files without the user needing to know they’re working with a PDF until the final document is generated.
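Overlaying a generated PDF on the original only works if the generated text lands exactly on top of the original form’s fields. In a PDF, a field’s position is its /Rect entry: two corners in points, measured from the bottom-left of the page. PDFKit, by contrast, positions content from the top-left. The sketch below shows one way that conversion could be done; it assumes an unrotated page whose MediaBox starts at (0, 0), and `Box` and `rect_to_top_left` are hypothetical names, not part of any library.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Position and size in points, measured from the page's top-left corner."""
    x: float
    y: float
    width: float
    height: float


def rect_to_top_left(rect: Sequence[float], page_height: float) -> Box:
    """Convert a PDF /Rect [x1 y1 x2 y2] (origin bottom-left) to top-left coordinates.

    Hypothetical helper. Assumes an unrotated page whose MediaBox starts at (0, 0).
    """
    x1, y1, x2, y2 = rect
    # The PDF spec lets /Rect use any two opposite corners, so normalize first.
    left, right = sorted((x1, x2))
    bottom, top = sorted((y1, y2))
    # Flip the y axis: distance from the top edge to the field's top edge.
    return Box(x=left, y=page_height - top, width=right - left, height=top - bottom)
```

On a US Letter page (792 points tall), a field with /Rect [72 700 300 720] becomes x = 72, y = 72, width 228, height 20, values that could then be handed to PDFKit’s `doc.text(value, x, y, { width })`.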

For the project, we also explored a few extra bells and whistles to see what could be feasible. These included face detection libraries for forms that require a photo upload, and passport scanning libraries that read the machine-readable zone (MRZ), the encoded lines at the bottom of a passport’s data page, to populate portions of the form.
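To give a sense of what reading the passport involves: a passport MRZ (the TD3 format in ICAO Doc 9303) is two lines of 44 characters, and several fields on the second line are followed by a check digit. The check digit is a weighted sum of character values (digits as themselves, A to Z as 10 to 35, the filler `<` as 0) with repeating weights 7, 3, 1, taken modulo 10. The sketch below parses the two lines once a scanner has turned them into text and rejects the read if a check digit fails. It is not the library the project used; `check_digit` and `parse_td3` are hypothetical names.

```python
from typing import Dict


def _char_value(c: str) -> int:
    """ICAO 9303 character values: digits as-is, A to Z as 10 to 35, filler '<' as 0."""
    if "0" <= c <= "9":
        return int(c)
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 10
    if c == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {c!r}")


def check_digit(data: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    return sum(_char_value(c) * weights[i % 3] for i, c in enumerate(data)) % 10


def parse_td3(line1: str, line2: str) -> Dict[str, str]:
    """Hypothetical parser for a two-line passport MRZ; fails if a check digit is wrong."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("passport MRZ lines must be 44 characters long")
    # (value, check digit) pairs by character position on line 2.
    checked = {
        "document_number": (line2[0:9], line2[9]),
        "birth_date": (line2[13:19], line2[19]),
        "expiry_date": (line2[21:27], line2[27]),
    }
    for name, (value, digit) in checked.items():
        if check_digit(value) != _char_value(digit):
            raise ValueError(f"check digit mismatch for {name}; ask the user to rescan")
    # Line 1: 'P', type or filler, issuing state, then SURNAME<<GIVEN<NAMES padded with '<'.
    surname, _, given = line1[5:].partition("<<")
    return {
        "issuing_state": line1[2:5],
        "surname": surname.replace("<", " ").strip(),
        "given_names": given.replace("<", " ").strip(),
        "document_number": line2[0:9].rstrip("<"),
        "nationality": line2[10:13],
        "birth_date": line2[13:19],   # YYMMDD
        "sex": line2[20],
        "expiry_date": line2[21:27],  # YYMMDD
    }
```

With the ICAO specimen passport (Anna Maria Eriksson of “Utopia”), the parser returns the surname, given names, document number and dates ready to drop into the matching form fields.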

At 10,000 feet, we used the QPDF tool to extract information about the form from the PDF file. This gave us the different fields, their default values, the options for combo boxes and the other information needed to fill out the form. Using that information, we created a website with a series of prompts to gather the needed form data from the user. The MRZ scanner let the user scan their passport and auto-populate the needed information from what is stored on the passport. Face API let us take a photo of the user and determine a) whether the photo was of a person and b) whether the user’s facial expression met the requirements for the application. With all of the information entered, PDFKit was used to generate a new PDF file containing only the completed form fields. That file was then passed back through QPDF to overlay it on the original PDF file, producing a completed form. The form could then be printed, emailed or otherwise processed as if the user had filled the form out by hand.
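The first step, turning the form’s field list into prompts, can be sketched as follows. Running `qpdf --json form.pdf` prints a JSON description of the file that includes an `acroform` section listing the form’s fields. This sketch assumes each entry carries `fullname`, `fieldtype` (`/Tx` text, `/Ch` choice, `/Btn` button, `/Sig` signature), `fieldflags`, `choices` and `defaultvalue` keys with plain values; the exact shape varies between qpdf versions, so check it against real output. `Prompt` and `prompts_from_qpdf_json` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Maps PDF field types to the kind of prompt the website would show.
KIND_BY_FIELDTYPE = {"/Tx": "text", "/Ch": "choice", "/Btn": "button", "/Sig": "signature"}
REQUIRED_FLAG = 1 << 1  # field flag bit position 2, "Required", in the PDF specification


@dataclass
class Prompt:
    name: str
    kind: str
    required: bool
    choices: List[str] = field(default_factory=list)
    default: str = ""


def prompts_from_qpdf_json(raw: str) -> List[Prompt]:
    """Turn qpdf --json output (assumed shape) into one prompt per distinct field name."""
    fields = json.loads(raw).get("acroform", {}).get("fields", [])
    prompts: List[Prompt] = []
    seen = set()
    for entry in fields:
        name = entry.get("fullname", "")
        # Radio buttons can be listed once per widget under the same name; keep the first.
        if not name or name in seen:
            continue
        seen.add(name)
        prompts.append(
            Prompt(
                name=name,
                kind=KIND_BY_FIELDTYPE.get(entry.get("fieldtype"), "unknown"),
                required=bool(int(entry.get("fieldflags", 0)) & REQUIRED_FLAG),
                choices=[str(c) for c in entry.get("choices", [])],
                default=str(entry.get("defaultvalue") or ""),
            )
        )
    return prompts
```

Each `Prompt` then maps naturally to one question on the website: a text box, a drop-down populated from `choices`, or a checkbox or radio group.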

Tools Used

Passport scanning libraries to extract name, date of birth and other information from the MRZ
Face detection libraries to judge whether the photo meets the form requirements
PDF tools (QPDF) to extract form fields and merge PDFs
PDFKit to generate a PDF with the form fields filled in
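Merging the generated PDF with the original can be done with QPDF’s `--overlay` option, which draws each page of one file on top of the corresponding page of another. The sketch below shells out to the `qpdf` command line and assumes it is installed and on `PATH`; `overlay_command` and `overlay_pdfs` are hypothetical helper names. qpdf exits with 0 on success, 2 on errors and 3 when it succeeds with warnings, so the sketch treats 3 as success.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def overlay_command(original: Path, generated: Path, output: Path) -> List[str]:
    """Build the command line: qpdf ORIGINAL --overlay GENERATED -- OUTPUT"""
    return ["qpdf", str(original), "--overlay", str(generated), "--", str(output)]


def overlay_pdfs(original: Path, generated: Path, output: Path) -> None:
    """Draw each page of `generated` on top of the matching page of `original`."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf is not installed or not on PATH")
    result = subprocess.run(
        overlay_command(original, generated, output), capture_output=True, text=True
    )
    # 0 = success, 3 = success with warnings; anything else is a failure.
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed ({result.returncode}): {result.stderr.strip()}")
```

Keeping the command construction separate from running it makes the exact qpdf invocation easy to log and test without needing real PDF files.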

Lessons Learned

First and foremost, with a little work and thought, it’s possible to fill out PDFs without directly using PDF software. This provides an alternative way for users to interact with government services.

With the web front-end for the PDF already built, redirecting it to send the data to a back-end system becomes a much smaller task once business processes are updated to handle non-PDF input.
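One way to keep that redirection small is to have the front-end hand its validated answers to a single, swappable submission target. The sketch below assumes the answers are collected as a flat dictionary of field name to value; `SubmissionTarget`, `PdfTarget`, `JsonApiTarget` and `finish` are hypothetical names, and the PDF filling and the network call are passed in as plain functions so the example stays self-contained.

```python
import json
from typing import Callable, Dict, Protocol

Answers = Dict[str, str]  # field name -> validated value


class SubmissionTarget(Protocol):
    """Anything that can accept the validated answers and return a receipt."""
    def submit(self, answers: Answers) -> str: ...


class PdfTarget:
    """Current stopgap: fill the original PDF and hand it back to the user."""
    def __init__(self, fill_pdf: Callable[[Answers], str]) -> None:
        self._fill_pdf = fill_pdf  # e.g. generate with PDFKit, then overlay with QPDF

    def submit(self, answers: Answers) -> str:
        return self._fill_pdf(answers)


class JsonApiTarget:
    """Later: send the same answers straight to a back-end system as JSON."""
    def __init__(self, send: Callable[[str], str]) -> None:
        self._send = send  # e.g. an HTTP POST to the department's system

    def submit(self, answers: Answers) -> str:
        return self._send(json.dumps(answers, sort_keys=True))


def finish(answers: Answers, target: SubmissionTarget) -> str:
    # The form flow only ever calls this, so switching targets leaves it untouched.
    return target.submit(answers)
```

Because the questions, validation and user flow never depend on the PDF, replacing `PdfTarget` with `JsonApiTarget` is a one-line change where the target is chosen.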

Many open source tools are available that make processing PDF files much simpler.

Future Approach

All that said, how could we approach something like this in the future? As mentioned above, this assumes that you have to generate the PDF files and want a stopgap until a more complete solution can be put into place.

Identify the form you wish to make easier to complete.
- Is the effort to fill out the form great enough to warrant the development time?
Identify the fields on the form that are required and the type of content in each field (e.g. phone number, email address, mailing address).
- Do those fields need to be validated, and what does that validation look like? Tools like QPDF can expose the field definitions in the PDF (for example, which fields are flagged as required) to help determine the validation the PDF file expects.
Create a website to gather and validate the needed form fields.
- The CDS Forms Team works to make form creation easier in the Government of Canada and may be able to help.
- This can be as simple or as complicated as you like: anything from a single page with a form all the way up to a magical wizard that guides the user through the process.
- Work with designers for the content and style of the page.
- Work with researchers to validate and test the form flow and content.
Use tools such as PDFKit to generate a PDF with the completed fields.
Use QPDF to merge the generated PDF with the original form to create the completed form.
Present the generated PDF to the user to be submitted through existing channels.
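For the validation step, many common field types can be checked, and normalized, before anything is written to the PDF. The sketch below covers three of the field types mentioned above, assuming Canadian postal codes and 10-digit North American phone numbers; the email check is deliberately loose, since fully validating email addresses is notoriously hard. The function names and the `VALIDATORS` table are hypothetical.

```python
import re
from typing import Callable, Dict, Optional

# Canadian postal codes never use D, F, I, O, Q or U; W and Z are never the first letter.
_POSTAL_CODE = re.compile(r"[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z] ?\d[ABCEGHJ-NPRSTV-Z]\d")
# Deliberately loose: something@something.something with no spaces.
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_postal_code(value: str) -> Optional[str]:
    """Return the code formatted as 'A1A 1A1', or None if the format is invalid."""
    code = value.strip().upper()
    if not _POSTAL_CODE.fullmatch(code):
        return None
    code = code.replace(" ", "")
    return f"{code[:3]} {code[3:]}"


def validate_phone(value: str) -> Optional[str]:
    """Accept a 10-digit North American number (optional leading 1) as 'NNN-NNN-NNNN'."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    # Area codes and exchange codes cannot start with 0 or 1.
    if len(digits) != 10 or digits[0] in "01" or digits[3] in "01":
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def validate_email(value: str) -> Optional[str]:
    """Return the trimmed address if it looks like an email address, else None."""
    address = value.strip()
    return address if _EMAIL.fullmatch(address) else None


# Lets the website pick a validator from the content type identified for each field.
VALIDATORS: Dict[str, Callable[[str], Optional[str]]] = {
    "postal_code": validate_postal_code,
    "phone": validate_phone,
    "email": validate_email,
}
```

Returning a normalized value rather than a simple yes/no means the text written into the PDF is consistent no matter how the user typed it.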