Exploring PDFs
When looking at government services, one thing that comes up over and over again is PDF forms. There are thousands of them in the Government of Canada. That is not unexpected: we need to gather information from people, and PDF forms fill that need.
That said, it can be worthwhile to sit down and determine whether PDF forms are serving all populations of users, or whether we can make other options available alongside them. For the purposes of this exploration, we were only concerned with people filling out PDF forms on a computer. We did not investigate the use case of providing printed forms to be filled out by hand and then scanned, faxed or mailed.
Background
PDF forms filled out on a computer come in two distinct versions: AcroForms and XFA forms.
Of these two, AcroForms are much simpler. They have been around for longer and are supported in many PDF viewing and editing applications. The form functionality they offer is more limited, but it still provides most of what is needed for government forms. XFA forms, on the other hand, are much more complicated. They can add and remove pages in the PDF, generate barcodes in the document and do many other things. That increased power comes with a downside: they are mostly only editable in Acrobat Reader. Even then, the use of XFA forms is deprecated.
Scope
For the purposes of this exploration we worked under a few constraints:
- Changing business processes was outside the scope. While this is the best solution, it’s also a lot more difficult. What could we do with minimal disruption to the department but maximum benefit?
- XFA forms were ignored. Working with XFA forms is a lot more complicated: they require special processing and can run JavaScript, create pages and generate barcodes. Trying to replace an XFA form is a much larger undertaking, so we focused on AcroForms.
Implementation
With that in mind, what can we do? How can we make this simpler while being essentially invisible from the business perspective? It turns out, quite a lot. There are a large number of open source tools available to work with PDF files. We can extract information about PDF forms, generate new PDF files, and overlay multiple PDF files together. In practice, this means the user can complete a form without needing to know they are working with a PDF until the final document is generated.
For the project we decided to explore a few extra bells and whistles in order to determine what could be feasible. This included investigating face detection libraries for forms which required photo uploads and passport scanning libraries to read information from the encoded part of a passport in order to populate portions of the form.
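To give a sense of what "the encoded part of a passport" contains, here is a minimal Python sketch that reads the second line of a passport's machine-readable zone (MRZ) and verifies its check digits. It is not the API of the scanning library we used; the function names are made up for illustration, and the layout assumed is the 44-character TD3 passport format with the standard 7-3-1 check digit weighting. The sample line is the widely published ICAO specimen, not real data.

```python
def mrz_char_value(ch):
    """Map an MRZ character to its numeric value for check digit purposes."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field):
    """Compute the check digit: repeating weights 7, 3, 1, summed modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10


def parse_td3_line2(line):
    """Extract a few fields from line 2 of a TD3 (passport) MRZ.

    Raises ValueError if the line length or any checked field is wrong.
    Dates are returned as YYMMDD strings, exactly as encoded.
    """
    if len(line) != 44:
        raise ValueError("TD3 line 2 must be 44 characters")
    # (value, check digit character) pairs at their fixed positions
    checked = {
        "document_number": (line[0:9], line[9]),
        "birth_date": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
    }
    result = {"nationality": line[10:13].replace("<", ""), "sex": line[20]}
    for name, (value, digit) in checked.items():
        if not digit.isdigit() or check_digit(value) != int(digit):
            raise ValueError(f"check digit mismatch for {name}")
        result[name] = value.replace("<", "")
    return result


# Specimen line for the fictional state "UTO" (Utopia)
SPECIMEN_LINE2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
passport = parse_td3_line2(SPECIMEN_LINE2)
```

Values such as `passport["document_number"]` and `passport["birth_date"]` could then be used to pre-fill the matching prompts, while a failed check digit is a signal to ask the user to rescan or type the value in.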
At 10,000 feet, we used the QPDF tool to extract information on the form from the PDF file. This gave us information on the different fields, default values, options for combo boxes and other information needed to fill out the form. Using that information, we created a website with a series of prompts to gather the needed form data from the user. The MRZ scanner was used to allow the user to scan their passport and auto-populate needed information from what is stored on the passport. Face API was used to take a photo of the user and determine whether a) the photo was of a person and b) the user’s facial expression matched the requirements for the application. With all of the information entered, PDFKit was used to generate a new PDF file containing just the filled-in fields. That was then passed back through QPDF to overlay the new PDF on the original PDF file, generating a completed form. The form could then be printed, emailed or otherwise processed as if the user had filled the form out by hand.
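As an illustration of the first step, the sketch below (in Python) summarizes form fields from JSON shaped like the `acroform` section that `qpdf --json` produces. The sample data and field names are hypothetical, and the key names (`fullname`, `fieldtype`, `defaultvalue`, `choices`, `pageposfrom1`) reflect our understanding of QPDF's JSON output, so check them against the output of your own QPDF version before relying on them.

```python
import json

# Hypothetical excerpt, shaped like the "acroform" section of `qpdf --json form.pdf`.
SAMPLE_QPDF_JSON = """
{
  "acroform": {
    "hasacroform": true,
    "fields": [
      {"fullname": "applicant.name", "fieldtype": "/Tx", "value": "",
       "defaultvalue": null, "choices": [], "pageposfrom1": 1},
      {"fullname": "applicant.province", "fieldtype": "/Ch", "value": "ON",
       "defaultvalue": "ON", "choices": ["AB", "BC", "ON", "QC"], "pageposfrom1": 1},
      {"fullname": "consent", "fieldtype": "/Btn", "value": "/Off",
       "defaultvalue": null, "choices": [], "pageposfrom1": 2}
    ]
  }
}
"""

# PDF field type names mapped to friendlier labels for building web prompts.
FIELD_TYPES = {"/Tx": "text", "/Ch": "choice", "/Btn": "button", "/Sig": "signature"}


def summarize_fields(qpdf_json):
    """Return one small dict per form field, or an empty list if there is no form."""
    acroform = qpdf_json.get("acroform", {})
    if not acroform.get("hasacroform"):
        return []
    summary = []
    for field in acroform.get("fields", []):
        summary.append({
            "name": field.get("fullname"),
            "kind": FIELD_TYPES.get(field.get("fieldtype"), "unknown"),
            "default": field.get("defaultvalue"),
            "choices": field.get("choices") or [],
            "page": field.get("pageposfrom1"),
        })
    return summary


fields = summarize_fields(json.loads(SAMPLE_QPDF_JSON))
for f in fields:
    print(f["page"], f["name"], f["kind"], f["choices"])
```

A summary like this is enough to decide which prompts the website needs: a text box for `text` fields, a drop-down populated from `choices`, and a checkbox or radio group for `button` fields.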
Tools Used
- Passport scanning libraries to extract address, name and other information
- Face detection libraries to judge if the photo meets the form requirements
- PDF tools to extract form fields and merge PDFs
- PDFKit to generate a PDF with various form fields filled in
Lessons Learned
First and foremost, with a little bit of work and thought it’s possible to fill out PDFs without directly using PDF software. This ability provides an alternative way for users to interact with government services.
Once the web front-end for the PDF file exists, redirecting it to send the data to a given back-end system becomes a much smaller task when the business processes are eventually updated to handle non-PDF input.
There are a large number of available open source tools which can make the processing of PDF files much simpler.
Future Approach
All that said, how could we approach something like this in the future? As mentioned above, this assumes that you have to generate the PDF files and want a stopgap until a more complete solution can be put into place.
- Identify the form you wish to make easier to complete.
- Is the effort to fill out the form great enough to warrant the development time?
- Identify the fields on the form that are required and what type of content would be in each field (e.g. phone number, email address, mailing address).
- Do those fields need to be validated, and what does that validation look like? Tools like QPDF can be used to determine the validation required by the PDF file.
- Create a website to gather and validate the needed form fields.
- The CDS Forms Team works to make form creation easier in the Government of Canada and may be able to help.
- This can be as simple or as complicated as you desire: anything from a single page with a form all the way up to a magical wizard that guides the user through the process.
- Work with designers for the content and style of the page.
- Work with researchers to validate and test the form flow and content.
- Use tools such as PDFKit to generate a PDF with the completed fields.
- Use QPDF to merge the generated PDF with the original form to create the completed form.
- Present the generated PDF to the user to be submitted through existing channels.
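The merge step near the end of this list can be sketched as a small Python wrapper around the QPDF command line. The file names are hypothetical, and the sketch assumes `qpdf` is installed and that its `--overlay` option pairs the pages of the two files one-to-one by default, which is our understanding of its behaviour.

```python
import shutil
import subprocess


def build_overlay_command(original_pdf, filled_pdf, output_pdf):
    """Build the argument list that lays filled_pdf on top of original_pdf.

    Equivalent shell command:
        qpdf original.pdf --overlay filled.pdf -- completed.pdf
    """
    return ["qpdf", original_pdf, "--overlay", filled_pdf, "--", output_pdf]


def overlay(original_pdf, filled_pdf, output_pdf):
    """Run QPDF to produce the completed form.

    check=True raises CalledProcessError on any non-zero exit status, which
    means QPDF warnings are treated as failures too in this sketch.
    """
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf is not installed or not on PATH")
    subprocess.run(
        build_overlay_command(original_pdf, filled_pdf, output_pdf),
        check=True,
    )


# Hypothetical file names: the blank form, the generated answers, the result.
command = build_overlay_command("form.pdf", "answers.pdf", "completed.pdf")
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a file name contains spaces, which matters when file names are derived from user input.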