CFPDF vs iText - A small battle for PDF manipulation with ColdFusion

I had a task in hand this week: populate a PDF form with a million or more records, save them, create books of 200 pages, and send them to the printer. This is an ongoing daily process, so I had to fine-tune everything like a fine car to run faster without hogging the server at the same time. This had unexpected consequences for me. It became my heroin addiction, without any of the euphoric benefits. I was not able to sleep. Drank coffee by the pint. Read half of the internet. Almost. All while trying to generate PDF files within a practical time frame.

First Attempt: This is where my addiction started. I used CFPDFFORM to populate my PDF. The original PDF form was 252 KB, and this was my code:

    <cfpdfform source="#fullPath#" destination="#newPath#" overwriteData="Yes" overwrite="Yes" action="populate">
        <cfpdfformparam name="Barcode3" value="1976" />
        <cfpdfformparam name="Barcode4" value="1976" />
    </cfpdfform>

This code took 240 tick counts to complete and returned a 343KB file, bigger than the original. Keep in mind that I have to run this a million or more times, so every tick count and every KB matters. Also, in the next step I have to create books of 200 pages:

    <cfpdf action="merge" destination="#destinatonPDF#" overwrite="yes">
        <cfloop from="1" to="200" index="i">
            <cfpdfparam source="#i#.pdf">
        </cfloop>
    </cfpdf>

I always find PDF merge the biggest (smallest?) bottleneck when working with PDFs, and predictably this process timed out on me. It simply could not handle 200 files of 343KB each with forms. So I decided to flatten the PDFs first.

Disappointingly, CFPDFFORM cannot flatten a PDF file, so we have to read the PDF again from the hard drive using CFPDF and write it back.

    <cfpdfform source="#fullPath#" destination="#newPath#" overwriteData="Yes" overwrite="Yes" action="populate">
        <cfpdfformparam name="Barcode3" value="1976" />
        <cfpdfformparam name="Barcode4" value="1976" />
    </cfpdfform>

    <cfpdf action="write" flatten="yes" source="#newPath#" destination="#newPath#" overwrite="yes">

This reduced the PDF file size to 136KB, a cut of more than half. Excellent. CFPDF merge ran on these files like a pig on espresso. No time-out. But (there is always a "But" waiting to ruin my day) PDF populate and flatten together took 600 tick counts. That is roughly 400 extra tick counts for flattening. And this caused the populate process to time out. (Now this is the place where I scream something highly sarcastic like "Oh what a joy!")
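
For anyone who wants to reproduce numbers like these, here is a minimal sketch of how the tick counts and file sizes could be measured. It assumes the timings come from ColdFusion's getTickCount(), which returns milliseconds; the variable names startTick, elapsedTicks, info and sizeKB are made up for illustration, while fullPath and newPath are the same variables used in the listing above.

```cfml
<!--- Sketch only: assumes fullPath and newPath are already set, as in the listing above --->
<cfset startTick = getTickCount()>

<cfpdfform source="#fullPath#" destination="#newPath#" overwriteData="Yes" overwrite="Yes" action="populate">
    <cfpdfformparam name="Barcode3" value="1976" />
    <cfpdfformparam name="Barcode4" value="1976" />
</cfpdfform>
<cfpdf action="write" flatten="yes" source="#newPath#" destination="#newPath#" overwrite="yes">

<!--- Elapsed time in tick counts (milliseconds) for populate plus flatten --->
<cfset elapsedTicks = getTickCount() - startTick>

<!--- getFileInfo() returns a struct whose size key is the file size in bytes --->
<cfset info = getFileInfo(newPath)>
<cfset sizeKB = round(info.size / 1024)>

<cfoutput>Populate + flatten: #elapsedTicks# ticks, #sizeKB# KB</cfoutput>
```

A single run is noisy, so averaging over a batch of files probably gives a fairer picture than one measurement.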

Second Attempt:
After exhausting every single option I could think of within CF (like CFTHREAD), I decided to move to iText.

    <!--- Open the source form and an output stream for the populated copy --->
    <cfset pdfReader  = createObject("java","com.itextpdf.text.pdf.PdfReader").init(PathtoPDFform)>
    <cfset newPDF     = createObject("java","java.io.FileOutputStream").init(PathtoSavePDF)>
    <cfset PdfStamper = createObject("java","com.itextpdf.text.pdf.PdfStamper").init(pdfReader,newPDF)>
    <!--- Ask the stamper to flatten the form, then fill in the fields --->
    <cfset PdfStamper.setFormFlattening(true)>
    <cfset fields     = PdfStamper.getAcroFields()>
    <cfset fields.setField('Barcode3','1976')>
    <cfset fields.setField('Barcode4','1976')>
    <!--- Compress the output and drop unreferenced objects to shrink the file --->
    <cfset PdfStamper.setFullCompression()>
    <cfset PdfStamper.getReader().removeUnusedObjects()>
    <!--- Closing the stamper writes the PDF; then release the file handle --->
    <cfset PdfStamper.close()>
    <cfset newPDF.close()>

This code completed within a magnificent 50 tick counts (vs 600 ticks from CF). It literally gave me goosebumps. But (here we go again) this did not reduce the file size significantly: it returned 219KB (vs 136KB by CFPDF).

iText's setFormFlattening() flattens forms, true, but CFPDF flatten="yes" went the extra mile, subsetting fonts effectively and reducing the file size. If I create a Reduced Size PDF using Acrobat, it can actually bring the file size down to 20KB (it un-embeds the Arial font). That brought me to the question: why can't CFPDF perform the same as Adobe Acrobat? I guess the Adobe office complex is an extremely large place, and there is very little chance for team members from CF and Acrobat to bump into each other by the water cooler or any such place where people normally bump into each other.

After every possible trick I could muster, I was not able to reduce the file size any further using iText. <cfset PdfStamper.getReader().removeFields()> and <cfset PdfStamper.getReader().removeAnnotations()> reduced the file size but disfigured the PDF badly: it dropped fonts, it dropped fields. I tried copying the PDF page onto a new blank document using com.itextpdf.text.pdf.PdfSmartCopy, without any effect.

Report Card:
- Original File: 253KB (No images, multiple fonts)
- (CF9) CFPDFForm Populate: 240 Tick Counts (iText wins)
- (CF9) CFPDF flattening: 400 Tick Counts - 136KB (CFPDF wins when the file size matters; if speed is the only concern, iText wins)
- iText Populate (with or without flattening): 40-50 Tick Counts - 219KB
- Railo CFPDF works the same as iText
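
To put the report card in perspective, here is a rough back-of-the-envelope sketch of what those tick counts mean over a full run. It assumes one tick count is one millisecond (as getTickCount() reports), exactly one million files, and strictly sequential processing; all variable names are made up for illustration.

```cfml
<!--- Assumptions: 1 tick = 1 ms, 1,000,000 files, processed one after another --->
<cfset records = 1000000>
<cfset cfPopulateTicks = 240>
<cfset cfPopulateFlattenTicks = 600>
<cfset iTextTicks = 50>

<!--- ticks -> seconds -> hours, rounded to one decimal place --->
<cfset cfPopulateHours        = round(records * cfPopulateTicks / 1000 / 3600 * 10) / 10>
<cfset cfPopulateFlattenHours = round(records * cfPopulateFlattenTicks / 1000 / 3600 * 10) / 10>
<cfset iTextHours             = round(records * iTextTicks / 1000 / 3600 * 10) / 10>

<cfoutput>
CFPDFFORM populate only: #cfPopulateHours# hours<br>
CFPDFFORM populate + CFPDF flatten: #cfPopulateFlattenHours# hours<br>
iText populate + flatten: #iTextHours# hours
</cfoutput>
```

Under those assumptions the totals come to roughly 67 hours, 167 hours, and 14 hours, which is why a daily process cannot afford the 600-tick path.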

I was not able to speed up the <CFPDF> tag; well, there is nothing we can do about that. I ended up using iText to populate the PDF form and CFPDF to flatten it. That makes my populate and merge process run without a time-out, but not as fast as I want it to be. I'm still a hopeless addict, with a need to reach the CFPDF file size with the iText speed.
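
The combination I settled on could be sketched roughly like this. It is an approximation rather than my exact production code: it reuses the PathtoPDFform and PathtoSavePDF variables from the iText listing, leaves form flattening off on the iText side, and assumes the iText 5 (com.itextpdf) jar is on the ColdFusion classpath.

```cfml
<!--- Step 1 (iText): populate the form fields, without flattening --->
<cfset pdfReader  = createObject("java", "com.itextpdf.text.pdf.PdfReader").init(PathtoPDFform)>
<cfset newPDF     = createObject("java", "java.io.FileOutputStream").init(PathtoSavePDF)>
<cfset pdfStamper = createObject("java", "com.itextpdf.text.pdf.PdfStamper").init(pdfReader, newPDF)>
<cfset fields     = pdfStamper.getAcroFields()>
<cfset fields.setField("Barcode3", "1976")>
<cfset fields.setField("Barcode4", "1976")>

<!--- Closing the stamper writes the populated PDF to disk --->
<cfset pdfStamper.close()>
<cfset newPDF.close()>
<cfset pdfReader.close()>

<!--- Step 2 (CFPDF): flatten the populated file in place to get the smaller output --->
<cfpdf action="write" flatten="yes" source="#PathtoSavePDF#" destination="#PathtoSavePDF#" overwrite="yes">
```

The idea is that iText handles the populate step quickly, while CFPDF is kept only for the flattening that gives the smaller file.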

6 Comments :
Alan
Friday 01 February 2013 06:33 PM
Saman, very interesting and may help me solve a problem. I have a cfc which uses html/css to format data into a single variable and I need to output that as a pdf for attachment to an email. There are issues with cfdocument so I wondered how difficult it would be to use iText to create a pdf containing this variable. I'm a real newbie so any guidance would be a real help.
Friday 01 February 2013 07:54 PM
Alan, nothing you mention above seems beyond the power of standard cfdocument.
Alan
Saturday 02 February 2013 02:59 AM
Tim, thank you. Unfortunately, cfdocument hangs in a way that has been previously seen in cf7 but not in cf9. After 2 weeks of trying to fix it, I'm forced to find an alternative in order to keep production going.

Thursday 13 December 2012 04:37 PM
I would be interested in getting a sample PDF to work with from you. Question: are you running CF 9 standard or enterprise?
Thursday 13 December 2012 04:41 PM
I'm running on CF 9 standard. I will mail you a sample file.
Thursday 13 December 2012 04:43 PM
I will try the exact same code on enterprise, I know that Adobe throttles PDF functionality by "single threading" it.