I had a task in hand this week: populate a PDF form with millions of records, save them, create books of 200 pages and send them to the printer. This is an ongoing daily process, so I had to fine-tune everything like a fine car to run faster without hogging the server at the same time. This had unexpected consequences for me. It became my heroin addiction without any euphoric benefits. I was not able to sleep. Drank coffee by the pint. Read half of the internet. Almost. All while trying to generate PDF files within a practical time frame.
First Attempt: This is where my addiction started. I used CFPDFFORM to populate my PDF. The original PDF form was 252 KB, and my code was:
<cfpdfform source="#fullPath#" destination="#newPath#" overwriteData="Yes" overwrite="Yes" action="populate">
    <cfpdfformparam name="Barcode3" value="1976" />
    <cfpdfformparam name="Barcode4" value="1976" />
</cfpdfform>
This code took 240 tick counts to complete and returned a 343 KB file, bigger than the original. Keep in mind I have to run this millions of times, so every tick count and every KB matters. Also, in the next step I have to create books of 200 pages:
<cfpdf action="merge" destination="#destinatonPDF#" overwrite="yes">
    <cfloop from="1" to="200" index="i">
        <cfpdfparam source="#i#.pdf">
    </cfloop>
</cfpdf>
I always find PDF merge the biggest (smallest?) bottleneck when working with PDFs, and predictably this process timed out on me. It simply could not handle 200 files of 343 KB each with forms in them. So I decided to flatten the PDFs first.
Disappointingly, CFPDFFORM cannot flatten a PDF file, so we have to read the PDF file back from the hard drive using CFPDF and write it out again.
<cfpdfform source="#fullPath#" destination="#newPath#" overwriteData="Yes" overwrite="Yes" action="populate">
    <cfpdfformparam name="Barcode3" value="1976" />
    <cfpdfformparam name="Barcode4" value="1976" />
</cfpdfform>

<cfpdf action="write" flatten="yes" source="#newPath#" destination="#newPath#" overwrite="yes">
This reduced the PDF file size to 136 KB, a cut of more than half. Excellent. CFPDF merge ran on these files like a pig on espresso. No time-out. But (there is always a "But" waiting to ruin my day) PDF populate and flatten together took 600 tick counts. That is roughly 400 extra tick counts just for flattening. And this caused the populate process to time out. (Now this is the place where I scream something highly sarcastic like "Oh what a Joy!")
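For anyone wanting to reproduce numbers like these, here is a minimal sketch of how the two steps above could be timed separately. It assumes that "tick counts" means differences between getTickCount() calls (milliseconds); the variable names startTick, populateTicks and flattenTicks are made up for illustration, while fullPath and newPath are the same variables used in the code above.

```cfml
<!--- Assumption: a "tick count" is a getTickCount() delta, i.e. milliseconds. --->

<!--- Time the populate step on its own --->
<cfset startTick = getTickCount()>
<cfpdfform source="#fullPath#" destination="#newPath#" overwriteData="Yes" overwrite="Yes" action="populate">
    <cfpdfformparam name="Barcode3" value="1976" />
    <cfpdfformparam name="Barcode4" value="1976" />
</cfpdfform>
<cfset populateTicks = getTickCount() - startTick>

<!--- Time the flatten step on its own --->
<cfset startTick = getTickCount()>
<cfpdf action="write" flatten="yes" source="#newPath#" destination="#newPath#" overwrite="yes">
<cfset flattenTicks = getTickCount() - startTick>

<cfoutput>Populate: #populateTicks# ms, flatten: #flattenTicks# ms</cfoutput>
```

A single run is noisy, so averaging over a few dozen files probably gives a fairer picture than one measurement.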
Second Attempt:
After giving up on every single option I could think of within CF (like CFTHREAD), I decided to move to iText.
<cfset pdfReader = createObject("java","com.itextpdf.text.pdf.PdfReader").init(PathtoPDFform)>
<cfset newPDF = createObject("java","java.io.FileOutputStream").init(PathtoSavePDF)>
<cfset PdfStamper = createObject("java","com.itextpdf.text.pdf.PdfStamper").init(pdfReader,newPDF)>
<cfset PdfStamper.setFormFlattening(True)>
<cfset fields = PdfStamper.getAcroFields()>
<cfset fields.setField('Barcode3','1976')>
<cfset fields.setField('Barcode4','1976')>
<cfset PdfStamper.setFullCompression()>
<cfset PdfStamper.getReader().removeUnusedObjects()>
<cfset PdfStamper.close()>
<cfset newPDF.close()>

This code completed within a magnificent 50 tick counts (vs 600 ticks from CF). It literally gave me goosebumps. But (here we go again) it did not reduce the file size significantly: it returned 219 KB (vs 136 KB by CFPDF).
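To put those two timings in perspective, here is a back-of-the-envelope sketch. It assumes a tick count is one millisecond, that files are processed one after another, and that a run is one million records; the variable names are hypothetical.

```cfml
<!--- Assumptions: 1 tick = 1 ms, sequential processing, one million records per run. --->
<cfset records = 1000000>

<!--- 600 ticks per file with CF populate + flatten, 50 ticks per file with iText --->
<cfset cfHours = records * 600 / 1000 / 3600>
<cfset itextHours = records * 50 / 1000 / 3600>

<cfoutput>
CF populate + flatten: #numberFormat(cfHours, "0.0")# hours<br>
iText populate + flatten: #numberFormat(itextHours, "0.0")# hours
</cfoutput>
```

Under those assumptions the CF route works out to roughly 167 hours per million records and the iText route to about 14 hours, which is the difference between a daily job that can never finish and one that can.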
iText's setFormFlattening() flattens forms, true, but CFPDF flatten="yes" goes the extra mile, subsetting fonts effectively and reducing the file size. If I create a Reduced Size PDF using Acrobat, it can actually bring the file size down to 20 KB (it un-embeds the font Arial). That brought me to the question: why can't CFPDF perform the same as Adobe Acrobat? I guess the Adobe office complex is an extremely large place, so there is very little chance of team members from CF and Acrobat bumping into each other by the water cooler, or any such place where people normally bump into each other.
Even after every possible trick I could muster, I was not able to reduce the file size any further using iText.
<cfset PdfStamper.getReader().removeFields()>
<cfset PdfStamper.getReader().removeAnnotations()>

These two calls reduced the file size but disfigured the PDF badly: it dropped fonts, it dropped fields. I also tried copying the PDF page onto a new blank document using com.itextpdf.text.pdf.PdfSmartCopy, without any effect.
Report Card:
- Original File: 253KB (No images, multiple fonts)
- (CF9) CFPDFForm Populate: 240 Tick Counts (iText wins)
- (CF9) CFPDF flattening: 400 Tick Counts - 136KB (CFPDF wins when the file size matters; if speed is the only concern, iText wins)
- iText Populate (with or without flattening): 40-50 Tick Counts - 219KB
- Railo CFPDF works same as iText
I was not able to speed up the <CFPDF> tag; well, there is nothing we can do about that. I ended up using iText to populate the PDF form and CFPDF to flatten it. That makes my populate and merge process run without a time-out, but not as fast as I want it to be. I'm still a hopeless addict with a need to reach the CFPDF file size with the iText speed.
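For the curious, that combination might look roughly like the sketch below. It is one possible reading of the approach, not the exact production code: it assumes the iText 5 jar (the com.itextpdf packages) is on the ColdFusion classpath, reuses the PathtoPDFform and PathtoSavePDF variables from the iText example, and leaves the flattening to CFPDF rather than to setFormFlattening().

```cfml
<!--- Step 1: populate the form with iText (fast), leaving it unflattened. --->
<!--- Assumption: the iText 5 classes are available to createObject(). --->
<cfset pdfReader = createObject("java","com.itextpdf.text.pdf.PdfReader").init(PathtoPDFform)>
<cfset newPDF = createObject("java","java.io.FileOutputStream").init(PathtoSavePDF)>
<cfset pdfStamper = createObject("java","com.itextpdf.text.pdf.PdfStamper").init(pdfReader, newPDF)>
<cfset fields = pdfStamper.getAcroFields()>
<cfset fields.setField("Barcode3","1976")>
<cfset fields.setField("Barcode4","1976")>
<!--- Closing the stamper writes the file and closes the output stream. --->
<cfset pdfStamper.close()>
<cfset pdfReader.close()>

<!--- Step 2: flatten with CFPDF (slower, but gives the smaller file). --->
<cfpdf action="write" flatten="yes" source="#PathtoSavePDF#" destination="#PathtoSavePDF#" overwrite="yes">
```

The trade-off is that only the populate step gets the iText speed-up; the CFPDF flatten still costs its full share of tick counts on every file.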
Posted by Saman W Jayasekara at Thursday 13 December 2012 04:09 PM
ColdFusion