23 Sep 2010
Reading PDF Form Fields with VBA
Update:
Please visit the same post on my business site. The comments are closed here, so if you want to comment, you have to head over to http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/
I’ve written about VBA and Acrobat JavaScript before, and I’ve also mentioned that you can combine VBA and JavaScript to access PDF form fields, but I still owe a sample for that. I had to answer another question today about how to exactly do that, so I whipped up a quick sample program that demonstrates the use of the JavaScript Object (JSO) to read and write AcroForm fields.
We start the same way as in my old VBA sample to create a VBA program that references the Acrobat TLB and to add a button to a document. When we now use the following script as the button handler, we can work with form fields:
Private Sub CommandButton1_Click() Dim AcroApp As Acrobat.CAcroApp Dim theForm As Acrobat.CAcroPDDoc Dim jso As Object Dim text1, text2 As String Set AcroApp = CreateObject("AcroExch.App") Set theForm = CreateObject("AcroExch.PDDoc") theForm.Open ("C:\temp\sampleForm.pdf") Set jso = theForm.GetJSObject ' get the information from the form fields Text1 and Text2 text1 = jso.getField("Text1").Value text2 = jso.getField("Text2").Value MsgBox "Values read from PDF: " & text1 & " " & text2 ' set a text field Dim field2 As Object Set field2 = jso.getField("Text2") field2.Value = 13 ' assign the number 13 to the fields value ' get the information from the form fields Text1 and Text2 text1 = jso.getField("Text1").Value text2 = jso.getField("Text2").Value MsgBox "Values read from PDF: " & text1 & " " & text2 theForm.Close AcroApp.Exit Set AcroApp = Nothing Set theForm = Nothing MsgBox "Done" End Sub
This program requires a PDF file with text fields called “Text1” and “Text2” to be stored as C:\temp\sampleForm.pdf. With the explanation in the previous two blog posts, it should not be hard to understand what’s going on here. The only new command introduced is the getField() function, which returns a form field. The form field object has a property “value” which contains the actual value that’s assigned to the field. Give it a try and let me know how it works for you. The updated form field is not saved (because the document does not get saved) – I’ll leave that up to the reader to figure out.
Also, this program will not work with XFA forms (the ones you create in Designer). For those, you need to use the XFA DOM to access the form data. For anybody interested in XFA forms, the LifeCycle Designer ES Scripting Reference is a must read.
Hi there,
This is exactly what I am looking for. Thanks so much.
However, my assignments are a bit different. I am wondering if you can give me a hand.
The data source:
1. An excel spreadsheet storing the raw data
2. A PDF file with an interactive form used to store the data input by the user according to the above excel spreadsheet
My assignments:
1. input the raw data from excel spreadsheet to the PDF interactive form
2. double check if the data input in the PDF interactive form is correct.
I am not allowed to convert the Excel spreadsheet to the PDF file directly as the PDF file is the template with precise paragraphing and wording embedded. It is a heavy job when there are over hundreds of number. I am thinking if the excel VBA can do both assignments automatically or at least double check my input.
Thanks.
Wayne
Wayne
September 25th, 2010 at 8:31 pmpermalink
Hi khk,
I have adobe 9.1 professional and excel 2007, adobe TLB added.
F8 stepinto F8 found “Runtime error ‘424’, object required”
debug at this code ” text1 = jso.getField(“Text1″).Value”
don’t know why?
Thanks
Wayne
Wayne
September 25th, 2010 at 10:16 pmpermalink
oh, “t” not “T”, i got it.
btw, we can use the call function to save the updates.
call theForm.Save(PDSaveFull,”C:\temp\sampleForm.pdf).
Wayne
September 27th, 2010 at 11:24 ampermalink
I am having the same problem as Wayne, but cannot find the solution. Error 424 Object Required at –
text1 = jsoGetField(“TextField1”).Value
Further, text1 changes from a “” string to an empty Variant
Any help would be greatly appreciated. Access 2007 with Adobe 8 btw.
Thanks,
Todd
Todd
February 16th, 2011 at 3:27 pmpermalink
Hi Khk,
i was looking for the similar, kind of program.
am objective is:
A PDF document contains a stamps like tick 1, tick 2, etc in different pages. i want to read the page numbers of each stamps. . is there any java function to read the page number of PDF ?
chandu
September 25th, 2011 at 1:47 pmpermalink
Dear Khk
I have done this example in Visual Basic 2008 and read fields from a document that I made in Adobe Acrobat 9 Pro.
It works fine on my PC. But when I make an installation on another PC and start the program and click the button, the message “Cannot create an ActiveX component” appear and close down the program.
Look forward to a solution for distribution.
Best regards
Leif
Leif
October 5th, 2011 at 3:36 pmpermalink
Leif,
do you have Adobe Acrobat installed on the second machine? Acrobat’s API is a tool to automate Acrobat, so Acrobat needs to be installed for your application to work. Based on the error message, I assume that you are trying to run your application on a system without Acrobat.
khk
October 5th, 2011 at 7:03 pmpermalink
Dear Khk
Thanks for your prompt response.
I have Adobe Reader 9 on the other PC.
Is it necessary to also have Acrobat 9 pro installed ?
If so, is there another way to distribute without have
Acrobat pro installed on target PC ?
Leif
Leif Hemmingsson
October 6th, 2011 at 1:36 ampermalink
Khk
Thank You for prompt answer.
The target PC have Adobe Reader 9.
Is there no other way to have it run without having Acrobat Pro installed?
Leif
Leif Hemmingsson
October 6th, 2011 at 4:16 ampermalink
Dear khk,
Thank u for your share of this post.
I tried the code ,everything is OK, but the line “field2.Value = 13” didn’t work, no value assigned.
I have done this example in MS Excel 2007 & Adobe Acrobat 9 Pro.
I have selected the “Tools>References” menu item. On the dialog that pops up, I choose “Acrobat”, but I am not sure that it was the acrobat.tlb or acrobat.dll selected. When I check the References again, it was not the “Acrobat” selected, but the Adobe Acrobat 9.0 Type Library. “Acrobat” disappeared in the references box.
Looking forward for your help.
Thanks a lot.
Best Regards.
Casey YAN
Casey
October 20th, 2011 at 2:08 ampermalink
Hi khk,
I have adobe acrobat 10.0. I was triing to run the above macro, but i am getting an error ‘Activex component can’t create object’
Looking forward for you help
Thanks a lot
Regards,
Shekhar
shekhar
February 9th, 2012 at 4:39 ampermalink
[…] several threads that suggest various methods. This one seems to be exactly what you are looking for http://www.khk.net/wordpress/2010/09…elds-with-vba/ __________________ If this resolves your issue, please use the Thread Tools and mark the thread […]
Importing data from PDF to Access
February 10th, 2012 at 1:57 ampermalink
Shekar,
are you using the full version of Acrobat or the free Reader? This will not work in the Reader. If it’s Acrobat, then unfortunately I cannot help you: I am not a VB expert, and all I know is that if you follow the steps outlined above, it should work.
khk
February 16th, 2012 at 1:03 pmpermalink
Thanks so much for posting this! It was EXACTLY what I needed to solve my dilemma. I knew I needed to code but I couldn’t find anything clear and understandable about the Acrobat object in VBA anywhere.
I ran into one snag in that I have a large number of fields to fill and my code would bomb after only three fields were filled. I figured out that it didn’t like me creating a separate form object for each field. I worked around that by creating a single object that I used in a function where I passed my field names and field values to as variables to set the field object and the field value for each field.
Thanks again!
Cathi
March 22nd, 2012 at 3:23 pmpermalink
I have recently come across your blog and it is extremely helpful. The level of detail coupled with the explanations are proving extremely helpful. I figured that while I endeavor to figure something out on my own I might as well ask in the event that you can save me a lot of time.
You mentioned at the end of this post that this code will not work with XFA forms. I am working with an XFA form and I am trying to do something relatively strait forward: I need to reference the data entered into a specific text box. I will use that data (someone’s name) to resave the pdf with the person’s name in the file name. Would you be able to help me get the same basic code working using the XFA DOM?
Sincerely – Joe
Hi khk
March 27th, 2013 at 9:25 ampermalink
here is the fix for the error.
text1 = jso.getField(“topmostSubform[0].Page1[0].f1_01_0_[0]”).Value
sameer
April 1st, 2013 at 12:19 pmpermalink
KHK:
This is a very helpful example, so thank you for posting.
I am trying to ADD a signature field to PDF through VBA, but am having difficulty. My code is:
Private Sub AddSignature(docName As String, pNum As Long)
Dim jso As Object
Dim AcroApp As acrobat.CAcroApp
Dim theDoc As acrobat.CAcroPDDoc
Dim signatureField As Object
Set AcroApp = CreateObject(“AcroExch.App”)
Set theDoc = CreateObject(“AcroExch.PDDoc”)
theDoc.Open (docName)
Set jso = theDoc.GetJSObject
signatureField = jso.addField(“Signature”, “signature”, pNum, [90, 385, 250, 420])
theDoc.Save 1, docName
theDoc.Close
Set AcroApp = Nothing
Set theDoc = Nothing
End Sub
I am getting a type mismatch error on the “signatureField = ” line.
Any ideas?
Thanks.
MP.
Martin Petrey
April 12th, 2013 at 1:26 pmpermalink
Write a read routine to see the names of your fields. Most likely the field name is not just “signature”. Just as you see in my example. I am new to PDF and not sure how modify the field names well but will post more when I figure it out.
Sameer Alzouby
April 12th, 2013 at 2:56 pmpermalink
Hello,
is it possible to copy data ,once selected, as table (copy as table option) in a pdf using vba?
Thanks in advance
Jeffrey
May 25th, 2013 at 3:03 pmpermalink
Hello,
Can we label data in pdf file with php?
waqar
June 28th, 2013 at 7:07 ampermalink
Dear Khk,
Thankyou somuch , It’s Very good post given by you which i used to fill data in my PDF form generated using Acrobat pro.
It would be great if anyone give idea regarding importing/Exporting XML/XSD Data in the VB.net Code or please guide which method to use so that i can directly load PDF with my XML data instead of getting fields and iterating it to bind to respective fields
Thanks
Mayank
Mayank Kashyap
August 7th, 2013 at 10:38 ampermalink
Thanks so much for your examples.
I am trying to write a macro which will extract embedded files from a document (in this case, a Word doc), and am trying to use some of your code and other snippets I’ve found.
If (InStr(myshape.OLEFormat.ClassType, “Acro”) > 0) Then
myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ‘ Open the first embedded pdf
‘ Now I want to save it
Set AcroApp = GetObject(, “AcroExch.App”)
If AcroApp Is Nothing Then
‘ Excel is not running, create new instance
Set AcroApp = CreateObject(“AcroExch.App”)
Set avdoc = AcroApp.GetActiveDoc ‘ Set AcroDocument = CreateObject(“AcroExch.PDDoc”)
End If
‘ older Word files had ” on each end of the IconLabel of the embedded file, resulting in a path name with “” in it
‘ this was causing an error 4148 when the SaveAs line executed. Go figure 🙂
‘ temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), “”), Chr(34), “”))
outFileName = StrOutFold & “\” + temp ‘
‘ ok = ActiveDocument.DocSaveAs(outFileName, outFileName)
If avdoc.Save(PDSaveFull, outFileName) = False Then
MsgBox “Cannot save document”
End If
AcroApp.Exit
Set AcroApp = Nothing
End If
However, I am struggling with getting a valid object reference to the .pdf document displayed using the myshape.OLEFormat.DoVerb (wdOLEVerbOpen)
line. Any hints as to how to get a reference to this? Also, it would be best if this would work with Acrobat reader (I don’t think I need full Acrobat functionality to do this.
Thanks for any hints!
Chris
October 4th, 2013 at 12:48 pmpermalink
Hello Sir,
Your example above help me a lot to accomplish my project. I able to populate the PDF.
My question is how to rename the data field programmatically, I know how using the Adobe interface.
Example: Field1 is the field I want to rename jsoGetField(“Field1″).Value
Also, how to insert pdf and bookmark?
Thank you and more power to you!
Rey Hernandez
October 23rd, 2013 at 2:15 pmpermalink
Hi KHK,
I’m trying to use your code and I have a PDF here with rather complex setting with different subforms that has similar field names such as:
(These are the field names given when I export the form to a CSV file)
form1[0].Table1[0].Row24[0].TextField71[0]
form1[0].Table1[0].Row24[1].TextField71[0]
My question is….how should I put this into the getField() function?
I’m guessing I would at least need to include the row portion since they determine which row to point to in order to get the field.
Thanks!
Min
November 27th, 2013 at 11:59 ampermalink
This sounds like a XFA or LiveCycle Designer form, and not a simple AcroForm. I’ve never tried to access form fields in such forms from VB. You can try to put the whole name (form1….TextField71[0]) in and hope for the best, but it may not work.
khk
November 27th, 2013 at 12:17 pmpermalink
Regarding my previous post regarding saving pdf files embedded in a Word doc, I got a variant of that code to work. It depends upon Acrobat, and not Reader, being installed and the default for AcroExch (and pdf) documents. I haven’t found a way to test what program is the default. I had both Acrobat and Reader installed on my system (not recommended), asd at the time Reader was the default. Uninstalled now. Here is a subset of the code:
‘=====================================
Sub ExtractEmbeddedDocs()
Dim MyObj As Object
Dim xlApp As Object
Dim xlWkb As Object
Dim myshape As InlineShape
Dim embedObj As OLEObject
Dim FileExtStr As String
Dim FileFormatNum As Long
Dim StrInFold As String, StrOutFold As String
Dim StrDocFile As String, Obj_App As Object, i As Long
Dim StrFile As String, StrFileList As String, StrMediaFile As String, j As Long
Dim outFileName As String
Dim SBar As Boolean
Dim exten As String
Dim embedCount As Integer, wordCount As Integer, excelCount As Integer, visioCount As Integer, pptCount As Integer
Dim pdfCount As Integer
Dim msg As String, temp As String
Dim ok As Boolean
Dim docs As Variant, doc As Variant ‘, temp As Variant
Dim AcroApp As Acrobat.CAcroApp
Dim AcroPDDoc As Acrobat.CAcroPDDoc
Dim AcroAVDoc As Acrobat.CAcroAVDoc
Dim jso As Object
StrInFold = ActiveDocument.Path
If StrInFold = “” Then Exit Sub
‘ Store current Status Bar status, then switch on
SBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
StrOutFold = StrInFold & “\Embedded Files”
Application.ScreenUpdating = False
‘ On Error GoTo error_handler
‘Test for existing output folder, create if they don’t already exist
If Dir(StrOutFold, vbDirectory) = “” Then MkDir StrOutFold
embedCount = ActiveDocument.InlineShapes.Count
‘ This opens the embedded documents, each in their own instance of the program
For Each myshape In ActiveDocument.InlineShapes
If (myshape.Type = wdInlineShapeEmbeddedOLEObject) Then
If (InStr(myshape.OLEFormat.ClassType, “Word”) > 0) Then
‘ handle Word file here
End If
If (InStr(myshape.OLEFormat.ClassType, “Visio”) > 0) Then
‘handle Visio file here
End If
If (InStr(myshape.OLEFormat.ClassType, “Excel”) > 0) Then
‘handle Excel here
End If
If (InStr(myshape.OLEFormat.ClassType, “PowerPoint”) > 0) Then
‘handle Powerpoint here
End If
If (InStr(myshape.OLEFormat.ClassType, “Acro”) > 0) Then
myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ‘ Open the first embedded pdf
myshape.OLEFormat.Activate ‘ probably not needed
‘ GetObject(,”AcroExch.App”) ‘ causes an error
Set AcroApp = CreateObject(“AcroExch.App”)
‘ If AcroApp Is Nothing Then
‘ ‘ Acrobat is not running, create new instance
‘ Set AcroApp = CreateObject(“AcroExch.App”)
‘ End If
Set AcroAVDoc = CreateObject(“AcroExch.AVDoc”)
Set AcroAVDoc = AcroApp.GetActiveDoc ‘ get the logical doc
Set AcroPDDoc = AcroAVDoc.GetPDDoc ‘ get the physical doc
‘ older Word files had ” on each end of the IconLabel of the embedded file, resulting in a path name with “” in it
‘ this was causing an error 4148 when the SaveAs line executed. Go figure 🙂
temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), “”), Chr(34), “”))
outFileName = StrOutFold & “\” + temp ‘
If AcroPDDoc.Save(PDSaveFull, outFileName) = False Then
MsgBox “Cannot save document”
End If
AcroAVDoc.Close (1)
AcroPDDoc.Close
pdfCount = pdfCount + 1
AcroApp.Exit
Set AcroApp = Nothing
Set AcroAVDoc = Nothing
Set AcroPDDoc = Nothing
End If
End If
Next myshape
‘ Clear the Status Bar
Application.StatusBar = False
‘ Restore original Status Bar status
Application.DisplayStatusBar = SBar
Application.ScreenUpdating = True
temp = “Embedded file counts” & vbCrLf & “Total ” & vbTab & vbTab & embedCount & vbCrLf & “Word Files ” & vbTab & wordCount & vbCrLf & _
“Excel Files ” & vbTab & vbTab & excelCount & vbCrLf & “Visio Files ” & vbTab & vbTab & visioCount & vbCrLf & “PowerPoint Files ” & vbTab & pptCount & vbCrLf
temp = temp & “PDF Files ” & vbTab & vbTab & pdfCount & vbCrLf & “Unknown files” & vbTab & embedCount – (wordCount + excelCount + visioCount)
msg = temp
msg = msg & vbCrLf & vbCrLf & “You should have ” & vbTab & (wordCount + excelCount + visioCount + pptCount + pdfCount) & ” files”
MsgBox msg, vbInformation + vbOKOnly
End Sub
‘=====================================
Thanks again for getting me started/pointed in the right direction!
Chris
December 18th, 2013 at 12:08 pmpermalink