4 Mar 2009
Adobe Acrobat and VBA – An Introduction
Update:
Please visit the same post on my business site. The comments are closed here, so if you want to comment, you have to head over to http://khkonsulting.com/2009/03/adobe-acrobat-and-vba-an-introduction/
Here is another topic that comes up every now and then: How can I “talk” to Adobe Acrobat from, for example, MS Excel via VBA? I’ll try to give an introduction to that subject in this document. I will only discuss the basics, but I’m open to suggestions about what part to discuss next. So keep the comments coming.
The Warning Upfront
Before we get too deep into this, let me say this: I am not a VBA expert. I do not program in VBA or VB. All I know about VB is from googling a few things and looking at sample code. It does help that I’ve programmed in many (make that a capital ‘M’ Many) programming languages, and in the end most of them share enough characteristics that once you know one, you know all of them… But still, don’t consider my VB programs to be at an expert level. I only use the samples to demonstrate general methods. It’s up to you to fill in all the missing details (e.g. exception handling).
Resources
All this information is available in one form or another in Adobe’s SDK documentation. Before you read any further, click on this link and take a look at what they have available.
There are (at least) two documents that are required reading if you want to use Acrobat from within your VBA code:
- Developing Applications Using Interapplication Communication (510KB PDF Document)
- Interapplication Communication API Reference (2.3MB PDF Document)
If you want to utilize the VB/JavaScript bridge, you also should read the JavaScript related documents:
- JavaScript for Acrobat API Reference – Version 8 (7.5MB PDF document)
- Developing Acrobat Applications Using JavaScript (2.7MB PDF document)
All of these documents can also be accessed via Adobe’s online documentation system. In order to find the documents I’ve listed above, you need to expand the tree on the left side of the window for the “JavaScript” and “Acrobat Interapplication Communication” nodes.
There is always more than one way…
There are two ways your program can interact with Acrobat. One is more direct than the other, but both require the same mechanism to get things started…
You can either use the “normal” IAC (Inter Application Communication) interface, which is basically a COM object that your program loads and uses to communicate with Acrobat, or you can use the VB/JavaScript bridge, which allows access to Acrobat’s JavaScript DOM. The latter case still requires that your program first establishes a connection to Acrobat via IAC.
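To make the difference between the two routes a bit more concrete, here is a minimal sketch that reads the page count of a document once through a plain IAC method and once through the JavaScript object. It assumes that Acrobat (not just Reader) is installed and that C:\temp\Part1.pdf exists; the procedure name is made up for this illustration, and the objects are declared As Object so the sketch does not depend on any project reference.

```vba
Sub ShowPageCountBothWays()
    ' Assumption: C:\temp\Part1.pdf exists; the path is only an example.
    Dim pdDoc As Object   ' AcroExch.PDDoc, reached through plain IAC
    Dim jso As Object     ' JavaScript Doc object, reached through the bridge

    Set pdDoc = CreateObject("AcroExch.PDDoc")

    If pdDoc.Open("C:\temp\Part1.pdf") Then
        ' Route 1: call an IAC method on the PDDoc object
        Debug.Print "IAC page count: " & pdDoc.GetNumPages()

        ' Route 2: ask the same PDDoc for its JavaScript object and
        ' read the numPages property of the JavaScript Doc object
        Set jso = pdDoc.GetJSObject
        Debug.Print "JavaScript page count: " & jso.numPages

        Set jso = Nothing
        pdDoc.Close
    Else
        MsgBox "Cannot open the document"
    End If

    Set pdDoc = Nothing
End Sub
```

Both lines should print the same number in the VBA editor's Immediate window. Note that even the second route starts with an IAC object, which is the point made above.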
Let’s get the party started
As I mentioned before, regardless of how we want to remote control Adobe Acrobat from VB, we need to establish a connection to its COM object (or OLE server). You may have noticed that I always talk about “Adobe Acrobat”, and not the “Adobe Reader”. What I’m presenting here is valid for Adobe Acrobat; Reader only supports a small subset of these features. To learn more about what the differences are, see the IAC Developer Guide. For the purpose of this document, I will use MS Excel 2007 and Adobe Acrobat 9 Pro. As long as you have a version of Acrobat that is compatible with the version of VBA that you are using, you should be able to follow along without any problems.
Preparing MS Excel 2007
When you install Office 2007 or Excel 2007, make sure that you select the Visual Basic Editor component, otherwise you will not be able to write VBA code. This is different from all the earlier versions. Once installed, you need to add the “Developer” tab to the ribbon. This is done on the Excel Options dialog, under the Popular category:
Once that is done, you should see the “Developer” tab as part of the ribbon:
Our First Button
Open a new document and select the Developer tab. Then go to the Insert control and place a button on your document. This will pop up the “Assign Macro” dialog. Just click on the “Add” button, which will bring up the VBA editor. Nothing special so far.
Before we can use any of Acrobat’s functionality, we need to make sure that VBA knows about the Acrobat objects. On the VBA dialog, select the “Tools>References” menu item. On the dialog that pops up, make sure that the TLB for your version of Acrobat is selected. This is what it looks like for my system:
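A quick way to check that the reference is set up correctly is a tiny test procedure like the sketch below. The procedure name is made up for this illustration. The declaration with the Acrobat.CAcroApp type only compiles when the Acrobat type library is selected, so a compile error on that line points at the reference.

```vba
Sub CheckAcrobatReference()
    ' This declaration only compiles if the Acrobat TLB is selected
    ' under Tools > References.
    Dim AcroApp As Acrobat.CAcroApp

    ' Create the Acrobat application object through COM
    Set AcroApp = CreateObject("AcroExch.App")
    MsgBox "The Acrobat automation object was created"

    AcroApp.Exit
    Set AcroApp = Nothing
End Sub
```

If the reference cannot be selected on a given machine, a general VBA alternative is to declare such variables As Object (late binding). In that case named constants from the type library, such as PDSaveFull, are not defined and have to be replaced by their numeric values.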
Now we can add code that references the Acrobat objects to our button handler. Of course, before we do that, we need to decide what our button is actually supposed to trigger. Let’s start with something simple – let’s combine two PDF documents and save the result as a new document.
I’ll present the whole program first, and will then explain the different parts.
Sub Button1_Click()
    Dim AcroApp As Acrobat.CAcroApp
    Dim Part1Document As Acrobat.CAcroPDDoc
    Dim Part2Document As Acrobat.CAcroPDDoc
    Dim numPages As Integer

    Set AcroApp = CreateObject("AcroExch.App")
    Set Part1Document = CreateObject("AcroExch.PDDoc")
    Set Part2Document = CreateObject("AcroExch.PDDoc")

    Part1Document.Open ("C:\temp\Part1.pdf")
    Part2Document.Open ("C:\temp\Part2.pdf")

    ' Insert the pages of Part2 after the end of Part1
    numPages = Part1Document.GetNumPages()
    If Part1Document.InsertPages(numPages - 1, Part2Document, 0, Part2Document.GetNumPages(), True) = False Then
        MsgBox "Cannot insert pages"
    End If

    If Part1Document.Save(PDSaveFull, "C:\temp\MergedFile.pdf") = False Then
        MsgBox "Cannot save the modified document"
    End If

    Part1Document.Close
    Part2Document.Close
    AcroApp.Exit

    Set AcroApp = Nothing
    Set Part1Document = Nothing
    Set Part2Document = Nothing

    MsgBox "Done"
End Sub
Save the document. When prompted for a filename and a filetype, select the type of “Excel Macro-Enabled Workbook” – otherwise the program you just added will get stripped out of the file.
Make sure that there are two files named Part1.pdf and Part2.pdf in the c:\temp directory.
Click the button and enjoy…
After the program is done, there will be a new file C:\Temp\MergedFile.pdf on your disk. Open that in Acrobat, and verify that it indeed contains the results of concatenating the two source files.
So, how does it work?
The whole program is in a button handler.
Sub Button1_Click()
    ...
End Sub
Let’s now look at the different parts of that handler.
First, we need to set up a bunch of objects that we will use further down in the code:
    Dim AcroApp As Acrobat.CAcroApp
    Dim Part1Document As Acrobat.CAcroPDDoc
    Dim Part2Document As Acrobat.CAcroPDDoc
    Dim numPages As Integer
The first statement sets up an object of type Acrobat.CAcroApp – this reflects the whole Acrobat application. If you look through the documentation, you’ll see that there are a number of things that can be done on the application level (e.g. minimizing or maximizing the window, executing menu items, retrieving preference settings, closing the application, …). The next two lines declare two objects of type Acrobat.CAcroPDDoc – these reflect the two documents that we need to open.
There are two different document types available in the OLE part of IAC: the AVDoc and the PDDoc. An AVDoc is one that gets opened in Acrobat’s user interface: the user can navigate through its pages and do anything that you can do with a PDF document when you double-click on it to open it in Acrobat. A PDDoc, on the other hand, gets opened in the background. Acrobat still has access to it and can manipulate it, but the user does not see it. This is useful if a program should quietly do its work without showing the user what’s going on.
Every AVDoc has a PDDoc behind the scenes, and that object can be retrieved via the AVDoc.GetPDDoc method. A PDDoc only has an associated AVDoc if it is actually shown in Acrobat. However, we cannot retrieve that AVDoc object from within the PDDoc. This sounds complicated, but once you get more familiar with how these things are used, it becomes second nature.
We also need an integer variable to store the number of pages in the first document.
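The relationship between the two document types can be sketched as follows. This is only an illustration under a few assumptions: the file C:\temp\Part1.pdf exists, the Acrobat type library is referenced, and the procedure and variable names are made up. It opens a document visibly as an AVDoc and then retrieves the PDDoc behind it.

```vba
Sub ShowDocumentAndGetPDDoc()
    Dim AcroApp As Acrobat.CAcroApp
    Dim ViewDoc As Acrobat.CAcroAVDoc   ' the document as shown in Acrobat's UI
    Dim BackDoc As Acrobat.CAcroPDDoc   ' the document behind the scenes

    Set AcroApp = CreateObject("AcroExch.App")
    Set ViewDoc = CreateObject("AcroExch.AVDoc")

    ' Make the Acrobat window visible so the user can see the AVDoc
    AcroApp.Show

    ' The second argument is a temporary window title; empty means default
    If ViewDoc.Open("C:\temp\Part1.pdf", "") Then
        ' Every AVDoc has a PDDoc that can be retrieved like this
        Set BackDoc = ViewDoc.GetPDDoc()
        MsgBox "Pages: " & BackDoc.GetNumPages()

        ' True means: close without saving
        ViewDoc.Close True
    Else
        MsgBox "Cannot open the document in Acrobat"
    End If

    AcroApp.Exit
    Set BackDoc = Nothing
    Set ViewDoc = Nothing
    Set AcroApp = Nothing
End Sub
```

There is no corresponding call in the other direction, which is why a program that needs both objects usually starts from the AVDoc.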
    Set AcroApp = CreateObject("AcroExch.App")
    Set Part1Document = CreateObject("AcroExch.PDDoc")
    Set Part2Document = CreateObject("AcroExch.PDDoc")
In the next step, we initialize the three Acrobat related objects. Nothing special here.
    Part1Document.Open ("C:\temp\Part1.pdf")
    Part2Document.Open ("C:\temp\Part2.pdf")
Now that our objects are initialized, we can use the methods to do something with the objects. In order to merge files, we need access to both the source files, so we have to call the Open() method on both these objects. The key to success is to specify the whole path name, directory and filename.
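The sample program ignores the return value of Open(). A slightly more defensive variant could look like the sketch below. OpenPdf is a hypothetical helper written for this illustration, not part of Acrobat's API; it checks that the file exists and that Acrobat reports success before handing back the document object.

```vba
' Hypothetical helper: returns an opened PDDoc, or Nothing on failure.
Function OpenPdf(ByVal FullPath As String) As Object
    Dim Doc As Object

    Set OpenPdf = Nothing

    ' Dir returns an empty string if no file matches the full path
    If Len(Dir(FullPath)) = 0 Then
        MsgBox "File not found: " & FullPath
        Exit Function
    End If

    Set Doc = CreateObject("AcroExch.PDDoc")

    ' Open returns True on success and False otherwise
    If Doc.Open(FullPath) Then
        Set OpenPdf = Doc
    Else
        MsgBox "Acrobat cannot open: " & FullPath
    End If
End Function

Sub OpenBothParts()
    Dim Part1Document As Object
    Dim Part2Document As Object

    Set Part1Document = OpenPdf("C:\temp\Part1.pdf")
    Set Part2Document = OpenPdf("C:\temp\Part2.pdf")

    If Part1Document Is Nothing Or Part2Document Is Nothing Then
        MsgBox "At least one source file could not be opened"
    End If

    ' Close whatever was opened
    If Not Part1Document Is Nothing Then Part1Document.Close
    If Not Part2Document Is Nothing Then Part2Document.Close
End Sub
```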
    numPages = Part1Document.GetNumPages()
The method InsertPages requires that we specify after which page to insert the second document. Because we want to insert the pages after the last page of the first document, we need to find out how many pages we have in that document. The GetNumPages() method returns that information.
This is also where it becomes a bit tricky: Acrobat starts to count the pages in a PDF document at zero. So, if we want to insert the pages after the first page in the document, we need to insert after page number zero. If we want to insert after the second page, we need to insert after page number one… Because we want to insert the pages after the last page of the first document, we need to insert them after page number (numPages - 1). Again, this is a bit confusing, but after a while it gets easier.
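The zero-based counting can be illustrated without Acrobat at all. The following sketch assumes a document with three pages (an arbitrary example value) and prints, for every page as a human would count it, the value that has to be passed as the first argument of InsertPages in order to insert after that page.

```vba
Sub ShowInsertAfterValues()
    Dim numPages As Long
    Dim pageNumber As Long

    numPages = 3   ' assumed example value, normally taken from GetNumPages()

    For pageNumber = 1 To numPages
        ' Page n in human counting is page n - 1 for Acrobat
        Debug.Print "Insert after page " & pageNumber & ": pass " & (pageNumber - 1)
    Next pageNumber

    ' Appending means inserting after the last page
    Debug.Print "Append at the end: pass " & (numPages - 1)
End Sub
```

For three pages this prints the values 0, 1 and 2 for the individual pages, and 2 again for appending at the end.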
    If Part1Document.InsertPages(numPages - 1, Part2Document, 0, Part2Document.GetNumPages(), True) = False Then
        MsgBox "Cannot insert pages"
    End If
This is where Acrobat does all its work. The parameters of the InsertPages method are described in the Interapplication Communication API Reference document: InsertPages
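As a reading aid for the five arguments, here is a small sketch that wraps the call in a helper with named parameters. AppendPages is a hypothetical function written for this illustration; the comments describe the arguments as they are used in the program above: the page to insert after, the source document, the first source page (zero-based), the number of pages to copy, and whether bookmarks are copied as well.

```vba
' Hypothetical helper: appends PageCount pages of Source, starting at the
' zero-based page FirstPage, to the end of Target.
Function AppendPages(ByVal Target As Object, ByVal Source As Object, _
                     ByVal FirstPage As Long, ByVal PageCount As Long) As Boolean
    Dim insertAfter As Long

    ' Zero-based index of the last page in the target document
    insertAfter = Target.GetNumPages() - 1

    ' Arguments: insert after page, source document, first source page,
    ' number of pages, copy bookmarks
    AppendPages = Target.InsertPages(insertAfter, Source, FirstPage, PageCount, True)
End Function
```

With the two documents from the sample program opened, AppendPages(Part1Document, Part2Document, 0, Part2Document.GetNumPages()) reproduces the merge, while AppendPages(Part1Document, Part2Document, 1, 1) would append only the second page of the second document.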
Now we only have to save the document, do some cleanup and exit our program:
    If Part1Document.Save(PDSaveFull, "C:\temp\MergedFile.pdf") = False Then
        MsgBox "Cannot save the modified document"
    End If

    Part1Document.Close
    Part2Document.Close
    AcroApp.Exit

    Set AcroApp = Nothing
    Set Part1Document = Nothing
    Set Part2Document = Nothing

    MsgBox "Done"
With these steps, and the information in the API documentation, you should be able to write simple programs.
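One of the missing details mentioned at the top of this post is exception handling. The sketch below shows one common VBA pattern for it, applied to the same merge. It is only one possible structure, not the way it has to be done: the procedure name, the labels and the error numbers are made up for this illustration, and the Acrobat type library is assumed to be referenced.

```vba
Sub MergeWithCleanup()
    Dim AcroApp As Acrobat.CAcroApp
    Dim Part1Document As Acrobat.CAcroPDDoc
    Dim Part2Document As Acrobat.CAcroPDDoc
    Dim numPages As Long

    On Error GoTo Failed

    Set AcroApp = CreateObject("AcroExch.App")
    Set Part1Document = CreateObject("AcroExch.PDDoc")
    Set Part2Document = CreateObject("AcroExch.PDDoc")

    ' Turn every failed step into a VBA error so it ends up in one place
    If Not Part1Document.Open("C:\temp\Part1.pdf") Then
        Err.Raise vbObjectError + 513, "MergeWithCleanup", "Cannot open Part1.pdf"
    End If
    If Not Part2Document.Open("C:\temp\Part2.pdf") Then
        Err.Raise vbObjectError + 514, "MergeWithCleanup", "Cannot open Part2.pdf"
    End If

    numPages = Part1Document.GetNumPages()
    If Not Part1Document.InsertPages(numPages - 1, Part2Document, 0, _
                                     Part2Document.GetNumPages(), True) Then
        Err.Raise vbObjectError + 515, "MergeWithCleanup", "Cannot insert pages"
    End If

    If Not Part1Document.Save(PDSaveFull, "C:\temp\MergedFile.pdf") Then
        Err.Raise vbObjectError + 516, "MergeWithCleanup", "Cannot save the modified document"
    End If

    MsgBox "Done"

CleanUp:
    ' Errors during cleanup are ignored so they cannot loop back into Failed
    On Error Resume Next
    If Not Part1Document Is Nothing Then Part1Document.Close
    If Not Part2Document Is Nothing Then Part2Document.Close
    If Not AcroApp Is Nothing Then AcroApp.Exit
    Set Part1Document = Nothing
    Set Part2Document = Nothing
    Set AcroApp = Nothing
    Exit Sub

Failed:
    MsgBox "Merge failed: " & Err.Description
    Resume CleanUp
End Sub
```

The cleanup section runs in both cases, so the documents get closed and Acrobat gets shut down even when one of the steps fails.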
I’ll document the VB/JavaScript bridge in my next posting.
To explain more: I want to automate the creation of a PDF from an existing Excel file and add signature fields. This is doable, but the final PDF cannot be signed in Reader; it needs to be saved with extended features.
Sami
October 19th, 2013 at 4:04 pm
I have a unique issue. The code works fantastic on most of my computers, but a couple give an error message when running the save. The err.description is “Type Mismatch”. The laptops are running MS Access 2007 on XP machines. I had six machines loaded with the same image. Four work fine, two stop at the same place. Here is where it stops:
If myMainDocument.Save(PDSaveFull, MyPath & MyPDF & ".pdf") = False Then
    MsgBox "Cannot save the modified document"
End If
Thank you for the great code!!!
Michael
October 25th, 2013 at 10:43 am
[…] that you are not doing: MergePDF – Page 2; this one: Acrobat SDK, HowTo insert pages inside pdf; or this one: Adobe Acrobat and VBA – An Introduction | Karl Heinz Kremer's Ramblings; this one: VBA Merging PDF's (Acrobat 9.0) – Access World Forums. I hope that one of these can […]
Connecting to Adobe Acrobat 5.0 with Access 2003 - Access
October 28th, 2013 at 8:46 am
Hello, good to see this! It helped me A LOT.
But I want to ask: how should I write code in VBA in order to combine different types of files into one PDF? Should I convert every single one of them into PDF first? Or can I do it in one shot?
If so, how can I write the code? I mean, is there any function I can use to create a PDF from any kind of file?
BTW, I have a well-licensed Acrobat. Many Thanks!!!!
Xueying
October 31st, 2013 at 6:42 am
How should I write the code on the InsertPages section to insert ONLY page two of the Part2Document after the last page of Part1Document?
I tried messing with the parameters but it keeps pulling the entire Part2Document document.
If Part1Document.InsertPages(numPages - 1, Part2Document, 0, Part2Document.GetNumPages(), True) = False Then
    MsgBox "Cannot insert pages"
End If
Jim
November 13th, 2013 at 5:01 pm
Seems to me there are two ways you can achieve this:
1. Insert the entire Part2Document into Part1Document, and then delete the Part2Document pages you don’t want, or;
2. Extract page 2 of Part2Document, save it to a temporary file, insert that file into Part1Document, then delete the temporary file.
Martin Petrey
November 14th, 2013 at 4:04 pm
[…] Adobe Acrobat and VBA – An Introduction | Karl Heinz Kremer's Ramblings I'd play with it on a blank database until you get it running. It locked up my computer for a bit. __________________ Version: Access 2010 […]
Combine separate pdf's into one - dBforums
January 12th, 2014 at 10:37 pm
Jim,
I’ve found Karl’s info and the subsequent discussion quite helpful. I haven’t used any of it yet (so beware of my advice), but below is a suggestion to insert ONLY page two of the Part2Document after the last page of Part1Document.
Good luck and thank you everybody, especially Karl.
Jim FM
If Part1Document.InsertPages(numPages - 1, Part2Document, 1, 1, True) = False Then
    MsgBox "Cannot insert pages"
End If
Jim FM
January 25th, 2014 at 3:27 pm
This is very interesting. Do you happen to have any code that would add a background to a pdf file? I actually need to remove the white border around a PDF that I saved from Excel. I can do this by setting the background color to black in Acrobat – but I need to automate the process. Can you help with some code to get me started?
Steve
February 10th, 2014 at 5:56 pm
Steve, I don’t think this can be done with VB or VBA. You don’t have access to the PDF content via the IAC (or the JavaScript) interface. Only plug-ins can do that. And that’s where it gets complicated 🙂
khk
February 10th, 2014 at 6:41 pm