Referencing pages of a multi-page PDF file during data merge… workaround

At the time of writing, there are three multi-page/artboard file formats that Adobe InDesign can import when placing a file via the File/Place function. These formats are:

  • PDF
  • Adobe Illustrator
  • Adobe InDesign

(While it is possible to create many artboards in Adobe Photoshop, at the time of writing it is not possible to import a specific Photoshop artboard into Adobe InDesign, but that is another article!)

When placing one of these three formats, several import settings can be controlled by selecting Show Import Options in the Place dialog box, such as:

  • Which page (or pages) to import;
  • How the pages should be cropped;
  • Whether or not to place the pages with a transparent background; and
  • Which layers should be visible.

However, when importing these file types as variable images during a data merge, these options are unavailable and the following behaviour applies instead:

  • Only the first absolute page of the file is imported. This is not necessarily the page numbered 1, because the first page may, for example, be numbered in roman numerals or start at a number other than one; and
  • Page cropping, transparency and layer visibility are determined by the same settings as the last file of that type placed into the artwork.

For now, there is no workaround for controlling cropping, transparency and layer visibility during a data merge, other than being familiar with this behaviour and planning the merge accordingly. There is, however, a workaround for importing pages beyond the first page of a PDF file, though not of an Illustrator or InDesign file.

Workaround: Split the PDF

The term “workaround” is used loosely in this context. Unfortunately, the solution is to split the PDFs into single-page files. This can be done in Acrobat using the Split option in the Organize Pages tool.

This feature also allows multiple files to be split at once.

By default, the resulting files keep the original filename with _Partx appended before the .pdf extension (for example, myFile_Part1.pdf), where x represents the absolute page number.
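If you want to check the split files before referencing them in the data, the expected names can be generated from the original filename and its page count. This is a minimal Python sketch that assumes Acrobat’s default naming as described above; if the split’s output options have been changed, the pattern will differ. The function name split_part_names is hypothetical.

```python
def split_part_names(filename: str, page_count: int) -> list[str]:
    """Return the filenames expected from splitting `filename` into single pages,
    assuming the default "<name>_Part<n>.<ext>" naming (hypothetical helper)."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension: treat the whole name as the stem
        stem, ext = filename, ""
    suffix = f".{ext}" if ext else ""
    # Absolute page numbers run from 1 to page_count.
    return [f"{stem}_Part{n}{suffix}" for n in range(1, page_count + 1)]
```

For example, split_part_names("NS91912.pdf", 24) lists NS91912_Part1.pdf through NS91912_Part24.pdf.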

Alternatively, I’ve prepared an action, which you can download here, that saves the split PDFs to the Documents folder of the machine running it.

(Yes, I’m also aware that there are literally hundreds of websites that will split multi-page PDFs into single-page PDFs for free. However, the methods outlined above do so without involving a third party.)

The next part of the workaround involves the data itself, and I’ll be using Microsoft Excel formulas to generate the filenames of the resulting pages. All of the variable images being referenced are also stored in the same folder as the data file, so only the filename is required rather than the full path.

For data where the page number is known

Add a column to the database containing the absolute page number of the PDF page to be imported.

Absolute vs section numbers, in brief:

Absolute page numbers count pages from the start of the document, regardless of how they are labelled, while section numbers are the page numbers applied using the page numbering of the application that created the PDF.

For example, take a 20-page PDF in which the first six pages are numbered in roman numerals (i to vi) and the remaining 14 pages are numbered 1 to 14. These two numbering styles are section numbers, while the absolute page numbers simply run from 1 to 20. To reference page iv of the PDF, the absolute page number is 4. To reference page 5, the absolute page number is 11.
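Converting a section number to an absolute number is just a matter of counting through the sections in order. The following Python sketch (the helper names are hypothetical) assumes every section starts at 1 and uses either lower-case roman or Arabic numerals, and reproduces the 20-page example above:

```python
def int_to_roman(n: int) -> str:
    """Convert a positive integer to a lower-case roman numeral, e.g. 4 -> 'iv'."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_labels(sections: list[tuple[int, str]]) -> list[str]:
    """List the section label of every page in order.

    `sections` holds (page_count, style) pairs, style being "roman" or "arabic";
    each section is assumed to restart its numbering at 1.
    """
    labels = []
    for page_count, style in sections:
        for n in range(1, page_count + 1):
            labels.append(int_to_roman(n) if style == "roman" else str(n))
    return labels


def absolute_page(label: str, sections: list[tuple[int, str]]) -> int:
    """Return the absolute (1-based) page number of the first page with `label`."""
    return page_labels(sections).index(label) + 1


example = [(6, "roman"), (14, "arabic")]  # the 20-page PDF described above
print(absolute_page("iv", example))  # 4
print(absolute_page("5", example))   # 11
```

If the same label appears in more than one section (two Arabic sections that both contain a page 5, for example), index() returns the first match, so the label alone is not enough to identify the page.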

In this example, column A contains the filename of the PDF to reference, column B contains the absolute page number, and column C contains the result. To obtain this result, the following formula can be used:

=SUBSTITUTE(A2,".PDF","_Part"&B2&".pdf")

This formula looks at the filename reference and replaces the .PDF portion with _Partx.pdf, where x is the value in column B. Filenames in other formats are left unchanged. Note that SUBSTITUTE is case-sensitive, so as written the formula only matches an upper-case .PDF extension; a filename such as myFile.pdf would also be left unchanged.
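The same logic can be expressed in Python to check a list of filenames before the merge. In this sketch (the function names are hypothetical), part_filename mirrors the Excel formula, including its case-sensitive match on .PDF, while part_filename_any_case is an alternative that only changes a trailing extension and matches it in any case:

```python
def part_filename(filename: str, page: int) -> str:
    """Mirror =SUBSTITUTE(A2,".PDF","_Part"&B2&".pdf").

    Like SUBSTITUTE, str.replace is case-sensitive and replaces every
    occurrence, so only an upper-case ".PDF" is changed.
    """
    return filename.replace(".PDF", f"_Part{page}.pdf")


def part_filename_any_case(filename: str, page: int) -> str:
    """Alternative: change only a trailing .pdf extension, whatever its case."""
    stem, dot, ext = filename.rpartition(".")
    if dot and ext.lower() == "pdf":
        return f"{stem}_Part{page}.pdf"
    return filename  # other formats (e.g. .ai, .indd) are left unchanged
```

With these definitions, part_filename("myFile.pdf", 4) returns "myFile.pdf" unchanged, whereas part_filename_any_case("myFile.pdf", 4) returns "myFile_Part4.pdf".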

For data where the page reference needs to increase by one from the row above

The same formula can be used for the naming, but a second formula determines whether the page reference should increase, based on whether the row directly above references the same base file.

In this example, column N contains the filename of the PDF to reference, column O contains the absolute page number, and column P contains the result. A 24-page file, NS91912, is being merged, and each row needs its page reference incremented by one so that the filenames run from NS91912_Part1.pdf to NS91912_Part24.pdf. The following formula, entered in O2 and filled down, can be used to set the page reference:

=IF(N2=N1,O1+1,1)

This formula compares the filename with the one in the row above. If they differ, it puts 1 in the cell; if they match, it takes the page number from the cell above and adds 1 to it. Because the count restarts at 1 whenever the filename changes, all of the rows for a given PDF need to be grouped together (for example, by sorting on column N).
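As a rough Python equivalent of filling this formula down column O and then applying the naming formula for column P, here is a sketch with hypothetical function names. It assumes the rows for each PDF are already grouped together, as the formula requires:

```python
def page_references(filenames: list[str]) -> list[int]:
    """Mirror =IF(N2=N1,O1+1,1) filled down column O.

    Excel's "=" compares text without regard to case, so casefold() is used
    here as an approximation of that behaviour.
    """
    pages: list[int] = []
    previous = None  # plays the role of the header cell N1
    for name in filenames:
        key = name.casefold()
        # Same file as the row above: continue the count; otherwise restart at 1.
        pages.append(pages[-1] + 1 if key == previous else 1)
        previous = key
    return pages


def merge_column(filenames: list[str]) -> list[str]:
    """Build the column P filenames from column N and the computed column O,
    using the same case-sensitive ".PDF" replacement as the SUBSTITUTE formula."""
    pages = page_references(filenames)
    return [name.replace(".PDF", f"_Part{page}.pdf")
            for name, page in zip(filenames, pages)]
```

For example, merge_column(["NS91912.PDF"] * 24) produces NS91912_Part1.pdf through NS91912_Part24.pdf.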

In a perfect world

Again, this is a workaround: it only works for PDFs and requires some upfront preparation. Ideally, if I had my way and could implement some improvements, I’d like to see:

  • The ability to choose not just a specific page, but also the crop setting and visible layers. For example, a file reference such as myFile.pdf;1,trim;Layer1,Layer2, where 1 represents the absolute page number, trim represents which box to crop to, and Layer1,Layer2 represent the layers I would like to appear (or leave the layer part blank if all layers should be visible).
  • The ability to perform a similar task for incoming INDD, AI or PSD files.
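InDesign does not support a reference like this, so the following is purely a hypothetical Python sketch of how such a string could be broken into its parts: filename, absolute page, crop setting and visible layers. It assumes semicolons separate the three groups, commas separate the items within a group, and layer names contain neither character:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlacedReference:
    """Hypothetical parsed form of a reference such as myFile.pdf;1,trim;Layer1,Layer2."""
    filename: str
    page: int = 1                  # absolute page number; defaults to the first page
    crop: Optional[str] = None     # e.g. "trim"; None means no crop setting was given
    layers: list[str] = field(default_factory=list)  # empty means all layers visible


def parse_reference(ref: str) -> PlacedReference:
    """Split a hypothetical "file;page,crop;layer,layer" reference into its parts."""
    parts = ref.split(";")
    result = PlacedReference(filename=parts[0])
    if len(parts) > 1 and parts[1]:
        page_text, _, crop_text = parts[1].partition(",")
        result.page = int(page_text)
        result.crop = crop_text or None
    if len(parts) > 2:
        # A blank layer group (e.g. "myFile.pdf;1,trim;") leaves every layer visible.
        result.layers = [name for name in parts[2].split(",") if name]
    return result
```

A plain filename with no page or layer groups falls back to page 1 with all layers visible, which matches what a data merge does today.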
