Mar 16 2020

documents4j for PDF output

Generating high-fidelity PDF output from Office documents has always been a challenge, given the “long tail” of features that can appear in docx/pptx/xlsx files.

For Word documents, it is easy enough to output paragraphs of text, tables and images. But add in VML, DrawingML, equations, SmartArt, and fidelity becomes a challenge.

If your documents are constrained, you may be able to find a suitable conversion tool. Plutext’s PDF Converter was a good example of this. It worked well on a growing range of documents.

But ultimately, if you want great fidelity on an unconstrained set of files, you need to be using Microsoft’s own Office layout engine.

There are various ways to do that, for example https://developer.microsoft.com/en-us/graph/examples/document-conversion

For Java developers, a good solution is documents4j.

It uses Office and the Microsoft Scripting Host for VBS on the conversion machine, so that machine must run Microsoft Windows.
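
Since a local conversion only works on Windows, it can be worth failing early with a clear message on any other platform. A minimal sketch, using only the JDK: PlatformCheck and isWindows are hypothetical names of my own, not part of documents4j, and the check says nothing about whether Office is actually installed.

```java
import java.util.Locale;

/** Hypothetical guard, not part of documents4j. */
final class PlatformCheck {

    /** Returns true if the given os.name value looks like Microsoft Windows. */
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        if (isWindows(osName)) {
            System.out.println("Running on " + osName + ", a local conversion can be attempted");
        } else {
            System.out.println("A local conversion needs Windows with Office; found: " + osName);
        }
    }
}
```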

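documents4j can run either a “LocalConverter”, which converts on the machine your code runs on, or a “RemoteConverter”, which sends documents to a documents4j conversion server.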

Using a LocalConverter is as simple as:

import java.io.File;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class ToPDF {

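	public static void main(String[] args) throws Exception {

		File wordFile = new File(System.getProperty("user.dir"), "input.docx");
		File target = new File(System.getProperty("user.dir"), "output.pdf");

		// Start a local converter; baseFolder is where it keeps its working files
		IConverter converter = LocalConverter.builder()
				.baseFolder(new File("C:\\temp"))
				.workerPool(20, 25, 2, TimeUnit.SECONDS)
				.processTimeout(30, TimeUnit.SECONDS)
				.build();

		try {
			Future<Boolean> conversion = converter
					.convert(wordFile).as(DocumentType.MS_WORD)
					.to(target).as(DocumentType.PDF)
					.prioritizeWith(1000) // optional
					.schedule();

			// schedule() returns immediately; block until the conversion has finished
			System.out.println("Converted: " + conversion.get());
		} finally {
			// Shut the converter down so its worker threads do not linger
			converter.shutDown();
		}
	}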

}
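
Because schedule() hands back a Future<Boolean>, the caller decides how long to wait for the result. Here is a sketch of a timeout-bounded wait, using only java.util.concurrent. ConversionAwait and succeeded are hypothetical names of my own, not documents4j API, and treating every failure as a plain false is an assumption that suits a simple command line tool; a real application would probably want to log the cause.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not part of documents4j. */
final class ConversionAwait {

    /**
     * Waits for the future to complete, up to the given timeout.
     * Returns true only if it completed normally with Boolean.TRUE.
     */
    static boolean succeeded(Future<Boolean> conversion, long timeout, TimeUnit unit) {
        try {
            return Boolean.TRUE.equals(conversion.get(timeout, unit));
        } catch (TimeoutException e) {
            conversion.cancel(true); // stop waiting on a conversion that takes too long
            return false;
        } catch (ExecutionException | CancellationException e) {
            return false; // the task failed or was cancelled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in futures; in ToPDF this would be the value returned by schedule()
        Future<Boolean> finished = CompletableFuture.completedFuture(true);
        Future<Boolean> stuck = new CompletableFuture<>();
        System.out.println("finished: " + succeeded(finished, 1, TimeUnit.SECONDS));
        System.out.println("stuck: " + succeeded(stuck, 50, TimeUnit.MILLISECONDS));
    }
}
```

In ToPDF you would pass the Future returned by schedule() and pick a timeout comfortably longer than the processTimeout configured on the converter.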

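From Maven, you just need these dependencies (slf4j-simple is listed without a version, so its version must be managed elsewhere, for example in a parent POM):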

		<dependency>
			<groupId>com.documents4j</groupId>
			<artifactId>documents4j-local</artifactId>
			<version>1.1.1</version>
		</dependency>

		<dependency>
			<groupId>com.documents4j</groupId>
			<artifactId>documents4j-transformer-msoffice-word</artifactId>
			<version>1.1.1</version>
		</dependency>

		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-simple</artifactId>
		</dependency>

For a successful conversion, your logs will contain:

[main] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - From-Microsoft-Word-Converter was started successfully
[main] INFO com.documents4j.job.LocalConverter - The documents4j local converter has started successfully
[pool-1-thread-1] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - Requested conversion from input.docx (application/vnd.com.documents4j.any-msword) to output.pdf (application/pdf)
