Archive for March, 2020

documents4j for TOC update

March 16th, 2020 by Jason

documents4j can also be used to update the TOC page numbers in your docx file.

This takes two adjustments to the code in our previous post.

The first is to convert with .to(target).as(DocumentType.DOCX) rather than DocumentType.PDF, so that the output is a docx file.

The second is that you need a customised word_convert.vbs containing, for example:

    ' Update TOC
    wordDocument.TablesOfContents(1).UpdatePageNumbers

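This updates the page numbers of the first table of contents in the document (the TablesOfContents collection is 1-based). It does not rebuild the entries themselves; Word's object model has a separate TablesOfContents(1).Update method for that.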

See further https://github.com/documents4j/documents4j/blob/master/documents4j-transformer-msoffice/documents4j-transformer-msoffice-word/src/main/resources/word_convert.vbs

word_convert.vbs is typically found in your documents4j-transformer-msoffice-word.jar.
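
To customise the script you first need a copy of it. The following is a minimal sketch, using only the JDK, that copies word_convert.vbs out of a jar so it can be edited. The class name ExtractWordScript and the command-line paths are hypothetical, and how you then get documents4j to pick up the edited copy (for example by repackaging the jar) is left to your build.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ExtractWordScript {

    /**
     * Copies the first jar entry whose name ends with scriptName to outputFile.
     * Returns true if such an entry was found and copied, false otherwise.
     */
    public static boolean extract(Path jarPath, String scriptName, Path outputFile) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(scriptName)) {
                    try (InputStream in = jar.getInputStream(entry)) {
                        Files.copy(in, outputFile, StandardCopyOption.REPLACE_EXISTING);
                    }
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: args[0] = path to the jar, args[1] = where to write the script
        Path jar = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        boolean found = extract(jar, "word_convert.vbs", out);
        System.out.println(found ? "Extracted to " + out : "word_convert.vbs not found in " + jar);
    }
}
```

Matching on the end of the entry name rather than an exact path is a deliberate choice here, so the sketch does not depend on where inside the jar the script sits.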

documents4j for PDF output

March 16th, 2020 by Jason

Generating high fidelity PDF output from Office documents has always been a challenge, given the “long tail” of features which are possible in docx/pptx/xlsx files.

For Word documents, it is easy enough to output paragraphs of text, tables and images. But add in VML, DrawingML, equations and SmartArt, and fidelity gets much harder.

If your documents are constrained, you may be able to find a suitable conversion tool. Plutext’s PDF Converter was a good example of this. It worked well on a growing range of documents.

But ultimately, if you want great fidelity on an unconstrained set of files, you need to be using Microsoft’s own Office layout engine.

There are various ways to do that, for example https://developer.microsoft.com/en-us/graph/examples/document-conversion

For Java developers, a good solution is documents4j.

It uses Office and the Microsoft Scripting Host for VBS on the conversion machine, so that machine must run Microsoft Windows.

documents4j can run either a “LocalConverter” or a “RemoteConverter”.

Using a LocalConverter is as simple as:

import java.io.File;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class ToPDF {

	public static void main(String[] args) throws Exception {

		File wordFile = new File(System.getProperty("user.dir"), "input.docx");
		File target = new File(System.getProperty("user.dir"), "output.pdf");

		// baseFolder is the folder documents4j uses for its temporary files
		IConverter converter = LocalConverter.builder()
				.baseFolder(new File("C:\\temp"))
				.workerPool(20, 25, 2, TimeUnit.SECONDS)
				.processTimeout(30, TimeUnit.SECONDS)
				.build();

		try {
			// schedule() returns immediately; the conversion runs on a worker thread
			Future<Boolean> conversion = converter
					.convert(wordFile).as(DocumentType.MS_WORD)
					.to(target).as(DocumentType.PDF)
					.prioritizeWith(1000) // optional
					.schedule();

			// Block until the conversion has finished and report the result
			System.out.println("Converted: " + conversion.get());
		} finally {
			// Shut the converter down once we are done with it
			converter.shutDown();
		}
	}

}
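
Because schedule() hands back a Future<Boolean> rather than blocking, the caller decides when and for how long to wait. Here is a minimal sketch of consuming such a future with a timeout. So that it runs anywhere, it uses a stand-in task on a plain ExecutorService instead of documents4j itself; the class AwaitConversion, its await method and the result strings are all hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AwaitConversion {

    /**
     * Waits up to timeoutSeconds for the future and describes the outcome.
     */
    public static String await(Future<Boolean> conversion, long timeoutSeconds) throws InterruptedException {
        try {
            boolean ok = conversion.get(timeoutSeconds, TimeUnit.SECONDS);
            return ok ? "converted" : "not converted";
        } catch (TimeoutException e) {
            // Give up on the job rather than leaving it queued
            conversion.cancel(true);
            return "timed out";
        } catch (ExecutionException e) {
            // The task itself threw; the original exception is the cause
            return "failed: " + e.getCause();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for converter.convert(...).to(...).schedule()
            Future<Boolean> conversion = pool.submit(() -> true);
            System.out.println(await(conversion, 30));
        } finally {
            pool.shutdown();
        }
    }
}
```

With a real converter you would pass the Future returned by schedule() to await in place of the stand-in task.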

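With Maven, you just need these dependencies (slf4j-simple is shown without a version, so Maven will expect one to be managed elsewhere in your build, for example in a parent POM):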

		<dependency>
			<groupId>com.documents4j</groupId>
			<artifactId>documents4j-local</artifactId>
			<version>1.1.1</version>
		</dependency>

		<dependency>
			<groupId>com.documents4j</groupId>
			<artifactId>documents4j-transformer-msoffice-word</artifactId>
			<version>1.1.1</version>
		</dependency>

		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-simple</artifactId>
		</dependency>

For a successful conversion, your logs will contain:

[main] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - From-Microsoft-Word-Converter was started successfully
[main] INFO com.documents4j.job.LocalConverter - The documents4j local converter has started successfully
[pool-1-thread-1] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - Requested conversion from input.docx (application/vnd.com.documents4j.any-msword) to output.pdf (application/pdf)