docx4java aka docx4j – OpenXML office documents in Java » 2020

documents4j for TOC update

March 16th, 2020 by Jason

documents4j can also be used to update the TOC page numbers in your docx file.

This needs two adjustments to the approach in our previous post, "documents4j for PDF output".

The first is that you convert to(target).as(DocumentType.DOCX), not DocumentType.PDF, so you get docx output.

The second is that you need a customised word_convert.vbs containing, for example:

    ' Update TOC
    wordDocument.TablesOfContents(1).UpdatePageNumbers

This code updates the page numbers in the first table of contents in the document.

See further https://github.com/documents4j/documents4j/blob/master/documents4j-transformer-msoffice/documents4j-transformer-msoffice-word/src/main/resources/word_convert.vbs

word_convert.vbs is typically found in your documents4j-transformer-msoffice-word.jar
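To get a copy of the stock script to customise, one option is to read it out of the jar. The following is a minimal sketch using only the JDK's zip support. It assumes the script sits at the root of the jar under the name word_convert.vbs (as the src/main/resources path above suggests). The jar file name is a hypothetical placeholder, and the class and method names are ours, not part of documents4j.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ExtractWordScript {

    // Copies the named entry out of a jar (a zip archive) to the target path.
    static Path extract(Path jar, String entryName, Path target) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException(entryName + " not found in " + jar);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical jar location; pass the real path as the first argument.
        Path jar = Paths.get(args.length > 0 ? args[0]
                : "documents4j-transformer-msoffice-word-1.1.1.jar");
        Path out = extract(jar, "word_convert.vbs", Paths.get("word_convert.vbs"));
        System.out.println("Extracted to " + out.toAbsolutePath());
    }
}
```

You can then add the TOC lines shown above to the extracted copy.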


documents4j for PDF output

March 16th, 2020 by Jason

Generating high fidelity PDF output from Office documents has always been a challenge, given the “long tail” of features which are possible in docx/pptx/xlsx files.

For Word documents, it is easy enough to output paragraphs of text, tables and images. But add in VML, DrawingML, equations, SmartArt, and fidelity becomes a challenge.

If your documents are constrained, you may be able to find a suitable conversion tool. Plutext’s PDF Converter was a good example of this. It worked well on a growing range of documents.

But ultimately, if you want great fidelity on an unconstrained set of files, you need to be using Microsoft’s own Office layout engine.

There are various ways to do that, for example https://developer.microsoft.com/en-us/graph/examples/document-conversion

For Java developers, a good solution is documents4j.

It uses Office and the Microsoft Windows Script Host (for VBS) on the conversion machine, so that machine must run Microsoft Windows.
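Because of that requirement, it can help to fail fast with a clear message when the code is started on another platform. Here is a small sketch of such a guard. It uses only the standard os.name system property. The class and method names are ours, not part of documents4j, and the check says nothing about whether Word is actually installed.

```java
import java.util.Locale;

public class PlatformCheck {

    // Returns true when the given os.name value looks like a Windows system.
    static boolean isWindows(String osName) {
        return osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        if (!isWindows(osName)) {
            throw new IllegalStateException(
                    "Conversion with Office needs Microsoft Windows, but os.name is " + osName);
        }
        System.out.println("Running on " + osName);
    }
}
```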

documents4j can run as either a “LocalConverter” or a “RemoteConverter”.

Using a LocalConverter is as simple as:

import java.io.File;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

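public class ToPDF {

	public static void main(String[] args) throws Exception {

		File wordFile = new File( System.getProperty("user.dir")+"/input.docx" );
		File target = new File( System.getProperty("user.dir")+"/output.pdf" );

		// baseFolder is the working folder documents4j uses for its temporary files
		IConverter converter = LocalConverter.builder()
				.baseFolder(new File("C:\\temp"))
				.workerPool(20, 25, 2, TimeUnit.SECONDS)
				.processTimeout(30, TimeUnit.SECONDS)
				.build();

		try {
			Future<Boolean> conversion = converter
					.convert(wordFile).as(DocumentType.MS_WORD)
					.to(target).as(DocumentType.PDF)
					.prioritizeWith(1000) // optional
					.schedule();

			// schedule() returns immediately, so block until the conversion has finished
			boolean success = conversion.get();
			System.out.println("Conversion succeeded: " + success);
		} finally {
			// stop the converter and release its resources
			converter.shutDown();
		}
	}

}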

From Maven, you just need these dependencies:

		<dependency>
			<groupId>com.documents4j</groupId>
			<artifactId>documents4j-local</artifactId>
			<version>1.1.1</version>
		</dependency>

		<dependency>
			<groupId>com.documents4j</groupId>
			<artifactId>documents4j-transformer-msoffice-word</artifactId>
			<version>1.1.1</version>
		</dependency>

		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-simple</artifactId>
		</dependency>

For a successful conversion, your logs will contain:

[main] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - From-Microsoft-Word-Converter was started successfully
[main] INFO com.documents4j.job.LocalConverter - The documents4j local converter has started successfully
[pool-1-thread-1] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - Requested conversion from input.docx (application/vnd.com.documents4j.any-msword) to output.pdf (application/pdf)

