source: trunk/docx4j/README.txt @ 1687

Revision 1687, 21.9 KB checked in by jharrop, 4 months ago

Final README for 2.7.1

==============================================================================
  DOCX4J  -  README
==============================================================================

Contents of this file:
 - What is docx4j?
 - Where do I get it?
 - How do I get started?
 - Where to get help?
 - How do I build docx4j?
 - Legal Information
 - Release Notes

==============================================================================


What is docx4j?
---------------

docx4j is an open source (Apache v2) library for creating, editing, and saving OpenXML "packages",
including docx, pptx, and xlsx.

It uses JAXB to create the Java representation.

- Open existing docx/pptx/xlsx (from filesystem, SMB/CIFS, WebDAV using VFS)
- Create new docx
- Programmatically manipulate the docx document (of course)
- CustomXML binding (with OpenDoPE extensions for repeats & conditionals)
- Export as HTML or PDF
- Diff/compare documents, paragraphs or sdt (content controls)
- Import a binary doc (uses Apache POI's HWPF)
- Produce/consume Word 2007's xmlPackage (pkg) format
- Save docx to filesystem as a docx (ie zipped), or to JCR (unzipped)
- Apply transforms, including common filters
- Font support (font substitution, and use of any fonts embedded in the document)


Where do I get it?
------------------

http://dev.plutext.org/downloads.html


How do I get started?
---------------------

See the Getting Started guide.


Where to get help?
------------------

http://dev.plutext.org/forums


How do I build docx4j?
----------------------

Get it from svn, at http://dev.plutext.org/svn/docx4j/trunk

If you are using Eclipse with the m2eclipse plugin, enable dependency management.

For more details, see the Getting Started guide.



Legal Information
-----------------

docx4j is published under the Apache License version 2.0. For the license
text, please see the following files in the legals directory:
- LICENSE
- NOTICE

Legal information on libraries used by docx4j can be found in the
"legals/NOTICE" file.

Here is a (TODO: non-exhaustive?) list of files included in docx4j but not published under Apache
License version 2.0:

- DocX2Html.xslt (note that the supported transform is docx2xhtmlNG2.xslt, not this file)
- src/diffx (ARTISTIC LICENCE)
- xsd/**



==============================================================================
  RELEASE NOTES
==============================================================================

Version 2.7.1
=============

r1601-

Release date
------------

14 October 2011

Contributors to this release
----------------------------

Albert Aymerich
alberto
Antoine
Jason Harrop
y.rolland

Notable Changes in Version 2.7.1
--------------------------------

Preparation for including docx4j in Maven Central

[1605-1610] mc:AlternateContent preprocessor, allowing graceful degradation of Word 2010 specific content

[1604] docx4j.properties, supports configuration of default page size, margins, orientation; also ability to set some of the doc props metadata (Application & AppVersion; dc.creator & dc.lastModifiedBy).

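A rough illustration of the kind of settings [1604] refers to, read here with plain java.util.Properties rather than through docx4j itself. The key names are recalled from the sample docx4j.properties and are assumptions; check them against the file shipped with your release. The class name PropsSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: parses a docx4j.properties-style fragment using the JDK only.
class PropsSketch {

    // ASSUMPTION: key names as recalled from the sample file; verify before relying on them.
    static final String SAMPLE =
          "docx4j.PageSize=A4\n"
        + "docx4j.PageMargins=NORMAL\n"
        + "docx4j.PageOrientationLandscape=false\n"
        + "docx4j.Application=docx4j\n"
        + "docx4j.AppVersion=2.7\n"
        + "docx4j.dc.creator.value=docx4j\n"
        + "docx4j.dc.lastModifiedBy.value=docx4j\n";

    // Loads key=value pairs from the given text.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties p = load(SAMPLE);
        // The second argument is the fallback used when a key is absent.
        System.out.println(p.getProperty("docx4j.PageSize", "LETTER"));
        System.out.println(Boolean.parseBoolean(
                p.getProperty("docx4j.PageOrientationLandscape", "false")));
    }
}
```

In a real project the file would normally sit on the classpath; how docx4j locates and applies it is not shown here.
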
[1631, 1637] HtmlExporterNG2, (Pdf)Conversion, SvgExporter: storing any images is delegated to a
ConversionImageHandler that may be passed as a conversion parameter. Default implementation: DefaultConversionImageHandler

VFS stuff moved to docx4j-extras


OpenDoPE changes
----------------

[1639] Change static OpenDoPEHandler design to instance-based design, with objective of making it thread-safe.

[1645] When binding, create hyperlinks out of text containing http://

[1653] Handle unwrapping correctly in ShallowTraversor, so JAXBElements stay wrapped, and we don't risk a marshalling exception for any which don't have an @XmlRootElement annotation.

[1658] Word can only resolve an XPath binding which results in an element (as opposed to a boolean, integer, string or node-set). OpenDoPE processing can handle these other result types (some of them anyway).
Up until now, that processing was done in OpenDoPEHandler. Now it is done in BindingHandler and bind.xslt, for consistency with how normal Word XPath bindings are handled by docx4j.

[1662] Bind picture correctly where parent is another content control.

Other Changes (non-exhaustive)
------------------------------

[1613] Header and footer parts use XPath binder

[1679-80] create image part directly from file


Version 2.7.0
=============

Release date
------------

8 July 2011

Contributors to this release
----------------------------

alberto
amdonov
azeloro
Dave Brown
Jason Harrop
Marcel
Patrick Linskey
ppa_waw
Richard
Tinne


Notable Changes in Version 2.7.0
--------------------------------

Improvements to Maven build

ContentAccessor interface

AlteredParts: identify parts in this pkg which are new or altered; Patcher
which adds new or altered parts.

Support for .glox SmartArt package (/src/glox/)

JAXB RI 2.2.3 compatibility

OpenDoPE improvements (see below)

xlsx4j
------

[1455] Support for Spreadsheet Comments.

[1494] Detect /xl/workbook.xml as WorkbookPart, rather than DefaultXmlPart.
Add convenience method getWorkbookPart

pptx4j
------

[1539] Better support for slide size.

[1549] Convenience method to get MainPresentationPart

OpenDoPE changes
----------------

[1339] OpenDoPEHandler: Pre-processing step evaluates an od:xpath which doesn't have a corresponding w:databinding.
This is designed to handle an XPath expression which evaluates to a boolean or number, rather than a node.

[1389] Generalise applyBindings, so that it should work on not just a DocumentPart, but also a header or footer part.

[1390] Act on databinding for content controls of type picture.

[1423] Support w:multiLine data binding.

[1441] scale image based on content control size

[1449] Handle a databinding which points to Core or Extended Properties, or CoverPage props.

[1453] Traverse into text box.  Handle content control in text box.

[1506] Process Header and Footer parts as well.

[1547] Tinne's patch of 20 June, which takes the Jan-Willem van den Broeks XPath grammar and builds it into a rewriting parser that enhances xpath expressions just the way that is needed. Thus, all xpath 1.0 expressions can be used.


Other Changes (non-exhaustive)
------------------------------

Various PDF & HTML output improvements

Tuning of log levels; removal of some System.out.println

[1333] Fix for image part naming.

[1344] UnitsOfMeasurement, fix for locales such as Germany, which use a comma as the decimal separator instead of a point.  Solves an issue in FOP

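To illustrate the class of problem behind [1344] (this is a sketch, not the actual docx4j fix): formatting a measurement with a German locale yields a comma, which is not valid in an XSL FO length. Formatting with fixed symbols avoids that. The class and method names are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch of locale-sensitive vs locale-independent number formatting.
class DecimalSeparatorSketch {

    // Locale-sensitive: the decimal separator follows the given locale.
    static String localized(double value, Locale locale) {
        return String.format(locale, "%.2f", value);
    }

    // Locale-independent: always uses '.' regardless of the JVM default locale.
    static String fixedPoint(double value) {
        DecimalFormat df = new DecimalFormat("0.00", new DecimalFormatSymbols(Locale.US));
        return df.format(value);
    }

    public static void main(String[] args) {
        // With Locale.GERMANY the separator is a comma.
        System.out.println(localized(2.54, Locale.GERMANY) + "cm");
        // With fixed US symbols the separator is a point.
        System.out.println(fixedPoint(2.54) + "cm");
    }
}
```

The same caution applies to any value written into XML attributes: format with an explicit locale rather than the platform default.
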
[1352] Rework header/footer model to take account of "same as previous" and whether first page header/footer is active or not.

[1356] Code cleanup: remove old approaches to HTML generation.

[1358] Allow for user-defined handlers to prepare HTML output depending on the value of an sdt tag.

[1396] docx4j is not dependent on Xerces (other than in XmlPart), but WebSphere (presumably using IBM JDK) doesn't have Sun's Xerces implementation, so use real Xerces if it is on the class path

[1412] Add XPath support to header part.

[1416] Coarser-grained ways to tokenize text when
diffx turns XML into a stream of events. The current diffx stuff creates
a token for every word, and on large documents, the diff algorithms become
unwieldy in terms of memory usage/time. Coarser text splitting makes fewer
events.

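The effect described in [1416] can be sketched with the JDK alone. This is not the diffx tokenizer; it only shows how sentence-level splitting produces far fewer tokens than word-level splitting for the same text. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch comparing fine-grained and coarse-grained text tokenization.
class TokenGranularitySketch {

    // Fine-grained: one token per whitespace-separated word.
    static List<String> byWord(String text) {
        List<String> tokens = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) tokens.add(w);
        }
        return tokens;
    }

    // Coarser: one token per sentence, splitting on whitespace that follows '.', '!' or '?'.
    static List<String> bySentence(String text) {
        List<String> tokens = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) tokens.add(s);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps. It lands on the lazy dog.";
        System.out.println(byWord(text).size() + " word tokens");
        System.out.println(bySentence(text).size() + " sentence tokens");
    }
}
```

For the sample text this gives 11 word tokens against 2 sentence tokens. The trade-off is that a coarser diff reports a whole sentence as changed when a single word differs.
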
[1432] interface to getSdtPr
[1437] SdtElement interface; CTCustomXmlElement interface

[1461] VML generated classes, based on ECMA 376 1ed (rather than earlier draft).

[1470] Make docx4j compatible with JAXB RI 2.2.3 unmarshalling

[1479] extension to TraversalUtil, which allows you to define the tag you are interested in as a generic of the visitor class.

[1480] StyleUtil: styles areEqual, isEmpty, apply

[1481] Bugfix: Handle internal HYPERLINK

[1487] MetafilePart to extend BPAI, so WMF images can be added.

[1492] Support for http://schemas.openxmlformats.org/officeDocument/2006/bibliography

[1536] Support for common paper sizes.

[1537] Knowledge of "well known" margin settings.

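For background on [1536] and [1537]: WordprocessingML expresses page dimensions and margins in twentieths of a point (1440 per inch). The sketch below converts common paper sizes to that unit using plain arithmetic; it does not use docx4j's own classes, and the names are hypothetical.

```java
// Hypothetical sketch: paper dimensions expressed in twentieths of a point ("twips").
class PaperSizeSketch {

    static final int TWIPS_PER_INCH = 1440;   // 72 points per inch, 20 twips per point
    static final double MM_PER_INCH = 25.4;

    // Converts millimetres to twips, rounded to the nearest whole unit.
    static long mmToTwips(double mm) {
        return Math.round(mm / MM_PER_INCH * TWIPS_PER_INCH);
    }

    // Converts inches to twips, rounded to the nearest whole unit.
    static long inchesToTwips(double inches) {
        return Math.round(inches * TWIPS_PER_INCH);
    }

    public static void main(String[] args) {
        // A4 is 210 mm x 297 mm; US Letter is 8.5 in x 11 in.
        System.out.println("A4: " + mmToTwips(210) + " x " + mmToTwips(297));
        System.out.println("Letter: " + inchesToTwips(8.5) + " x " + inchesToTwips(11));
    }
}
```

A4 comes out as 11906 x 16838 and Letter as 12240 x 15840, which are the values usually seen in a w:pgSz element.
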
[1556] Native support for bitmap (bmp) images

[1569] Configure log4j automatically if necessary; paves the way for all System.out.println to be removed.


Version 2.6.0
=============

Release date
------------

18 Nov 2010

Contributors to this release
----------------------------

Jason Harrop


Major Changes in Version 2.6.0
------------------------------

OpenDoPE ("Open Document Processing Ecosystem") v2.2 implementation for generating documents using repeats, conditionals and component inclusion.
Implementation now lives in model/datastorage package.

TraversalUtil class, which makes it easy to find things in the main document part (an alternative to XPath), and optionally, do something to them

Dependency cleanup, now uses FOP 1.0, and standard Xalan 2.7.1

Other Changes (non-exhaustive)
------------------------------

[1177] PDF output: set margins in layout masters; make room in region body margins for header & footer; set header & footer extents manually
[1182] Support for page number field <w:fldSimple w:instr=" PAGE   \* MERGEFORMAT ">
[1196] Ensure docx4j can be built using either ant or maven, with only one of the JAXB implementations (Java 6 or RI) required
[1212] Basic support for paragraph shading and borders in PDF and HTML output.
[1217] Bug fix: image part naming
[1220] If there is w:pPr/w:pStyle, we must honour any rPr in the pStyle (reinstate code commented out months ago)
[1231] Use official xmlgraphics-commons-1.4
[1232] Make it possible to run certain samples from the command line.
[1234] Use FOP 1.0; include source code for fop-fonts, as org.docx4j.fonts.fop. Move panose to org.docx4j.fonts.foray
[1235] Include @sub-font in FOP config; this is required for TTC
[1238] Use standard Xalan 2.7.1 instead of our patched version; remove references to DTMNodeProxy
[1262] Use style0 as default para style for docx from OO
[1270] Support full justification in XSL FO
[1273] HTML output: when test="contains(./w:sdtPr/w:tag/@w:val, '@class=collapse')" allow the sdt to Collapse.
[1295] Methods to check whether partname is already in use.
[1302] Pass Relationship to newPartForContentType so AlternativeFormatInputPart can be detected.
[1306] EmbeddedPackagePart
[1307] OpcPackage: don't create props parts, merely because user has asked for one. CorePart: set JAXB context correctly. Rels part: relId generation altered


pptx4j changes
--------------

[1179] Basic support for images in pptx svg in html output
[1180] Alter slide to html/svg api, to make it more obvious you are processing a slide, and doing so one at a time.
Only show text box with a red dash border if debug level logging is enabled.
[1185] Support Word 2003 page numbers in PDF output. ie <w:fldChar w:fldCharType="begin"/> <w:instrText xml:space="preserve">PAGE  </w:instrText>
<w:fldChar w:fldCharType="end"/>
[1198] Method for creating a slide; don't do that when creating package.


Version 2.5.0
=============

Release date
------------

15 July 2010

Contributors to this release
----------------------------

Jason Harrop

Major Changes in Version 2.5.0
------------------------------

[1152] XPath query which returns live JAXB objects
[1158] Content control pre-processing for conditionals, repeats.
[1167/8] PDF conversion via HTML or iText moved from main source tree;
       iText, xhtmlrenderer and pdf-renderer dependencies removed.

Other Changes
-------------

[1152] Content control data binding xpath namespace stuff integrated into NamespacePrefixMappings.
[1164] SaveToZip: .xml extension implies save as Flat OPC instead
[1164] XmlUtils.unwrap

Version 2.4.0
=============

Release date
------------

9 July 2010

Contributors to this release
----------------------------

Jason Harrop

Major Changes in Version 2.4.0
------------------------------

[1135] PDF via XSL FO: header/footer support for more than 1 section
[1134] JAXB representation of XSL FO

Other Changes
-------------

[1140] Try harder to delete add image temp file
[1139] HTML, PDF: highlight wholly unimplemented features, only if debug-level logging is enabled.
[1131] Enhancements to XSL FO output
[1130] Support for .dotx and .dotm
[1129] Fix instances of "Two classes have the same XML type name -- Use .. @XmlType namespace to assign different names to them."
[1127] PDF output: Handle images in headers/footers
[1126] MetafileEmfPart now extends BinaryPartAbstractImage, so EMF images can be added to the docx.
[1125] Add @XmlRootElement to CT_MarkupRange, indicating it is bookmarkEnd.
XSD: create new types CT_MoveFrom|ToRangeEnd, so elements moveFrom|ToRangeEnd don't get confused with bookmarkEnd. TODO: haven't run xjc on this new xsd.
[1121] Support for 4 SmartArt parts.
[1120] XML parts which we don't specifically know how to handle: load these as xml parts (previously they were loaded as binary parts).
[1107] When creating image part names in BPAI, use the generated relId as the image name.
[1102] Support for ActiveX parts.  Previously the Xml part was being represented as binary, and hence encoded in output.
[1098] Support image of type anchor, not just inline.
[1078] Support for WMF (but not EMF, yet) as SVG in HTML output.

pptx4j changes
--------------

[1088] SVG output: Paragraphs of large text in a box with a border, need a reduced top-margin.
[1087] Basic character formatting in SVG output
[1085] Convert line to SVG
[1083] JAXB representation of SVG 1.1


Version 2.3.0
=============

Release date
------------

17 Feb 2010


Contributors to this release
----------------------------

Jason Harrop
Holger Schlegel

Major Changes in Version 2.3.0
------------------------------

[1044] pptx4j
[1041] More complete DML, generated from TC45 1.0 final, using dml__ROOT.xsd
[ 956] Basic implementation of styled tables in xsl fo.  More work needed on border conflict resolution.
[ 949] Table styles in HTML NG2 output; borders, shading, vertical alignment
[ 943] Added DocumentModel. DocumentModel is a list of SectionWrappers; a SectionWrapper has a HeaderFooterPolicy, PageDimensions and sectPr.
HeaderFooterPolicy moved to new package, as there will be 1 per SectionWrapper.
[ 923] introduce model/Property, to handle property conversion to CSS, and to XSL FO, more cleanly.
Adds conversion from CSS.
[ 912] HtmlExporterNG2, which uses new StyleTree to take advantage of CSS cascade/priority rules to apply effective styles.

Other Changes
-------------

[1050] Renamed Package -> OpcPackage
[1039] Original dml-* from EcmaTC45 OOXML v1.0 final
[1036] Original pml-* from EcmaTC45 OOXML v1.0 final
[1024] Footnotes in PDF via XSL FO.
[1015] Support for footnotes and endnotes in HTML.
[1008] added docs/Docx4j_GettingStarted
[1003] Remove dom4j stuff
[ 997] Basic support for list indentation in PDF via XSL FO
[ 990] Updated fop jar to include support for wingdings and other TrueType fonts with symbol character maps (patched with fop r891181 of 20091216)
[ 983] Support for adding linked (as opposed to embedded) images.
[ 979] Basic support in pdf via XSL FO, and HTML NG2, for bookmarks, hyperlink, symbols, w:pict.
[ 977] PDF via XSL FO: basic support for paragraph numbering
[ 975] JCR: Methods to get content as string (workaround for ALFCOM-3049)
[ 974] Handle w:t[@xml:space='preserve'] in NG2
[ 962] Example: CopyPart.
[ 962] New method setPartName(PartName newName), which is useful if you want to rename a part.
[ 960] Mechanism for passing state during the conversion process
[ 955] altChunk
[ 932] DocPropsCustomPart: When setting property, overwrite existing property with same name.
[ 930] Converter infrastructure can be used for incoming conversions (eg HTML table to w:tbl)
[ 928] Model interface: remove Converter arg from build method
[ 925] Regenerated classes from wml.xsd, having added EG_MathContent back in to EG_RunLevelElements
[ 924] New method Context.getWmlObjectFactory(); we only need one instance of the ObjectFactory..
[ 922] new UnitsOfMeasurement class
[ 909] LoadFromZipFile can conserve memory by not loading the contents of binary parts
[ 905] Modify load method to also support loading a Flat OPC .xml file
[ 903] Bug fix in revised deepCopy method: use JAXBContext parameter properly


Version 2.2.2
=============

Release date
------------

17 Sept 2009


Contributors to this release
----------------------------

Jason Harrop
Holger Schlegel


Major Changes in Version 2.2.2
------------------------------

[888] Generate classes from shared-math.xsd

[885] JAXB representation for VML (eg as used when a document containing embedded images is
      saved as docx from Word 2003).


Other Changes
-------------

[895] There are no dom4j parts anymore.  Parts which aren't JAXB XML parts now extend new XmlPart,
      which uses JAXP instead of dom4j.  The use of dom4j is deprecated, and all references to it
      will be removed in docx4j v3.

[894] Explicitly specify class loader when loading JAXBContext. Prevents versions of JBOSS from
      trying to use a different class loader.

[893] Replace deepCopy methods with Holger's contribution of 9 Sept.

[887] Apply Holger Schlegel's patch adding a generic parameter for the JaxbElement property.

[886] SaveToJCR will create folders from path segments as required (at least for Alfresco;
      for other implementations, TODO ensure '/' is not encoded!)

[883] NamespacePrefixMappings stores the mappings in a single location, and is sufficient for xpath.




Version 2.2.1
=============

Release date
------------

24 Aug 2009


Contributors to this release
----------------------------

Jason Harrop
Adam Schmideg


Major Changes
-------------

[869] NamespacePrefixMappers which work with Java 6 (ie if you don't have JAXB in your endorsed dir,
      or can't (eg Java Web Start)).


Other Changes (not exhaustive)
------------------------------

[871] Get rid of System.out.println (mostly).

[870] Avoid returning null DocumentFragment from getNumberXmlNode extension, since this causes
      Xalan to produce a stack trace

[867] Use Java's xerces.internal instead of Crimson in CustomXmlDataStorageImpl

[865] Don't get value of attributes when passing table contents to Converter.toNode;
      attributes on <w:tr w:rsidR="00E54D1F" w:rsidRPr="00A84EA1"> break the output.

[864] ImmutablePropertyResolver, contributed by Adam Schmideg.



Version 2.2.0
=============

Release date
------------

28 July 2009

Contributors to this release
----------------------------

Serge Grachov
Jason Harrop
Adam Schmideg
Leigh

Major Changes
-------------

CustomXml applyBindings works (to proof of concept level)

Differencing improvements

Table model and Converter interface.  Use of this table model in HtmlExporterNG,
to support merged cells. Contributed by Adam Schmideg.

New class PropertyResolver [757], which works out the actual properties which apply to a paragraph or a run.

Header/footer support

PDF via XSL FO or iText (in addition to the existing support via HTML)

"Next Generation" HTML Exporter, which only needs the main document part as input, and
which takes advantage of docx4j's knowledge of the document (via extension functions)
so that most of the logic is done in Java (as opposed to xslt).

Improvements to font handling/substitution (inc auto-detect option)

Image insertion convenience methods

Other Changes (not exhaustive)
------------------------------

[856] Start of work on NamespacePrefixMappers which work with Java 6 (ie if you don't have JAXB
      in your endorsed dir, or can't (eg Java Web Start)). [Not finished until v2.2.1!]

[854] Remove ContentTypeManager interface; replace it with implementation.
      ContentTypeManager: change semantics of isContentTypeRegistered, so that it means
      'is *default* content type registered'.

[847] Relativise file URLs for images, based on contribution by Leigh;
      Only relativise path if it's not the tmpdir, because the tmp dir is used for pdf output

[841] Updated createImgE10 extension function to point to class org.docx4j.model.images.WordXmlPicture.
      Reported and fix suggested by Leigh

[816] Bug fix: Close image files properly; patch contributed by Serge Grachov

[815] Store/retrieve key/value pairs in sdtPr/tag

[808] Text extraction

[799] If no directory for saving images is specified, embed the image using a data: URI

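As background for [799], a data: URI carries the image bytes inline as base64 instead of pointing at a file. The sketch below builds one with the JDK only; it is not the docx4j code path, the names are hypothetical, and java.util.Base64 needs Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: building a data: URI suitable for an <img src="..."> attribute.
class DataUriSketch {

    // Returns "data:<mimeType>;base64,<payload>" for the given bytes.
    static String toDataUri(String mimeType, byte[] bytes) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real caller would pass the binary content of the image.
        byte[] standIn = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println("<img src=\"" + toDataUri("image/png", standIn) + "\"/>");
    }
}
```

Inline images make the HTML self-contained, at the cost of roughly a third more bytes than the original binary.
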
[784] LoadFromZipFileNG, supports reading from an input stream

[776] Renamed package out.xmlPackage to out.FlatOpcXml
      Renamed XmlPackageCreator to FlatOpcXmlCreator

[771] Renamed XmlPackageImporter to FlatOpcXmlImporter

[697] Remove fop-fonts; to be replaced with a complete fop jar

[650] Convenience method to restart numbering

[633] Add a createImagePart signature which allows you to specify the source part of the image part's rel,
      so that the image can be added to eg a header.

[625] Add an HtmlSetting 'conditionalComments' which defaults to turning these off in the style sheet,
      since FlyingSaucer PDF Renderer renders <![if!supportMisalignedColumns]><![endif]> verbatim, and there
      have been reports of Xerces SAX parser not liking these.

[607] Make it easy for a part to have its own NamespacePrefixMapper. (eg for relationships part,
      we want the rels namespace to be the default namespace)




Version 2.1.0
=============

Release date
------------

11 Nov 2008

Contributors to this release
----------------------------

Jason Harrop
Manimala Kumar

Major Changes
-------------

Use docx 2 html XSLT from OpenXMLViewer (OpenXMLViewer XSLT 11089, as downloaded 9 Oct 2008),
with support for numbering, image handling, hyperlinks


Other Changes (not exhaustive)
------------------------------

[563] Specialised parts for some image types (rather than just treating them as BinaryPart).

[560] Create <img> element for E2.0 images

[559] Basic support for resolving a hyperlink by reference to the rels part, using an XSLT
      extension function.

[558] Use docx 2 html XSLT from OpenXMLViewer.

[547] getParts() method contributed by Manimala Kumar

[539] VBA parts

[532] RelationshipsPart is now a JAXB part.

[529] Differencing improvements



Version 2.0
===========

Release date
------------

21 July 2008


Major Changes
-------------

Support for Flat OPC XML file format

Binary doc import proof of concept (using POI)

Support for <w:drawing> element

Differencing

