
Keeping Styles in Docx while Extracting Contents

Posted: Wed Sep 12, 2018 1:56 pm
by tahir
Hi,
I am trying to extract content from one docx file and add it to a second docx file with the following code:
Code: Select all
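        // Each list entry is a paragraph (w:p) object from the source document
        List<Object> list = getTextFromOtherFile(Path);
        for (Object obj : list) {
            // extractText writes only the paragraph's text into the writer
            final StringWriter stringWriter = new StringWriter();
            TextUtils.extractText(obj, stringWriter);
            final String paragraphString1 = stringWriter.toString();
            createTableRow1(tbl, paragraphString1);
        }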

The getTextFromOtherFile(Path) method is as follows:
Code: Select all
    public List<Object> getTextFromOtherFile(String fileName) throws Docx4JException, JAXBException, XPathBinderAssociationIsPartialException {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(fileName));
        MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
        // Select every paragraph (w:p) in the main document part
        final String xpathToParagraphs = "//w:p";
        return mainDocumentPart.getJAXBNodesViaXPath(xpathToParagraphs, true);
    }

The createTableRow1(tbl, paragraphString1) method is as follows:
Code: Select all
    public void createTableRow1(Tbl tbl, String text) {
        ObjectFactory wmlObjectFactory = Context.getWmlObjectFactory();
        // Create object for tr
        Tr tr7 = wmlObjectFactory.createTr();
        tbl.getContent().add(tr7);
        // Create object for tc (wrapped in JAXBElement)
        Tc tc7 = wmlObjectFactory.createTc();
        JAXBElement<org.docx4j.wml.Tc> tcWrapped7 = wmlObjectFactory.createTrTc(tc7);
        tr7.getContent().add(tcWrapped7);
        // Create object for tcPr
        TcPr tcpr7 = wmlObjectFactory.createTcPr();
        tc7.setTcPr(tcpr7);
        // Create object for tcW
        TblWidth tblwidth8 = wmlObjectFactory.createTblWidth();
        tcpr7.setTcW(tblwidth8);
        tblwidth8.setType("dxa");
        tblwidth8.setW(BigInteger.valueOf(9576));
        // Create object for gridSpan
        TcPrInner.GridSpan tcprinnergridspan7 = wmlObjectFactory.createTcPrInnerGridSpan();
        tcpr7.setGridSpan(tcprinnergridspan7);
        tcprinnergridspan7.setVal(BigInteger.valueOf(3));
        // Create object for p
        P p7 = wmlObjectFactory.createP();
        tc7.getContent().add(p7);

        R r5 = wmlObjectFactory.createR();
        p7.getContent().add(r5);
        // Create object for rPr
        RPr rpr5 = wmlObjectFactory.createRPr();
        r5.setRPr(rpr5);
        // Create object for sz
        HpsMeasure hpsmeasure23 = wmlObjectFactory.createHpsMeasure();
        rpr5.setSz(hpsmeasure23);
        hpsmeasure23.setVal(BigInteger.valueOf(24));
        // Create object for szCs
        HpsMeasure hpsmeasure24 = wmlObjectFactory.createHpsMeasure();
        rpr5.setSzCs(hpsmeasure24);
        hpsmeasure24.setVal(BigInteger.valueOf(24));
        // Create object for t (wrapped in JAXBElement)
        Text text5 = wmlObjectFactory.createText();
        JAXBElement<org.docx4j.wml.Text> textWrapped5 = wmlObjectFactory.createRT(text5);
        r5.getContent().add(textWrapped5);
        text5.setValue(text);
    }

Everything is working fine except the spaces/tabs and auto numbering etc., because these do not appear in the second document.
Can anybody help me copy everything from the first document into the second exactly as it is, without any change? Thanks in advance.

Re: Keeping Styles in Docx while Extracting Contents

Posted: Thu Sep 13, 2018 8:41 am
by jason
tahir wrote: Everything is working fine except the spaces/tabs and auto numbering etc


For auto numbering, you'll need to copy the relevant definitions from the numbering part (and change their IDs etc. to avoid collisions). If these are used via a style, you'll need the relevant style suitably updated as well. In the general case, there is a lot to worry about when merging documents...

Or you could try Plutext's commercial Docx4j Enterprise, which includes "MergeDocx"; see the MergeIntoTableCell example.
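
To make the numbering point concrete, here is a minimal sketch using only the JDK's DOM parser (no docx4j), assuming a made-up sample paragraph. The class name, method names, and the numId value 3 are hypothetical. It shows that a numbered paragraph's plain text contains neither the tab nor the numbering, and that the paragraph itself only holds a reference (w:numId) to a definition that lives in the numbering part:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NumberingReferenceDemo {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: a numbered paragraph whose run contains a tab.
    static final String SAMPLE_P =
        "<w:p xmlns:w=\"" + W_NS + "\">"
      + "<w:pPr><w:numPr><w:ilvl w:val=\"0\"/><w:numId w:val=\"3\"/></w:numPr></w:pPr>"
      + "<w:r><w:t>Name</w:t><w:tab/><w:t>Value</w:t></w:r>"
      + "</w:p>";

    // Parse an XML string with namespace awareness and return its root element.
    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
    }

    // What a text-only extraction sees: the concatenated text nodes.
    static String plainText(Element p) {
        return p.getTextContent();
    }

    // Tabs are elements (w:tab), not characters, so they are not part of the text.
    static int tabCount(Element p) {
        return p.getElementsByTagNameNS(W_NS, "tab").getLength();
    }

    // The paragraph only references a numbering definition by ID; returns null if absent.
    static String numId(Element p) {
        NodeList ids = p.getElementsByTagNameNS(W_NS, "numId");
        if (ids.getLength() == 0) {
            return null;
        }
        return ((Element) ids.item(0)).getAttributeNS(W_NS, "val");
    }

    public static void main(String[] args) throws Exception {
        Element p = parse(SAMPLE_P);
        System.out.println("text:  " + plainText(p));
        System.out.println("tabs:  " + tabCount(p));
        System.out.println("numId: " + numId(p));
    }
}
```

If a paragraph like this is copied into another docx, the numId only resolves to the intended list if a matching definition exists in the destination's numbering part, which is why the definitions have to be copied (and renumbered) too.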

Re: Keeping Styles in Docx while Extracting Contents

Posted: Sat Sep 15, 2018 5:04 pm
by tahir
I was exploring ways of copying styles and everything else from the source docx to the destination docx. I found a way:
Code: Select all
            // Load the source document and marshal its main document part to XML
            File doc = new File("D://test.docx");
            WordprocessingMLPackage wordPackage = WordprocessingMLPackage.load(doc);
            MainDocumentPart mainDocumentPart = wordPackage.getMainDocumentPart();
            String str = mainDocumentPart.getXML();

            // Unmarshal the XML into a new Document object (a copy of the source content)
            Document document = (Document) XmlUtils.unmarshalString(str);

            // Load the destination document and replace its whole main document content
            File doc1 = new File("D://test1.docx");
            WordprocessingMLPackage wordPackage1 = WordprocessingMLPackage.load(doc1);
            MainDocumentPart mainDocumentPart1 = wordPackage1.getMainDocumentPart();
            mainDocumentPart1.setContents(document);
            wordPackage1.save(doc1);

This gets all the XML from the source and sets it, as is, in the destination document. The problem with this technique is that it copies the whole source document and replaces everything in the destination document. Can we get only the XML for the paragraphs from the source doc, and replace or insert them at a specific location or table in the destination document? Thanks in advance for your valuable input.

Re: Keeping Styles in Docx while Extracting Contents

Posted: Thu Sep 20, 2018 9:58 am
by jason
Marshal (getXML) then unmarshal is an effective way of cloning/copying.

In fact this is how docx4j's deepCopy works: https://github.com/plutext/docx4j/blob/ ... java#L1037

You can use deepCopy to clone a P (or other object).
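
As a rough sketch of what that could look like (it needs docx4j on the classpath; the class and method names ParagraphCopier and copyParagraphs are made up for illustration), the following clones each top-level paragraph of a source package into a table cell. It only walks the direct children of the body, so paragraphs nested inside source tables are not picked up:

```java
import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.P;
import org.docx4j.wml.Tc;

class ParagraphCopier {

    /**
     * Deep-copies every top-level paragraph of the source document into
     * targetCell and returns how many paragraphs were copied.
     * The clones keep their pPr/rPr, tabs and numPr references, but nothing
     * outside the paragraph itself (styles, numbering definitions, rels).
     */
    static int copyParagraphs(WordprocessingMLPackage source, Tc targetCell) {
        int copied = 0;
        for (Object item : source.getMainDocumentPart().getContent()) {
            // Body content may be wrapped in JAXBElement; unwrap before testing the type
            Object unwrapped = XmlUtils.unwrap(item);
            if (unwrapped instanceof P) {
                // deepCopy returns an independent clone, so the source stays untouched
                P clone = XmlUtils.deepCopy((P) unwrapped);
                targetCell.getContent().add(clone);
                copied++;
            }
        }
        return copied;
    }
}
```

In the code from the first post, this would replace the extractText step: instead of passing a String to createTableRow1, pass the source package (or the cloned P objects) and add the clones to tc7.getContent() rather than building a new P and R by hand.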

But you'll still need to manage any styles, rels etc yourself. See further https://www.plutext.com/mergedocx_java.html