Relative URL to images

Posted: Mon Mar 11, 2013 8:57 pm
by pritesh.shah17
Hello All,

I am converting HTML to doc.
The HTML contains images that are referenced by relative URLs.
Because the URLs are relative, the generated doc doesn't display the images.
Can anyone tell me how to resolve the relative URLs so that the doc displays the images?
Also, when the user is not connected to the internet, images with full (absolute) URLs are not displayed. Can anyone help me solve this issue?

Thanks in advance!!!

Thanks,
Pritesh

Re: Relative URL to images

Posted: Thu Mar 14, 2013 9:31 am
by jason
What Java code are you using to do your conversion?

Relative URLs work for me using:

public static List<Object> convert(URL url, WordprocessingMLPackage wordMLPackage)
 


What does your relative URL look like?
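A relative image URL only has meaning against a base URL, which is why its exact form matters here. As a minimal sketch using only the JDK (the class name, method name, and example.com addresses are hypothetical, and this is not taken from docx4j itself), resolution against the URL of the containing HTML document looks like this:

```java
import java.net.URI;

public class RelativeUrlDemo {

    // Resolves a (possibly relative) image reference against the URL of the
    // HTML document that contains it. Absolute references are returned as-is.
    static String resolve(String baseUrl, String imageRef) {
        return URI.create(baseUrl).resolve(imageRef).toString();
    }

    public static void main(String[] args) {
        // Hypothetical location of the HTML document being converted.
        String base = "http://example.com/docs/page.html";

        System.out.println(resolve(base, "images/logo.png"));
        System.out.println(resolve(base, "../shared/banner.png"));
        System.out.println(resolve(base, "http://cdn.example.com/a.png"));
    }
}
```

If the HTML is supplied as a bare string with no document URL, there is nothing to resolve against, so a relative src cannot be located unless a base is supplied some other way.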

pritesh.shah17 wrote: Also, when the user is not connected to the internet, images with full (absolute) URLs are not displayed. Can anyone help me solve this issue?


You mean, in your Word document? Try converting your linked images to embedded ones. See the ImageConvertEmbeddedToLinked sample which goes the other way.

PS: this topic will be moved to the appropriate subforum soon.

Re: Relative URL to images

Posted: Thu Mar 14, 2013 4:05 pm
by pritesh.shah17
Thanks for the reply, Jason.

Please find attached the sample code I am using to convert an HTML string to doc.
The HTML string has "<img src=" pointing to the live URL of an image. The image is added to the generated doc, but it is displayed only when I am connected to the internet. So it is a linked image, and I want to embed it in the document so that I can see the images in my doc file even when I am not connected to the internet.

To emphasize again: I am converting an HTML string to doc.

Also, you can compare the HTML file opened in a browser with the generated document. The formatting and look and feel are quite different.

Looking forward to your help in resolving my issue.

Thank you very much for the reply, and thanks in advance for further help.

Thanks,
Pritesh
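One generic way to make an HTML image independent of the network is to inline its bytes as a data: URI before conversion. The following is only a sketch using the JDK (Java 8 or later for java.util.Base64); the class and method names are hypothetical, and whether a given converter (Word's own HTML import, or a particular docx4j version) honors data: URIs is an assumption that would need to be verified.

```java
import java.util.Base64;

public class DataUriDemo {

    // Builds a data: URI that carries the image bytes inside the HTML itself.
    static String toDataUri(String mimeType, byte[] imageBytes) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(imageBytes);
    }

    // Replaces one exact src="..." attribute value with the inlined image.
    // Plain string replacement is used for brevity; it assumes double-quoted
    // attributes and is not a general HTML rewriter.
    static String inlineImage(String html, String src, String mimeType, byte[] imageBytes) {
        return html.replace("src=\"" + src + "\"",
                "src=\"" + toDataUri(mimeType, imageBytes) + "\"");
    }

    public static void main(String[] args) {
        // Placeholder bytes; in practice these would be the downloaded image.
        byte[] fakeImage = {1, 2, 3};
        String html = "<img src=\"logo.png\"/>";
        System.out.println(inlineImage(html, "logo.png", "image/png", fakeImage));
    }
}
```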

Re: Relative URL to images

Posted: Thu Mar 14, 2013 5:30 pm
by jason
Your code uses AltChunkType.Html, and then invokes convertAltChunks:

package com.test;

import java.io.File;
import java.util.List;

import org.apache.commons.io.FileUtils;
import org.docx4j.XmlUtils;
import org.docx4j.dml.wordprocessingDrawing.Inline;
import org.docx4j.jaxb.Context;
import org.docx4j.model.structure.SectionWrapper;
import org.docx4j.openpackaging.contenttype.ContentType;
import org.docx4j.openpackaging.exceptions.InvalidFormatException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.Part;
import org.docx4j.openpackaging.parts.WordprocessingML.AltChunkType;
import org.docx4j.openpackaging.parts.WordprocessingML.AlternativeFormatInputPart;
import org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage;
import org.docx4j.openpackaging.parts.WordprocessingML.HeaderPart;
import org.docx4j.relationships.Relationship;
import org.docx4j.utils.BufferUtil;
import org.docx4j.wml.CTAltChunk;
import org.docx4j.wml.Hdr;
import org.docx4j.wml.HdrFtrRef;
import org.docx4j.wml.HeaderReference;
import org.docx4j.wml.Jc;
import org.docx4j.wml.JcEnumeration;
import org.docx4j.wml.ObjectFactory;
import org.docx4j.wml.PPr;
import org.docx4j.wml.SectPr;

public class HTMLToDoc {
       
        static  String locationOfFile = "C:/Users/priteshs/Desktop/Test" + "/test2.doc";
       
        public static void main(final String[] args) throws Exception {
                // Converter converter = new Converter();
                // converter.convertFile();

                String xhtml = FileUtils.readFileToString(new File(
                                "C:\\Users\\priteshs\\Desktop\\Test\\html_text.html"), "UTF-8");
                convertHTMLtoDOC(xhtml);
        }
       
        static void convertHTMLtoDOC(final String html) throws Exception {
                WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
                                .createPackage();

                Relationship styleRel =  wordMLPackage.getMainDocumentPart().getStyleDefinitionsPart().getSourceRelationships().get(0);
                wordMLPackage.getMainDocumentPart().getRelationshipsPart().removeRelationship(styleRel);       

//               1. the Header part
                Relationship relationship = createHeaderPart(wordMLPackage);
//               2. an entry in SectPr
                createHeaderReference(wordMLPackage, relationship);

                AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(
                                AltChunkType.Html);
                afiPart.setBinaryData(html.getBytes("UTF-8"));
                afiPart.setContentType(new ContentType("text/html"));
                Relationship altChunkRel = wordMLPackage.getMainDocumentPart()
                                .addTargetPart(afiPart);
               
                // .. the bit in document body
                CTAltChunk ac = Context.getWmlObjectFactory().createCTAltChunk();
                ac.setId(altChunkRel.getId());
               
                wordMLPackage.getMainDocumentPart().addObject(ac);
                wordMLPackage.getContentTypeManager().addDefaultContentType("html",
                "text/html");

                // CONVERTING ALTCHUNKS
                WordprocessingMLPackage pkgOut = wordMLPackage.getMainDocumentPart()
                                .convertAltChunks();
                pkgOut.save(new java.io.File(locationOfFile));
                System.out.println(XmlUtils.marshaltoString(pkgOut
                                .getMainDocumentPart().getJaxbElement(), true, true));
        }

        public static Relationship createHeaderPart(
                        WordprocessingMLPackage wordprocessingMLPackage) throws Exception {

                HeaderPart headerPart = new HeaderPart();
                Relationship rel = wordprocessingMLPackage.getMainDocumentPart()
                                .addTargetPart(headerPart);

                // After addTargetPart, so image can be added properly
                headerPart.setJaxbElement(getHdr(wordprocessingMLPackage, headerPart));

                return rel;
        }
       
        public static void createHeaderReference(
                        WordprocessingMLPackage wordprocessingMLPackage,
                        Relationship relationship) throws InvalidFormatException {

                List<SectionWrapper> sections = wordprocessingMLPackage
                                .getDocumentModel().getSections();

                SectPr sectPr = sections.get(sections.size() - 1).getSectPr();
                // There is always a section wrapper, but it might not contain a sectPr
                if (sectPr == null) {
                        sectPr = objectFactory.createSectPr();
                        wordprocessingMLPackage.getMainDocumentPart().addObject(sectPr);
                        sections.get(sections.size() - 1).setSectPr(sectPr);
                }

                HeaderReference headerReference = objectFactory.createHeaderReference();
                headerReference.setId(relationship.getId());
                headerReference.setType(HdrFtrRef.DEFAULT);
                sectPr.getEGHdrFtrReferences().add(headerReference);// add header or
                // footer references

        }
        private static ObjectFactory objectFactory = new ObjectFactory();
        public static Hdr getHdr(WordprocessingMLPackage wordprocessingMLPackage,
                        Part sourcePart) throws Exception {
                Hdr hdr = objectFactory.createHdr();

                File file = new File("C:\\Users\\priteshs\\Desktop\\temp\\google_logo.png");
                java.io.InputStream is = new java.io.FileInputStream(file);

                hdr.getContent().add(
                                newImage(wordprocessingMLPackage, sourcePart,
                                                BufferUtil.getBytesFromInputStream(is), "filename",
                                                "alttext", 1, 2));
                return hdr;

        }

        public static org.docx4j.wml.P newImage(
                        WordprocessingMLPackage wordMLPackage, Part sourcePart,
                        byte[] bytes, String filenameHint, String altText, int id1, int id2)
                        throws Exception {

                BinaryPartAbstractImage imagePart = BinaryPartAbstractImage
                                .createImagePart(wordMLPackage, sourcePart, bytes);

                Inline inline = imagePart.createImageInline(filenameHint, altText, id1,
                                id2, false);

                // Now add the inline in w:p/w:r/w:drawing
                org.docx4j.wml.ObjectFactory factory = Context.getWmlObjectFactory();
                org.docx4j.wml.P p = factory.createP();
                org.docx4j.wml.R run = factory.createR();
                PPr pPr= factory.createPPr();
               
                Jc jc = factory.createJc();
                jc.setVal(JcEnumeration.CENTER);
                pPr.setJc(jc);
           
                p.setPPr(pPr);
                p.getContent().add(run);
                org.docx4j.wml.Drawing drawing = factory.createDrawing();
                run.getContent().add(drawing);
                drawing.getAnchorOrInline().add(inline);

                return p;

        }
}

 


But docx4j won't process an altChunk of type Html. It only processes XHTML and text/plain (unless you have MergeDocx, in which case it also does docx/dotx etc).

So what you are seeing (conversion quality, linked images) is the result of a conversion done by Word, not by docx4j.

I suggest you convert your HTML to well-formed XML (using JTidy or something similar), and then use docx4j's XHTMLImporter directly. See the ConvertInXHTML* samples.
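Before handing markup to an XML-based importer, it can help to confirm that it really is well-formed XML. The sketch below uses only the JDK's built-in parser; the class and method names are hypothetical, it does not call JTidy or docx4j, and it assumes input without a DOCTYPE or HTML named entities such as &nbsp;, which a plain XML parser would need a DTD to understand.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Typical HTML habit: an unclosed <br> is fine in HTML but not in XML.
        System.out.println(isWellFormed("<p>line one<br>line two</p>"));
        // The XHTML form closes every element.
        System.out.println(isWellFormed("<p>line one<br/>line two</p>"));
    }
}
```

A check like this only reports whether the input is well-formed; repairing it is the job of a tidying tool such as JTidy.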

Re: Relative URL to images

Posted: Thu Mar 14, 2013 7:51 pm
by pritesh.shah17
Thanks for the quick reply, Jason.

So, according to you, if I convert the HTML to XHTML and then use XHTMLImporter to convert it to doc, the images will be embedded instead of linked, and I will see them in the doc even when I am not connected to the internet.

Please let me know if my understanding is incorrect.

Looking forward to your reply.

Thanks,
Pritesh

Re: Relative URL to images

Posted: Thu Mar 14, 2013 8:07 pm
by jason
Correct

Re: Relative URL to images

Posted: Fri Mar 15, 2013 9:55 pm
by pritesh.shah17
Hello Jason,

Thank you very much for your continued support in resolving my issue.

I tried converting my HTML to XHTML using JTidy and then used XHTMLImporter to convert the generated XHTML to doc. The doc is generated successfully and I can see the image even without an internet connection, but the formatting of the bullet points is very bad.

Please see the attached code, XHTML, and generated doc in the zip file.

Please help me resolve this formatting issue.

Thanks in advance!

Thanks,
Pritesh

Re: Relative URL to images

Posted: Fri Mar 15, 2013 10:01 pm
by pritesh.shah17
Also, Jason, how can I add a header and footer to all the pages when I am converting XHTML to doc using XHTMLImporter?

Re: Relative URL to images

Posted: Fri Mar 22, 2013 7:54 pm
by pritesh.shah17
Hello Jason,

Could you please reply to the question and the issue in my last two posts?
Your replies are very helpful and are helping me resolve my issue.

Please help!!!

Thanks,
Pritesh

Re: Relative URL to images

Posted: Sat Mar 30, 2013 12:30 am
by pritesh.shah17
Hello Jason,

Please reply.
I badly need your help.

Thanks,
Pritesh

Re: Relative URL to images

Posted: Sat Mar 30, 2013 7:27 am
by jason
Please create new topics, one for each of the two new questions, in the correct sub-forum.

But first, search to see whether they have been raised/answered before. I'm pretty sure bullets have come up before, and the other may have.