
Problem converting html file to docx

Posted: Sat Jun 09, 2012 9:05 am
by bhspencer
I am trying to convert a simple html file to a docx using the example code provided here:
https://github.com/plutext/docx4j/blob/ ... ument.java

The forum won't let me attach the html file to the discussion so I am including it at the end of this post. I have attached the docx output file.

The output file does not contain the content of the <li> elements other than the first one. If anybody has any thoughts on what I might be doing wrong, I would really appreciate it.

Thanks.



<html>
<head>
<title>Inspire</title>
</head>
<body id="body">
<div id="cvSection">
<h2>Curriculum Vitae: Jin, Xiangyu</h2>
<ol>
<li><a
href="http://10.0.1.201:9080/inspire/?personHandle=10984/repo.1-259"
title="Jin, Xiangyu">Jin, Xiangyu</a></li>.
<li><a
href="http://10.0.1.201:9080/inspire/?personHandle=10984/repo.1-53"
title="French, James">French, James</a></li>.
<li><a
href="http://10.0.1.201:9080/inspire/?personHandle=10984/repo.1-152"
title="Michel, Jonathan">Michel, Jonathan</a></li>. "
<li><a href="http://dx.doi.org/10.1007/11670834_16"
title="Toward consistent evaluation of relevance feedback approaches in multimedia retrieval">Toward
consistent evaluation of relevance feedback approaches in
multimedia retrieval</a></li>."
<li><a href="http://www.google.com"></a></li>.
<li><a title="2006">2006</a></li>.
<li><a title="inproceedings">inproceedings</a></li>.
</ol>
<ol>
<li><a
href="http://10.0.1.201:9080/inspire/?personHandle=10984/repo.1-259"
title="Jin, Xiangyu">Jin, Xiangyu</a></li>.
<li><a
href="http://10.0.1.201:9080/inspire/?personHandle=10984/repo.1-53"
title="French, James">French, James</a></li>.
<li><a
href="http://10.0.1.201:9080/inspire/?personHandle=10984/repo.1-152"
title="Michel, Jonathan">Michel, Jonathan</a></li>. "
<li><a href="http://doi.acm.org/10.1145/1148170.1148302"
title="Quantative analysis of the impact of judging inconsistency on the performance of relevance feedback">Quantative
analysis of the impact of judging inconsistency on the performance
of relevance feedback</a></li>."
<li><a href="http://www.google.com"></a></li>.
<li><a title="2006">2006</a></li>.
<li><a title="inproceedings">inproceedings</a></li>.
</ol>
</div>
</body>
</html>

Re: Problem converting html file to docx

Posted: Sat Jun 09, 2012 9:29 am
by jason
Please try yesterday's nightly build: http://www.docx4java.org/docx4j/docx4j- ... 120608.jar

I used that, and I see the list items. Note that your XHTML as pasted contains full stops/periods and quote characters between list items; these are being rendered as separate paragraphs, which seems not unreasonable.
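If it helps to track those stray characters down, here is a minimal sketch that lists any non-whitespace text sitting directly after a closing </li> tag. It uses only java.util.regex, not docx4j; the class name StrayListText and the method findStrayText are made up for illustration, and a regex is only a rough check for simple markup like the sample above, not a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrayListText {
    // Captures any run of text that follows a closing </li> tag,
    // up to (but not including) the next '<'.
    private static final Pattern AFTER_ITEM =
        Pattern.compile("</li\\s*>([^<]+)", Pattern.CASE_INSENSITIVE);

    // Returns the trimmed, non-empty text fragments found between list items.
    static List<String> findStrayText(String html) {
        List<String> stray = new ArrayList<>();
        Matcher m = AFTER_ITEM.matcher(html);
        while (m.find()) {
            String text = m.group(1).trim();
            if (!text.isEmpty()) {
                stray.add(text); // whitespace-only gaps are ignored
            }
        }
        return stray;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup pasted above.
        String html = "<ol><li>one</li>.\n<li>two</li>. \"\n<li>three</li>\n</ol>";
        for (String fragment : findStrayText(html)) {
            System.out.println("stray text: " + fragment);
        }
    }
}
```

Anything this reports is text that sits outside every <li>, which is presumably what ends up as its own paragraph in the docx; removing it from the source should leave only the list items.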

Re: Problem converting html file to docx

Posted: Tue Jun 12, 2012 1:49 am
by bhspencer
The nightly build does indeed create the desired docx file.

Many thanks.