BinaryPartAbstractImage.createImagePart does not handle duplicate images, which makes the generated Word/PowerPoint/Excel files much larger than necessary.
Scenario WITHOUT DOCX4J:
1) If one image is placed on 2 different pages (page 1 and page 2), only one copy of the image appears in the xl/media folder.
2) Images are named image1, image2, image3 and so on.
3) In xl/drawings/_rels/drawing.xml, the Target value for each relationship ID contains an image name like image1 or image2 and so on.
Scenario WITH DOCX4J:
1) If one image is placed on 2 different pages, TWO copies of the same image appear in the xl/media folder.
2) Same Images are named as drawing1_image_rId1 and drawing2_image_rId1.
3) In xl/drawings/_rels/drawing.xml, the Target value for each relationship ID contains an image name like drawing1_image_rId1 or drawing2_image_rId1 and so on.
FOR POINT 1: It does not detect whether the image is already present in the xl/media folder, so it creates the same image multiple times. For a file with 100+ pages and the same image (e.g. a company logo) on every page, it creates 100 identical images in the xl/media folder, which makes the file very large.
FOR POINT 2:
BinaryPartAbstractImage.createImagePart(OpcPackage opcPackage, Part sourcePart, byte[] bytes) --> createImageName(OpcPackage opcPackage, Base sourcePart, String proposedRelId, String ext) --> generateUniqueName(Base sourcePart, String proposedRelId, String directoryPrefix, String after_, String ext)
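The image files are renamed in the generateUniqueName method, which builds the name from the drawing name, the relationship ID and underscores.
Because the name depends on the drawing, the same image gets a different name for each drawing it appears in. This causes the problem.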
FOR POINT 3: If POINT 1 and POINT 2 are handled, this point will be resolved automatically.
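To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not docx4j code and does not call any docx4j API: the class ImageDedupSketch and its method partNameFor are hypothetical names made up for illustration, and the /xl/media/imageN naming simply mirrors the scenario without docx4j described above. The idea is to key each image by a hash of its bytes, so an image that was already added gets its existing name back instead of a new copy.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of docx4j: one instance per package.
class ImageDedupSketch {

    // SHA-256 digest of the image bytes -> part name already assigned to it.
    private final Map<String, String> nameByDigest = new HashMap<>();
    private int nextIndex = 1;

    // Returns the existing part name for identical bytes, or assigns
    // the next sequential name (image1, image2, ...) for a new image.
    String partNameFor(byte[] bytes, String ext) {
        String digest = sha256Hex(bytes);
        String existing = nameByDigest.get(digest);
        if (existing != null) {
            return existing; // duplicate image: reuse, do not store again
        }
        String name = "/xl/media/image" + nextIndex++ + "." + ext;
        nameByDigest.put(digest, name);
        return name;
    }

    private static String sha256Hex(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ImageDedupSketch registry = new ImageDedupSketch();
        byte[] logo = {1, 2, 3};   // stand-in for real image bytes
        byte[] chart = {4, 5, 6};
        System.out.println(registry.partNameFor(logo, "png"));
        System.out.println(registry.partNameFor(logo.clone(), "png"));
        System.out.println(registry.partNameFor(chart, "png"));
    }
}
```

With something like this, every drawing that uses the logo would point its relationship Target at the same media part, so a 100-page file would hold one copy of the logo instead of 100.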
Is it possible for you to handle these scenarios?