服务器之家

服务器之家 > 正文

java使用POI实现html和word相互转换

时间:2021-06-23 14:05     来源/作者:追逐盛夏流年

项目后端使用了springboot,maven,前端使用了ckeditor富文本编辑器。目前从html转换的word为doc格式,而图片处理支持的是docx格式,所以需要手动把doc另存为docx,然后才可以进行图片替换。

一.添加maven依赖

主要使用了以下和poi相关的依赖,为了便于获取html的图片元素,还使用了jsoup:

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
<dependency>
  <groupid>org.apache.poi</groupid>
  <artifactid>poi</artifactid>
  <version>3.14</version>
</dependency>
 
<dependency>
  <groupid>org.apache.poi</groupid>
  <artifactid>poi-scratchpad</artifactid>
  <version>3.14</version>
</dependency>
 
<dependency>
  <groupid>org.apache.poi</groupid>
  <artifactid>poi-ooxml</artifactid>
  <version>3.14</version>
</dependency>
 
<dependency>
  <groupid>fr.opensagres.xdocreport</groupid>
  <artifactid>xdocreport</artifactid>
  <version>1.0.6</version>
</dependency>
 
<dependency>
  <groupid>org.apache.poi</groupid>
  <artifactid>poi-ooxml-schemas</artifactid>
  <version>3.14</version>
</dependency>
 
<dependency>
  <groupid>org.apache.poi</groupid>
  <artifactid>ooxml-schemas</artifactid>
  <version>1.3</version>
</dependency>
 
<dependency>
  <groupid>org.jsoup</groupid>
  <artifactid>jsoup</artifactid>
  <version>1.11.3</version>
</dependency>

二.word转换为html

在springboot项目的resources目录下新建static文件夹,将需要转换的word文件temp.docx粘贴进去,由于static是springboot的默认资源文件,所以不需要在配置文件里面另行配置了,如果改成其他名字,需要在application.yml进行相应配置。

doc格式转换为html:

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
public static string doctohtml() throws exception {
  file path = new file(resourceutils.geturl("classpath:").getpath());
  string imagepathstr = path.getabsolutepath() + "\\static\\image\\";
  string sourcefilename = path.getabsolutepath() + "\\static\\test.doc";
  string targetfilename = path.getabsolutepath() + "\\static\\test2.html";
  file file = new file(imagepathstr);
  if(!file.exists()) {
    file.mkdirs();
  }
  hwpfdocument worddocument = new hwpfdocument(new fileinputstream(sourcefilename));
  org.w3c.dom.document document = documentbuilderfactory.newinstance().newdocumentbuilder().newdocument();
  wordtohtmlconverter wordtohtmlconverter = new wordtohtmlconverter(document);
  //保存图片,并返回图片的相对路径
  wordtohtmlconverter.setpicturesmanager((content, picturetype, name, width, height) -> {
    try (fileoutputstream out = new fileoutputstream(imagepathstr + name)) {
      out.write(content);
    } catch (exception e) {
      e.printstacktrace();
    }
    return "image/" + name;
  });
  wordtohtmlconverter.processdocument(worddocument);
  org.w3c.dom.document htmldocument = wordtohtmlconverter.getdocument();
  domsource domsource = new domsource(htmldocument);
  streamresult streamresult = new streamresult(new file(targetfilename));
  transformerfactory tf = transformerfactory.newinstance();
  transformer serializer = tf.newtransformer();
  serializer.setoutputproperty(outputkeys.encoding, "utf-8");
  serializer.setoutputproperty(outputkeys.indent, "yes");
  serializer.setoutputproperty(outputkeys.method, "html");
  serializer.transform(domsource, streamresult);
  return targetfilename;
}

docx格式转换为html

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
public static string docxtohtml() throws exception {
  file path = new file(resourceutils.geturl("classpath:").getpath());
  string imagepath = path.getabsolutepath() + "\\static\\image";
  string sourcefilename = path.getabsolutepath() + "\\static\\test.docx";
  string targetfilename = path.getabsolutepath() + "\\static\\test.html";
 
  outputstreamwriter outputstreamwriter = null;
  try {
    xwpfdocument document = new xwpfdocument(new fileinputstream(sourcefilename));
    xhtmloptions options = xhtmloptions.create();
    // 存放图片的文件夹
    options.setextractor(new fileimageextractor(new file(imagepath)));
    // html中图片的路径
    options.uriresolver(new basicuriresolver("image"));
    outputstreamwriter = new outputstreamwriter(new fileoutputstream(targetfilename), "utf-8");
    xhtmlconverter xhtmlconverter = (xhtmlconverter) xhtmlconverter.getinstance();
    xhtmlconverter.convert(document, outputstreamwriter, options);
  } finally {
    if (outputstreamwriter != null) {
      outputstreamwriter.close();
    }
  }
  return targetfilename;
}

转换成功后会生成对应的html文件,如果想在前端展示,直接读取文件转换为string返回给前端即可。

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
public static string readfile(string filepath) {
  file file = new file(filepath);
  inputstream input = null;
  try {
    input = new fileinputstream(file);
  } catch (filenotfoundexception e) {
    e.printstacktrace();
  }
  stringbuffer buffer = new stringbuffer();
  byte[] bytes = new byte[1024];
  try {
    for (int n; (n = input.read(bytes)) != -1;) {
      buffer.append(new string(bytes, 0, n, "utf8"));
    }
  } catch (ioexception e) {
    e.printstacktrace();
  }
  return buffer.tostring();
}

在富文本编辑器ckeditor中的显示效果:

java使用POI实现html和word相互转换

三.html转换为word

实现思路就是先把html中的所有图片元素提取出来,统一替换为变量字符”${imgreplace}“,如果多张图片,可以依序排列下去,之后生成对应的doc文件(之前试过直接生成docx文件发现打不开,这个问题尚未找到好的解决方法),我们将其另存为docx文件,之后就可以替换变量为图片了:

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
public static string writewordfile(string content) {
    string path = "d:/wordfile";
    map<string, object> param = new hashmap<string, object>();
 
    if (!"".equals(path)) {
      file filedir = new file(path);
      if (!filedir.exists()) {
        filedir.mkdirs();
      }
      content = htmlutils.htmlunescape(content);
      list<hashmap<string, string>> imgs = getimgstr(content);
      int count = 0;
      for (hashmap<string, string> img : imgs) {
        count++;
        //处理替换以“/>”结尾的img标签
        content = content.replace(img.get("img"), "${imgreplace" + count + "}");
        //处理替换以“>”结尾的img标签
        content = content.replace(img.get("img1"), "${imgreplace" + count + "}");
        map<string, object> header = new hashmap<string, object>();
 
        try {
          file filepath = new file(resourceutils.geturl("classpath:").getpath());
          string imagepath = filepath.getabsolutepath() + "\\static\\";
          imagepath += img.get("src").replaceall("/", "\\\\");
          //如果没有宽高属性,默认设置为400*300
          if(img.get("width") == null || img.get("height") == null) {
            header.put("width", 400);
            header.put("height", 300);
          }else {
            header.put("width", (int) (double.parsedouble(img.get("width"))));
            header.put("height", (int) (double.parsedouble(img.get("height"))));
          }
          header.put("type", "jpg");
          header.put("content", officeutil.inputstream2bytearray(new fileinputstream(imagepath), true));
        } catch (filenotfoundexception e) {
          e.printstacktrace();
        }
        param.put("${imgreplace" + count + "}", header);
      }
      try {
        // 生成doc格式的word文档,需要手动改为docx
        byte by[] = content.getbytes("utf-8");
        bytearrayinputstream bais = new bytearrayinputstream(by);
        poifsfilesystem poifs = new poifsfilesystem();
        directoryentry directory = poifs.getroot();
        documententry documententry = directory.createdocument("worddocument", bais);
        fileoutputstream ostream = new fileoutputstream("d:\\wordfile\\temp.doc");
        poifs.writefilesystem(ostream);
        bais.close();
        ostream.close();
 
        // 临时文件(手动改好的docx文件)
        customxwpfdocument doc = officeutil.generateword(param, "d:\\wordfile\\temp.docx");
        //最终生成的带图片的word文件
        fileoutputstream fopts = new fileoutputstream("d:\\wordfile\\final.docx");
        doc.write(fopts);
        fopts.close();
      } catch (exception e) {
        e.printstacktrace();
      }
 
    }
    return "d:/wordfile/final.docx";
  }
 
  //获取html中的图片元素信息
  public static list<hashmap<string, string>> getimgstr(string htmlstr) {
    list<hashmap<string, string>> pics = new arraylist<hashmap<string, string>>();
 
    document doc = jsoup.parse(htmlstr);
    elements imgs = doc.select("img");
    for (element img : imgs) {
      hashmap<string, string> map = new hashmap<string, string>();
      if(!"".equals(img.attr("width"))) {
        map.put("width", img.attr("width").substring(0, img.attr("width").length() - 2));
      }
      if(!"".equals(img.attr("height"))) {
        map.put("height", img.attr("height").substring(0, img.attr("height").length() - 2));
      }
      map.put("img", img.tostring().substring(0, img.tostring().length() - 1) + "/>");
      map.put("img1", img.tostring());
      map.put("src", img.attr("src"));
      pics.add(map);
    }
    return pics;
  }

officeutil工具类,之前发现网上的写法只支持一张图片的修改,多张图片就会报错,是因为添加了图片,processparagraphs方法中的runs的大小改变了,会报arraylist的异常,就和我们循环list中删除元素会报异常道理一样,解决方法就是复制一个新的arraylist进行循环即可:

 

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
package com.example.demo.util;
 
import java.io.bytearrayinputstream;
import java.io.fileinputstream;
import java.io.ioexception;
import java.io.inputstream;
import java.util.arraylist;
import java.util.iterator;
import java.util.list;
import java.util.map;
import java.util.map.entry;
 
import org.apache.poi.poixmldocument;
import org.apache.poi.hwpf.extractor.wordextractor;
import org.apache.poi.openxml4j.opc.opcpackage;
import org.apache.poi.xwpf.usermodel.xwpfparagraph;
import org.apache.poi.xwpf.usermodel.xwpfrun;
import org.apache.poi.xwpf.usermodel.xwpftable;
import org.apache.poi.xwpf.usermodel.xwpftablecell;
import org.apache.poi.xwpf.usermodel.xwpftablerow;
 
/**
 * 适用于word 2007
 */
public class officeutil {
 
  /**
   * 根据指定的参数值、模板,生成 word 文档
   * @param param 需要替换的变量
   * @param template 模板
   */
  public static customxwpfdocument generateword(map<string, object> param, string template) {
    customxwpfdocument doc = null;
    try {
      opcpackage pack = poixmldocument.openpackage(template);
      doc = new customxwpfdocument(pack);
      if (param != null && param.size() > 0) {
 
        //处理段落
        list<xwpfparagraph> paragraphlist = doc.getparagraphs();
        processparagraphs(paragraphlist, param, doc);
 
        //处理表格
        iterator<xwpftable> it = doc.gettablesiterator();
        while (it.hasnext()) {
          xwpftable table = it.next();
          list<xwpftablerow> rows = table.getrows();
          for (xwpftablerow row : rows) {
            list<xwpftablecell> cells = row.gettablecells();
            for (xwpftablecell cell : cells) {
              list<xwpfparagraph> paragraphlisttable = cell.getparagraphs();
              processparagraphs(paragraphlisttable, param, doc);
            }
          }
        }
      }
    } catch (exception e) {
      e.printstacktrace();
    }
    return doc;
  }
  /**
   * 处理段落
   * @param paragraphlist
   */
  public static void processparagraphs(list<xwpfparagraph> paragraphlist,map<string, object> param,customxwpfdocument doc){
    if(paragraphlist != null && paragraphlist.size() > 0){
      for(xwpfparagraph paragraph:paragraphlist){
        //poi转换过来的行间距过大,需要手动调整
        if(paragraph.getspacingbefore() >= 1000 || paragraph.getspacingafter() > 1000) {
          paragraph.setspacingbefore(0);
          paragraph.setspacingafter(0);
        }
        //设置word中左右间距
        paragraph.setindentationleft(0);
        paragraph.setindentationright(0);
        list<xwpfrun> runs = paragraph.getruns();
        //加了图片,修改了paragraph的runs的size,所以循环不能使用runs
        list<xwpfrun> allruns = new arraylist<xwpfrun>(runs);
        for (xwpfrun run : allruns) {
          string text = run.gettext(0);
          if(text != null){
            boolean issettext = false;
            for (entry<string, object> entry : param.entryset()) {
              string key = entry.getkey();
              if(text.indexof(key) != -1){
                issettext = true;
                object value = entry.getvalue();
                if (value instanceof string) {//文本替换
                  text = text.replace(key, value.tostring());
                } else if (value instanceof map) {//图片替换
                  text = text.replace(key, "");
                  map pic = (map)value;
                  int width = integer.parseint(pic.get("width").tostring());
                  int height = integer.parseint(pic.get("height").tostring());
                  int pictype = getpicturetype(pic.get("type").tostring());
                  byte[] bytearray = (byte[]) pic.get("content");
                  bytearrayinputstream byteinputstream = new bytearrayinputstream(bytearray);
                  try {
                    string blipid = doc.addpicturedata(byteinputstream,pictype);
                    doc.createpicture(blipid,doc.getnextpicnamenumber(pictype), width, height,paragraph);
                  } catch (exception e) {
                    e.printstacktrace();
                  }
                }
              }
            }
            if(issettext){
              run.settext(text,0);
            }
          }
        }
      }
    }
  }
  /**
   * 根据图片类型,取得对应的图片类型代码
   * @param pictype
   * @return int
   */
  private static int getpicturetype(string pictype){
    int res = customxwpfdocument.picture_type_pict;
    if(pictype != null){
      if(pictype.equalsignorecase("png")){
        res = customxwpfdocument.picture_type_png;
      }else if(pictype.equalsignorecase("dib")){
        res = customxwpfdocument.picture_type_dib;
      }else if(pictype.equalsignorecase("emf")){
        res = customxwpfdocument.picture_type_emf;
      }else if(pictype.equalsignorecase("jpg") || pictype.equalsignorecase("jpeg")){
        res = customxwpfdocument.picture_type_jpeg;
      }else if(pictype.equalsignorecase("wmf")){
        res = customxwpfdocument.picture_type_wmf;
      }
    }
    return res;
  }
  /**
   * 将输入流中的数据写入字节数组
   * @param in
   * @return
   */
  public static byte[] inputstream2bytearray(inputstream in,boolean isclose){
    byte[] bytearray = null;
    try {
      int total = in.available();
      bytearray = new byte[total];
      in.read(bytearray);
    } catch (ioexception e) {
      e.printstacktrace();
    }finally{
      if(isclose){
        try {
          in.close();
        } catch (exception e2) {
          system.out.println("关闭流失败");
        }
      }
    }
    return bytearray;
  }
}

我认为之所以word2003不支持图片替换,主要是处理2003版本的hwpfdocument对象被声明为了final,我们就无法重写他的方法了。而处理2007版本的类为xwpfdocument,是可以继承的,通过继承xwpfdocument,重写createpicture方法即可实现图片替换,以下为对应的customxwpfdocument类:

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
package com.example.demo.util; 
 
import java.io.ioexception;
import java.io.inputstream;
import org.apache.poi.openxml4j.opc.opcpackage;
import org.apache.poi.xwpf.usermodel.xwpfdocument;
import org.apache.poi.xwpf.usermodel.xwpfparagraph;
import org.apache.xmlbeans.xmlexception;
import org.apache.xmlbeans.xmltoken;
import org.openxmlformats.schemas.drawingml.x2006.main.ctnonvisualdrawingprops;
import org.openxmlformats.schemas.drawingml.x2006.main.ctpositivesize2d;
import org.openxmlformats.schemas.drawingml.x2006.wordprocessingdrawing.ctinline;
 
/**
 * 自定义 xwpfdocument,并重写 createpicture()方法
 */
public class customxwpfdocument extends xwpfdocument { 
  public customxwpfdocument(inputstream in) throws ioexception { 
    super(in); 
  
 
  public customxwpfdocument() { 
    super(); 
  
 
  public customxwpfdocument(opcpackage pkg) throws ioexception { 
    super(pkg); 
  
 
  /**
   * @param ind
   * @param width 宽
   * @param height 高
   * @param paragraph 段落
   */
  public void createpicture(string blipid, int ind, int width, int height,xwpfparagraph paragraph) { 
    final int emu = 9525
    width *= emu; 
    height *= emu; 
    ctinline inline = paragraph.createrun().getctr().addnewdrawing().addnewinline(); 
    string picxml = ""
        + "<a:graphic xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\">"
        + "  <a:graphicdata uri=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"
        + "   <pic:pic xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"
        + "     <pic:nvpicpr>" + "      <pic:cnvpr id=\""
        + ind 
        + "\" name=\"generated\"/>"
        + "      <pic:cnvpicpr/>"
        + "     </pic:nvpicpr>"
        + "     <pic:blipfill>"
        + "      <a:blip r:embed=\""
        + blipid 
        + "\" xmlns:r=\"http://schemas.openxmlformats.org/officedocument/2006/relationships\"/>"
        + "      <a:stretch>"
        + "        <a:fillrect/>"
        + "      </a:stretch>"
        + "     </pic:blipfill>"
        + "     <pic:sppr>"
        + "      <a:xfrm>"
        + "        <a:off x=\"0\" y=\"0\"/>"
        + "        <a:ext cx=\""
        + width 
        + "\" cy=\""
        + height 
        + "\"/>"
        + "      </a:xfrm>"
        + "      <a:prstgeom prst=\"rect\">"
        + "        <a:avlst/>"
        + "      </a:prstgeom>"
        + "     </pic:sppr>"
        + "   </pic:pic>"
        + "  </a:graphicdata>" + "</a:graphic>"
 
    inline.addnewgraphic().addnewgraphicdata(); 
    xmltoken xmltoken = null
    try
      xmltoken = xmltoken.factory.parse(picxml); 
    } catch (xmlexception xe) { 
      xe.printstacktrace(); 
    
    inline.set(xmltoken); 
 
    inline.setdistt(0);  
    inline.setdistb(0);  
    inline.setdistl(0);  
    inline.setdistr(0);  
 
    ctpositivesize2d extent = inline.addnewextent(); 
    extent.setcx(width); 
    extent.setcy(height); 
 
    ctnonvisualdrawingprops docpr = inline.addnewdocpr();  
    docpr.setid(ind);  
    docpr.setname("图片" + ind);  
    docpr.setdescr("测试"); 
  
}

以上就是通过poi实现html和word的相互转换,对于html无法转换为可读的docx这个问题尚未解决,如果大家有好的解决方法可以交流一下。

原文链接:https://blog.csdn.net/j1231230/article/details/80712531

标签:

相关文章

热门资讯

2020微信伤感网名听哭了 让对方看到心疼的伤感网名大全
2020微信伤感网名听哭了 让对方看到心疼的伤感网名大全 2019-12-26
yue是什么意思 网络流行语yue了是什么梗
yue是什么意思 网络流行语yue了是什么梗 2020-10-11
背刺什么意思 网络词语背刺是什么梗
背刺什么意思 网络词语背刺是什么梗 2020-05-22
苹果12mini价格表官网报价 iPhone12mini全版本价格汇总
苹果12mini价格表官网报价 iPhone12mini全版本价格汇总 2020-11-13
2021德云社封箱演出完整版 2021年德云社封箱演出在线看
2021德云社封箱演出完整版 2021年德云社封箱演出在线看 2021-03-15
返回顶部