Example: calling OpenOffice from Java to convert Office documents to PDF

2024-11-29 08:55:17

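Introduction:

During development we often need to convert Office documents to PDF from Java. A common approach is to combine the open-source OpenOffice suite with jodconverter.

OpenOffice is available for both Windows and Linux, so a Linux production environment is not a problem.

1. Dependency jars for OpenOffice, using Maven as an example: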

<dependency>
  <groupId>com.artofsolving</groupId>
  <artifactId>jodconverter</artifactId>
  <version>2.2.1</version>
</dependency>
<dependency>
  <groupId>org.openoffice</groupId>
  <artifactId>jurt</artifactId>
  <version>3.0.1</version>
</dependency>
<dependency>
  <groupId>org.openoffice</groupId>
  <artifactId>ridl</artifactId>
  <version>3.0.1</version>
</dependency>
<dependency>
  <groupId>org.openoffice</groupId>
  <artifactId>juh</artifactId>
  <version>3.0.1</version>
</dependency>
<dependency>
  <groupId>org.openoffice</groupId>
  <artifactId>unoil</artifactId>
  <version>3.0.1</version>
</dependency>

<!-- jodconverter 2.2.1 needs exactly this version of slf4j-jdk14; with other versions the logging calls in its source fail. An annoyingly basic problem. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-jdk14</artifactId>
  <version>1.4.3</version>
</dependency>

2. The conversion code. The OpenOffice service must already be running and listening on port 8100.
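Since the conversion depends on the OpenOffice service already listening on port 8100, it can be worth checking the port before connecting. The service is commonly started headless with a command along the lines of soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" (the exact flags depend on the installed version). The sketch below is not part of jodconverter; OfficePortCheck, isListening, the host 127.0.0.1 and the one-second timeout are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of jodconverter or OpenOffice.
final class OfficePortCheck {

    private OfficePortCheck() {
    }

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8100 is the port used in this article; host and timeout are assumptions.
        boolean up = isListening("127.0.0.1", 8100, 1000);
        System.out.println(up
                ? "Something is listening on port 8100"
                : "Nothing is listening on port 8100");
    }
}
```

A check like this only shows that something accepts TCP connections on the port; it does not prove that the listener is OpenOffice. With the service reachable, the conversion method is: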

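import java.io.File;
import java.net.ConnectException;

import com.artofsolving.jodconverter.DefaultDocumentFormatRegistry;
import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.DocumentFormat;
import com.artofsolving.jodconverter.DocumentFormatRegistry;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

// getExtensionName(...) and log are assumed to be defined in the enclosing class.
public void convert(File sourceFile, File targetFile) {
  // 1: open a connection to the OpenOffice service on port 8100
  OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
  try {
    connection.connect();

    DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
    // 2: look up the formats; DefaultDocumentFormatRegistry is used because a
    //    bare BasicDocumentFormatRegistry has no formats registered
    DocumentFormatRegistry factory = new DefaultDocumentFormatRegistry();
    DocumentFormat inputDocumentFormat = factory
        .getFormatByFileExtension(getExtensionName(sourceFile.getAbsolutePath()));
    DocumentFormat outputDocumentFormat = factory
        .getFormatByFileExtension(getExtensionName(targetFile.getAbsolutePath()));
    // 3: run the conversion
    converter.convert(sourceFile, inputDocumentFormat, targetFile, outputDocumentFormat);
  } catch (ConnectException e) {
    log.error("Failed to convert document to PDF: could not connect to OpenOffice", e);
  } finally {
    // release the connection even if the conversion fails
    if (connection.isConnected()) {
      connection.disconnect();
    }
  }
}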
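The method above calls getExtensionName, which the snippet does not define. A minimal sketch of such a helper follows, assuming it simply returns the text after the last dot of the file name. The class name ExtensionNames and this exact behavior are assumptions, not the author's code.

```java
import java.io.File;

// Hypothetical helper: the article's snippet calls getExtensionName(...) without showing it.
final class ExtensionNames {

    private ExtensionNames() {
    }

    /** Returns the text after the last '.' of the file name, or "" if there is none. */
    static String getExtensionName(String path) {
        if (path == null) {
            return "";
        }
        // Look only at the file name so that dots in directory names are ignored.
        String name = new File(path).getName();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(getExtensionName("/data/contracts/report.docx")); // docx
        System.out.println(getExtensionName("/data/v1.2/README"));           // empty line
    }
}
```

The helper leaves the case untouched because getFormatByFileExtension lower-cases the extension itself.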

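3. Note: jodconverter reports an error when converting .docx documents (Word 2007 and later). As most readers know, Word 2003 and earlier use the .doc extension, while Word 2007 and later use .docx.

The jodconverter source shows that no DocumentFormat is registered for docx. The method public DocumentFormat getFormatByFileExtension(String extension) in BasicDocumentFormatRegistry matches the extension exactly against the registered formats, and for Word only doc is registered by default, so a docx lookup returns null.

Source of the BasicDocumentFormatRegistry class: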

//
// JODConverter - Java OpenDocument Converter
// Copyright (C) 2004-2007 - Mirko Nasato <mirko@artofsolving.com>
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
// http://www.gnu.org/copyleft/lesser.html
//
package com.artofsolving.jodconverter; 

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List; 

public class BasicDocumentFormatRegistry implements DocumentFormatRegistry { 

  private List/*<DocumentFormat>*/ documentFormats = new ArrayList(); 

  public void addDocumentFormat(DocumentFormat documentFormat) {
    documentFormats.add(documentFormat);
  } 

  protected List/*<DocumentFormat>*/ getDocumentFormats() {
    return documentFormats;
  } 

  /**
   * @param extension the file extension
   * @return the DocumentFormat for this extension, or null if the extension is not mapped
   */
  public DocumentFormat getFormatByFileExtension(String extension) {
    if (extension == null) {
      return null;
    }
    String lowerExtension = extension.toLowerCase();
    for (Iterator it = documentFormats.iterator(); it.hasNext();) {
      DocumentFormat format = (DocumentFormat) it.next();
      if (format.getFileExtension().equals(lowerExtension)) {
        return format;
      }
    }
    return null;
  } 

  public DocumentFormat getFormatByMimeType(String mimeType) {
    for (Iterator it = documentFormats.iterator(); it.hasNext();) {
      DocumentFormat format = (DocumentFormat) it.next();
      if (format.getMimeType().equals(mimeType)) {
        return format;
      }
    }
    return null;
  }
}

DefaultDocumentFormatRegistry, the default subclass of BasicDocumentFormatRegistry, supports the following file formats:

//
// JODConverter - Java OpenDocument Converter
// Copyright (C) 2004-2007 - Mirko Nasato <mirko@artofsolving.com>
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
// http://www.gnu.org/copyleft/lesser.html
//
package com.artofsolving.jodconverter; 

public class DefaultDocumentFormatRegistry extends BasicDocumentFormatRegistry { 

  public DefaultDocumentFormatRegistry() {
    final DocumentFormat pdf = new DocumentFormat("Portable Document Format", "application/pdf", "pdf");
    pdf.setExportFilter(DocumentFamily.DRAWING, "draw_pdf_Export");
    pdf.setExportFilter(DocumentFamily.PRESENTATION, "impress_pdf_Export");
    pdf.setExportFilter(DocumentFamily.SPREADSHEET, "calc_pdf_Export");
    pdf.setExportFilter(DocumentFamily.TEXT, "writer_pdf_Export");
    addDocumentFormat(pdf); 

    final DocumentFormat swf = new DocumentFormat("Macromedia Flash", "application/x-shockwave-flash", "swf");
    swf.setExportFilter(DocumentFamily.DRAWING, "draw_flash_Export");
    swf.setExportFilter(DocumentFamily.PRESENTATION, "impress_flash_Export");
    addDocumentFormat(swf); 

    final DocumentFormat xhtml = new DocumentFormat("XHTML", "application/xhtml+xml", "xhtml");
    xhtml.setExportFilter(DocumentFamily.PRESENTATION, "XHTML Impress File");
    xhtml.setExportFilter(DocumentFamily.SPREADSHEET, "XHTML Calc File");
    xhtml.setExportFilter(DocumentFamily.TEXT, "XHTML Writer File");
    addDocumentFormat(xhtml); 

    // HTML is treated as Text when supplied as input, but as an output it is also
    // available for exporting Spreadsheet and Presentation formats
    final DocumentFormat html = new DocumentFormat("HTML", DocumentFamily.TEXT, "text/html", "html");
    html.setExportFilter(DocumentFamily.PRESENTATION, "impress_html_Export");
    html.setExportFilter(DocumentFamily.SPREADSHEET, "HTML (StarCalc)");
    html.setExportFilter(DocumentFamily.TEXT, "HTML (StarWriter)");
    addDocumentFormat(html); 

    final DocumentFormat odt = new DocumentFormat("OpenDocument Text", DocumentFamily.TEXT, "application/vnd.oasis.opendocument.text", "odt");
    odt.setExportFilter(DocumentFamily.TEXT, "writer8");
    addDocumentFormat(odt); 

    final DocumentFormat sxw = new DocumentFormat("OpenOffice.org 1.0 Text Document", DocumentFamily.TEXT, "application/vnd.sun.xml.writer", "sxw");
    sxw.setExportFilter(DocumentFamily.TEXT, "StarOffice XML (Writer)");
    addDocumentFormat(sxw); 

    final DocumentFormat doc = new DocumentFormat("Microsoft Word", DocumentFamily.TEXT, "application/msword", "doc");
    doc.setExportFilter(DocumentFamily.TEXT, "MS Word 97");
    addDocumentFormat(doc); 

    final DocumentFormat rtf = new DocumentFormat("Rich Text Format", DocumentFamily.TEXT, "text/rtf", "rtf");
    rtf.setExportFilter(DocumentFamily.TEXT, "Rich Text Format");
    addDocumentFormat(rtf); 

    final DocumentFormat wpd = new DocumentFormat("WordPerfect", DocumentFamily.TEXT, "application/wordperfect", "wpd");
    addDocumentFormat(wpd); 

    final DocumentFormat txt = new DocumentFormat("Plain Text", DocumentFamily.TEXT, "text/plain", "txt");
    // set FilterName to "Text" to prevent OOo from tryign to display the "ASCII Filter Options" dialog
    // alternatively FilterName could be "Text (encoded)" and FilterOptions used to set encoding if needed
    txt.setImportOption("FilterName", "Text");
    txt.setExportFilter(DocumentFamily.TEXT, "Text");
    addDocumentFormat(txt); 

    final DocumentFormat wikitext = new DocumentFormat("MediaWiki wikitext", "text/x-wiki", "wiki");
    wikitext.setExportFilter(DocumentFamily.TEXT, "MediaWiki");
    addDocumentFormat(wikitext); 

    final DocumentFormat ods = new DocumentFormat("OpenDocument Spreadsheet", DocumentFamily.SPREADSHEET, "application/vnd.oasis.opendocument.spreadsheet", "ods");
    ods.setExportFilter(DocumentFamily.SPREADSHEET, "calc8");
    addDocumentFormat(ods); 

    final DocumentFormat sxc = new DocumentFormat("OpenOffice.org 1.0 Spreadsheet", DocumentFamily.SPREADSHEET, "application/vnd.sun.xml.calc", "sxc");
    sxc.setExportFilter(DocumentFamily.SPREADSHEET, "StarOffice XML (Calc)");
    addDocumentFormat(sxc); 

    final DocumentFormat xls = new DocumentFormat("Microsoft Excel", DocumentFamily.SPREADSHEET, "application/vnd.ms-excel", "xls");
    xls.setExportFilter(DocumentFamily.SPREADSHEET, "MS Excel 97");
    addDocumentFormat(xls); 

    final DocumentFormat csv = new DocumentFormat("CSV", DocumentFamily.SPREADSHEET, "text/csv", "csv");
    csv.setImportOption("FilterName", "Text - txt - csv (StarCalc)");
    csv.setImportOption("FilterOptions", "44,34,0"); // Field Separator: ','; Text Delimiter: '"'
    csv.setExportFilter(DocumentFamily.SPREADSHEET, "Text - txt - csv (StarCalc)");
    csv.setExportOption(DocumentFamily.SPREADSHEET, "FilterOptions", "44,34,0");
    addDocumentFormat(csv); 

    final DocumentFormat tsv = new DocumentFormat("Tab-separated Values", DocumentFamily.SPREADSHEET, "text/tab-separated-values", "tsv");
    tsv.setImportOption("FilterName", "Text - txt - csv (StarCalc)");
    tsv.setImportOption("FilterOptions", "9,34,0"); // Field Separator: '\t'; Text Delimiter: '"'
    tsv.setExportFilter(DocumentFamily.SPREADSHEET, "Text - txt - csv (StarCalc)");
    tsv.setExportOption(DocumentFamily.SPREADSHEET, "FilterOptions", "9,34,0");
    addDocumentFormat(tsv); 

    final DocumentFormat odp = new DocumentFormat("OpenDocument Presentation", DocumentFamily.PRESENTATION, "application/vnd.oasis.opendocument.presentation", "odp");
    odp.setExportFilter(DocumentFamily.PRESENTATION, "impress8");
    addDocumentFormat(odp); 

    final DocumentFormat sxi = new DocumentFormat("OpenOffice.org 1.0 Presentation", DocumentFamily.PRESENTATION, "application/vnd.sun.xml.impress", "sxi");
    sxi.setExportFilter(DocumentFamily.PRESENTATION, "StarOffice XML (Impress)");
    addDocumentFormat(sxi); 

    final DocumentFormat ppt = new DocumentFormat("Microsoft PowerPoint", DocumentFamily.PRESENTATION, "application/vnd.ms-powerpoint", "ppt");
    ppt.setExportFilter(DocumentFamily.PRESENTATION, "MS PowerPoint 97");
    addDocumentFormat(ppt); 

    final DocumentFormat odg = new DocumentFormat("OpenDocument Drawing", DocumentFamily.DRAWING, "application/vnd.oasis.opendocument.graphics", "odg");
    odg.setExportFilter(DocumentFamily.DRAWING, "draw8");
    addDocumentFormat(odg); 

    final DocumentFormat svg = new DocumentFormat("Scalable Vector Graphics", "image/svg+xml", "svg");
    svg.setExportFilter(DocumentFamily.DRAWING, "draw_svg_Export");
    addDocumentFormat(svg);
  }
}

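Solution: rewrite the public DocumentFormat getFormatByFileExtension(String extension) method of BasicDocumentFormatRegistry so that any extension containing doc uses the doc DocumentFormat, and likewise for ppt and xls. The replacement class keeps the library's package and class name: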

//
// JODConverter - Java OpenDocument Converter
// Copyright (C) 2004-2007 - Mirko Nasato <mirko@artofsolving.com>
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
// http://www.gnu.org/copyleft/lesser.html
//
package com.artofsolving.jodconverter; 

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List; 

/**
 * Rewritten BasicDocumentFormatRegistry that maps newer Office extensions to the legacy formats
 * @author HuGuangJun
 */
public class BasicDocumentFormatRegistry implements DocumentFormatRegistry { 

  private List/* <DocumentFormat> */ documentFormats = new ArrayList(); 

  public void addDocumentFormat(DocumentFormat documentFormat) {
    documentFormats.add(documentFormat);
  } 

  protected List/* <DocumentFormat> */ getDocumentFormats() {
    return documentFormats;
  } 

  /**
   * @param extension
   *      the file extension
   * @return the DocumentFormat for this extension, or null if the extension
   *     is not mapped
   */
  public DocumentFormat getFormatByFileExtension(String extension) {
    if (extension == null) {
      return null;
    }
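    // normalize the extension: lower-case it first, then treat anything
    // containing doc, ppt or xls as the corresponding legacy format
    String lowerExtension = extension.toLowerCase();
    if (lowerExtension.indexOf("doc") >= 0) {
      lowerExtension = "doc";
    }
    if (lowerExtension.indexOf("ppt") >= 0) {
      lowerExtension = "ppt";
    }
    if (lowerExtension.indexOf("xls") >= 0) {
      lowerExtension = "xls";
    }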
    for (Iterator it = documentFormats.iterator(); it.hasNext();) {
      DocumentFormat format = (DocumentFormat) it.next();
      if (format.getFileExtension().equals(lowerExtension)) {
        return format;
      }
    }
    return null;
  } 

  public DocumentFormat getFormatByMimeType(String mimeType) {
    for (Iterator it = documentFormats.iterator(); it.hasNext();) {
      DocumentFormat format = (DocumentFormat) it.next();
      if (format.getMimeType().equals(mimeType)) {
        return format;
      }
    }
    return null;
  }
}
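The substring test in getFormatByFileExtension above treats every extension that contains doc, ppt or xls as the legacy format. A stricter variant, sketched here as an assumption rather than the author's code, lower-cases the extension and maps only the exact OOXML extensions. ExtensionAliases and normalize are hypothetical names; the returned value would then be fed to the unchanged lookup loop.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of jodconverter.
final class ExtensionAliases {

    // OOXML extensions mapped to the legacy extension that has a registered DocumentFormat.
    private static final Map<String, String> ALIASES = new HashMap<>();

    static {
        ALIASES.put("docx", "doc");
        ALIASES.put("xlsx", "xls");
        ALIASES.put("pptx", "ppt");
    }

    private ExtensionAliases() {
    }

    /** Lower-cases the extension, then applies an exact-match alias if one exists. */
    static String normalize(String extension) {
        if (extension == null) {
            return null;
        }
        String lower = extension.toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(lower, lower);
    }

    public static void main(String[] args) {
        System.out.println(normalize("DOCX")); // doc
        System.out.println(normalize("pdf"));  // pdf
    }
}
```

With exact matching, an unrelated extension that merely contains one of the three strings is left alone instead of being forced to doc, ppt or xls; the trade-off is that other variants such as docm would need their own alias entries.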

That is all for this article. I hope it helps with your studies.
