python批量实现Word文件转换为PDF文件

2025-12-26 07:35:14

本文为大家分享了python批量转换Word文件为PDF文件的具体方法，供大家参考，具体内容如下

1、目的

通过万能的Python把一个目录下的所有Word文件转换为PDF文件。

2、遍历目录

作者总结了三种遍历目录的方法，分别如下。

2.1.调用glob

遍历指定目录下的所有文件和文件夹，不递归遍历，需要手动完成递归遍历功能。

import glob as gb
path = gb.glob('d:\\2\\*')
for path in path:
 print path

2.2.调用os.walk

遍历指定目录下的所有文件和文件夹，递归遍历，功能强大，推荐使用。

import os
for dirpath, dirnames, filenames in os.walk('d:\\2\\'):
 for file in filenames:
  fullpath = os.path.join(dirpath, file)
  print fullpath, file

2.3.自己DIY

遍历指定目录下的所有文件和文件夹，递归遍历，自主编写，扩展性强，可以学习练手。

import os;
files = list();
def DirAll(pathName):
 if os.path.exists(pathName):
  fileList = os.listdir(pathName);
  for f in fileList:
   if f=="$RECYCLE.BIN" or f=="System Volume Information":
    continue;
   f=os.path.join(pathName,f);
   if os.path.isdir(f):
    DirAll(f);
   else:
    dirName=os.path.dirname(f);
    baseName=os.path.basename(f);
    if dirName.endswith(os.sep):
     files.append(dirName+baseName);
    else:
     files.append(dirName+os.sep+baseName); 

DirAll("D:\\2\\");
for f in files:
 print f
 # print f.decode('gbk').encode('utf-8');

2.4.备注

注意，如果遍历过程中，出现文件名称或文件路径乱码问题，可以查看本文的参考资料来解决。

3、转换Word文件为PDF

通过Windows Com组件（win32com），调用Word服务（Word.Application），实现Word到PDF文件的转换。因此，要求该Python程序需要在有Word服务（可能至少要求2007版本）的Windows机器上运行。

#coding:utf8

import os, sys
reload(sys)
sys.setdefaultencoding('utf8')

from win32com.client import Dispatch, constants, gencache

input = 'D:\\2\\test\\11.docx'
output = 'D:\\2\\test\\22.pdf'

print 'input file', input
print 'output file', output
# enable python COM support for Word 2007
# this is generated by: makepy.py -i "Microsoft Word 12.0 Object Library"
gencache.EnsureModule('{00020905-0000-0000-C000-000000000046}', 0, 8, 4)
# 开始转换
w = Dispatch("Word.Application")
try:
 doc = w.Documents.Open(input, ReadOnly=1)
 doc.ExportAsFixedFormat(output, constants.wdExportFormatPDF, \
       Item=constants.wdExportDocumentWithMarkup,
       CreateBookmarks=constants.wdExportCreateHeadingBookmarks)
except:
 print ' exception'
finally:
 w.Quit(constants.wdDoNotSaveChanges)

if os.path.isfile(output):
 print 'translate success'
else:
 print 'translate fail'

4、批量转换

要实现批量准换，将第2步和第3步的功能组合在一起即可，直接上代码。

# -*- coding:utf-8 -*-
# doc2pdf.py: python script to convert doc to pdf with bookmarks!
# Requires Office 2007 SP2
# Requires python for win32 extension

import glob as gb
import sys
reload(sys)
sys.setdefaultencoding('utf8')

'''
参考:http://blog.csdn.net/rumswell/article/details/7434302
'''
import sys, os
from win32com.client import Dispatch, constants, gencache

# from config import REPORT_DOC_PATH,REPORT_PDF_PATH
REPORT_DOC_PATH = 'D:/2/doc/'
REPORT_PDF_PATH = 'D:/2/doc/'

# Word转换为PDF
def word2pdf(filename):
 input = filename + '.docx'
 output = filename + '.pdf'
 pdf_name = output

 # 判断文件是否存在
 os.chdir(REPORT_DOC_PATH)
 if not os.path.isfile(input):
  print u'%s not exist' % input
  return False
 # 文档路径需要为绝对路径，因为Word启动后当前路径不是调用脚本时的当前路径。
 if (not os.path.isabs(input)): # 判断是否为绝对路径
  # os.chdir(REPORT_DOC_PATH)
  input = os.path.abspath(input) # 返回绝对路径
 else:
  print u'%s not absolute path' % input
  return False

 if (not os.path.isabs(output)):
  os.chdir(REPORT_PDF_PATH)
  output = os.path.abspath(output)
 else:
  print u'%s not absolute path' % output
  return False

 try:
  print input, output
  # enable python COM support for Word 2007
  # this is generated by: makepy.py -i "Microsoft Word 12.0 Object Library"
  gencache.EnsureModule('{00020905-0000-0000-C000-000000000046}', 0, 8, 4)
  # 开始转换
  w = Dispatch("Word.Application")
  try:
   doc = w.Documents.Open(input, ReadOnly=1)
   doc.ExportAsFixedFormat(output, constants.wdExportFormatPDF, \
         Item=constants.wdExportDocumentWithMarkup,
         CreateBookmarks=constants.wdExportCreateHeadingBookmarks)
  except:
   print ' exception'
  finally:
   w.Quit(constants.wdDoNotSaveChanges)

  if os.path.isfile(pdf_name):
   print 'translate success'
   return True
  else:
   print 'translate fail'
   return False
 except:
  print ' exception'
  return -1

if __name__ == '__main__':
 # img_path = gb.glob(REPORT_DOC_PATH + "*")
 # for path in img_path:
 #  print path
 #  rc = word2pdf(path)

 # rc = word2pdf('1')
 # print rc,
 # if rc:
 #  sys.exit(rc)
 # sys.exit(0)

 import os
 for dirpath, dirnames, filenames in os.walk(REPORT_DOC_PATH):
  for file in filenames:
   fullpath = os.path.join(dirpath, file)
   print fullpath, file
   rc = word2pdf(file.rstrip('.docx'))

5、参考资料

利用Python将word 2007的文档转为pdf文件

遍历某目录下的所有文件夹与文件的路径、输出中文乱码问题

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持我们。

您可能感兴趣的文章:

python实现word 2007文档转换为pdf文件
利用python程序生成word和PDF文档的方法
批量将ppt转换为pdf的Python代码只要27行!
Python实现将DOC文档转换为PDF的方法
python使用reportlab实现图片转换成pdf的方法
Python中使用PyQt把网页转换成PDF操作代码实例

Python实现将DOC文档转换为PDF的方法

本文实例讲述了Python实现将DOC文档转换为PDF的方法.分享给大家供大家参考.具体实现方法如下: import sys, os from win32com.client import Dispatch, constants, gencache def usage(): sys.stderr.write ("doc2pdf.py input [output]") sys.exit(2) def doc2pdf(input, output): w = Dispatch("W
python使用reportlab实现图片转换成pdf的方法

本文实例讲述了python使用reportlab实现图片转换成pdf的方法.分享给大家供大家参考.具体实现方法如下: #!/usr/bin/env python import os import sys from reportlab.lib.pagesizes import A4, landscape from reportlab.pdfgen import canvas f = sys.argv[1] filename = ''.join(f.split('/')[-1:])[:-4] f_j
批量将ppt转换为pdf的Python代码只要27行!

这是一个Python脚本,能够批量地将微软Powerpoint文件(.ppt或者.pptx)转换为pdf格式. 使用说明 1.将这个脚本跟PPT文件放置在同一个文件夹下. 2.运行这个脚本. 全部代码 import comtypes.client import os def init_powerpoint(): powerpoint = comtypes.client.CreateObject("Powerpoint.Application") powerpoint.Visible =
利用python程序生成word和PDF文档的方法

一.程序导出word文档的方法将web/html内容导出为world文档,再java中有很多解决方案,比如使用Jacob.Apache POI.Java2Word.iText等各种方式,以及使用freemarker这样的模板引擎这样的方式.php中也有一些相应的方法,但在python中将web/html内容生成world文档的方法是很少的.其中最不好解决的就是如何将使用js代码异步获取填充的数据,图片导出到word文档中. 1. unoconv 功能: 1.支持将本地html文档转换为docx
Python中使用PyQt把网页转换成PDF操作代码实例

代码很简单,功能也很简单 =w= webpage2pdf #!/usr/bin/env python3 import sys try: from PyQt4 import QtWebKit from PyQt4.QtCore import QUrl from PyQt4.QtGui import QApplication, QPrinter except ImportError: from PySide import QtWebKit from PySide.QtCore import QUrl
python实现word 2007文档转换为pdf文件

在开发过程中,会遇到在命令行下将DOC文档(或者是其他Office文档)转换为PDF的要求.比如在项目中如果手册是DOC格式的,在项目发布时希望将其转换为PDF格式,并且保留DOC中的书签,链接等.将该过程整合到构建过程中就要求命令行下进行转换. Michael Suodenjoki展示了使用Office的COM接口进行命令行下的转换.但其导出的PDF文档没有书签.在Office 2007 SP2中,微软加入了该功能,对应的接口是ExportAsFixedFormat.该方法不仅适用于Word,
python批量实现Word文件转换为PDF文件

本文为大家分享了python批量转换Word文件为PDF文件的具体方法,供大家参考,具体内容如下 1.目的通过万能的Python把一个目录下的所有Word文件转换为PDF文件. 2.遍历目录作者总结了三种遍历目录的方法,分别如下. 2.1.调用glob 遍历指定目录下的所有文件和文件夹,不递归遍历,需要手动完成递归遍历功能. import glob as gb path = gb.glob('d:\\2\\*') for path in path: print path 2.2.调用os.w
python 如何将office文件转换为PDF

在平时的工作中,难免需要一些小Tip 来解决工作中遇到的问题,今天的文章给大家安利一个方便快捷的小技巧,将 Office(doc/docx/ppt/pptx/xls/xlsx)文件批量或者单一文件转换为 PDF 文件. 不过在做具体操作之前需要在 PC 安装好 Office,再利用 Python 的 win32com 包来实现 Office 文件的转换操作. 安装 win32com 在实战之前,需要安装 Python 的 win32com,详细安装步骤如下: 使用 pip 命令安装 pip i
php使用Image Magick将PDF文件转换为JPG文件的方法

本文实例讲述了php使用Image Magick将PDF文件转换为JPG文件的方法.分享给大家供大家参考.具体如下: 这是一个非常简单的格式转换代码,可以把.PDF文件转换为.JPG文件,代码要起作用,服务器必须要安装Image Magick 扩展. $pdf_file = './pdf/demo.pdf'; $save_to = './jpg/demo.jpg'; //make sure that apache has permissions to write in this folder!
python利用pandas将excel文件转换为txt文件的方法

python将数据换为txt的方法有很多,可以用xlrd库实现.本人比较懒,不想按太多用的少的插件,利用已有库pandas将excel文件转换为txt文件. 直接上代码: ''' function:将excel文件转换为text author:Nstock date:2018/3/1 ''' import pandas as pd import re import codecs #将excel转化为txt文件 def exceltotxt(excel_dir, txt_dir): with co
Python分割指定页数的pdf文件方法

如下所示: from PyPDF2 import PdfFileWriter, PdfFileReader # 开始页 start_page = 0 # 截止页 end_page = 5 output = PdfFileWriter() pdf_file = PdfFileReader(open("3.pdf", "rb")) pdf_pages_len = pdf_file.getNumPages() # 保存input.pdf中的1-5页到output.pdf
python实现npy格式文件转换为txt文件操作

如下代码会将npy的格式数据读出,并且输出来到控制台: import numpy as np ##设置全部数据,不输出省略号 import sys np.set_printoptions(threshold=sys.maxsize) boxes=np.load('./input_output/boxes.npy') print(boxes) np.savetxt('./input_output/boxes.txt',boxes,fmt='%s',newline='\n') print('----
使用python把json文件转换为csv文件

了解json整体格式这里有一段json格式的文件,存着全球陆地和海洋的每年异常气温(这里只选了一部分):global_temperature.json { "description": { "title": "Global Land and Ocean Temperature Anomalies, January-December", "units": "Degrees Celsius", "b
Python实现网页文件转PDF文件和PNG图片的示例代码

目录一.html网页文件转pdf 二.html网页文件转png 一.html网页文件转pdf #将HTML文件导出为PDF def html_to_pdf(html_path,pdf_path='.\\pdf_new.pdf',html_encoding='UTF-8',path_wkpdf = r'.\Tools\wkhtmltopdf.exe'): ''' 将HTML文件导出为PDF :param html_path:str类型,目标HTML文件的路径,可以是一个路径,也可以是多个路径,以
使用python批量读取word文档并整理关键信息到excel表格的实例

目标最近实验室里成立了一个计算机兴趣小组倡议大家多把自己解决问题的经验记录并分享就像在CSDN写博客一样虽然刚刚起步但考虑到后面此类经验记录的资料会越来越多所以一开始就要做好模板设计(如下所示) 方便后面建立电子数据库从而使得其他人可以迅速地搜索到相关记录据说"人生苦短,我用python" 所以决定用python从docx文档中提取文件头的信息然后把信息更新到一个xls电子表格中,像下面这样(直接po结果好了) 而且点击文件路径可以直接打开对应的文件(含超链接) 代码

python批量实现Word文件转换为PDF文件

您可能感兴趣的文章:

相关推荐

随机推荐