Installing Python's BeautifulSoup on Windows 8

2025-04-03 15:15:37

Environment: Windows 8.1
Python: 2.7.6

I installed it with pip, using the following command:

pip install beautifulsoup4

Running it failed with the following error:

Exception:
Traceback (most recent call last):
  File "J:\Program Files (x86)\Python\Python27\lib\site-packages\pip\basecommand.py", line 122, in main
    status = self.run(options, args)
  File "J:\Program Files (x86)\Python\Python27\lib\site-packages\pip\commands\install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "J:\Program Files (x86)\Python\Python27\lib\site-packages\pip\req.py", line 1229, in prepare_files
    req_to_install.run_egg_info()
  File "J:\Program Files (x86)\Python\Python27\lib\site-packages\pip\req.py", line 292, in run_egg_info
    logger.notify('Running setup.py (path:%s) egg_info for package %s' % (self.setup_py, self.name))
  File "J:\Program Files (x86)\Python\Python27\lib\site-packages\pip\req.py", line 265, in setup_py
    import setuptools
  File "build\bdist.win-amd64\egg\setuptools\__init__.py", line 11, in <module>
    from setuptools.extension import Extension
  File "build\bdist.win-amd64\egg\setuptools\extension.py", line 5, in <module>
  File "build\bdist.win-amd64\egg\setuptools\dist.py", line 15, in <module>
  File "build\bdist.win-amd64\egg\setuptools\compat.py", line 19, in <module>
  File "J:\Program Files (x86)\Python\Python27\lib\SimpleHTTPServer.py", line …, in <module>
    class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
  File "J:\Program Files (x86)\Python\Python27\lib\SimpleHTTPServer.py", line …8, in SimpleHTTPRequestHandler
    mimetypes.init() # try to read system mime.types
  File "J:\Program Files (x86)\Python\Python27\lib\mimetypes.py", line 358, in init
    db.read_windows_registry()
  File "J:\Program Files (x86)\Python\Python27\lib\mimetypes.py", line 258, in read_windows_registry
    for subkeyname in enum_types(hkcr):
  File "J:\Program Files (x86)\Python\Python27\lib\mimetypes.py", line 249, in enum_types
    ctype = ctype.encode(default_encoding) # omit in 3.x!
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 1: ordinal not in range(128)

Storing debug log for failure in C:\Users\Administrator\pip\pip.log
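
The final frame fails inside ctype.encode(default_encoding). In Python 2, calling .encode() on a byte string first decodes it with sys.getdefaultencoding() (normally 'ascii') and only then encodes it. Python 3 bytes objects have no such implicit step, so this minimal Python 3 sketch spells it out in a hypothetical helper, py2_str_encode. The key name '.啊' is also hypothetical; it is chosen only because 啊 encodes to the bytes B0 A1 in GBK, which puts byte 0xb0 at position 1 as in the traceback.

```python
# Python 3 sketch of what Python 2's str.encode() does with a byte string.
# py2_str_encode and the key name below are hypothetical, for illustration only.


def py2_str_encode(raw, target, default_encoding="ascii"):
    """Emulate Python 2's ``bytestring.encode(target)``.

    Python 2 first decodes the bytes with the default encoding
    (normally 'ascii') and only then encodes the text to ``target``.
    """
    text = raw.decode(default_encoding)  # the implicit step that fails in the traceback
    return text.encode(target)


# A registry-style key name as GBK bytes, as a GBK-locale Windows would return it.
raw_key = u".\u554a".encode("gbk")  # U+554A is the character 啊

try:
    # What unpatched mimetypes effectively does: ctype.encode('ascii')
    py2_str_encode(raw_key, "ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)  # 'ascii' codec can't decode byte 0xb0 in position 1: ...

# With the default encoding switched to gbk, the decode/encode round trip succeeds.
print(py2_str_encode(raw_key, "gbk", default_encoding="gbk") == raw_key)
```

This is why switching the default encoding to gbk, as in the fix that follows, lets enum_types get past such key names.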

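Solution: the error is raised while mimetypes loads MIME types from the Windows registry. Some key names under HKEY_CLASSES_ROOT contain non-ASCII characters (GBK-encoded on a Simplified Chinese system), which the default ascii codec cannot decode. Open mimetypes.py in the Lib directory of your Python installation (C:\Python27\Lib for a default install; J:\Program Files (x86)\Python\Python27\lib in the traceback above) and find this line, at about line 256: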

default_encoding = sys.getdefaultencoding()

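Change it to the following, keeping the indentation of the surrounding method body (reload(sys) is needed because Python 2's site module deletes sys.setdefaultencoding at startup):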

if sys.getdefaultencoding() != 'gbk':
    reload(sys)
    sys.setdefaultencoding('gbk')
default_encoding = sys.getdefaultencoding()
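
Because this fix edits the standard library in place, it has to go into the mimetypes.py of the interpreter that pip runs under, which is not necessarily C:\Python27\Lib (the traceback above points to J:\Program Files (x86)\Python\Python27\lib). A small sketch, not part of the original steps, that prints which file that is when run with the same python.exe as pip:

```python
import mimetypes

# Path of the mimetypes module that this interpreter actually imports.
path = mimetypes.__file__

# Python 2 usually reports the compiled .pyc (or .pyo); the .py source sits beside it.
if path.endswith((".pyc", ".pyo")):
    path = path[:-1]

print(path)
```

The same code runs unchanged on Python 2.7 and Python 3.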

Once the installation succeeds, verify it:

C:\Users\Administrator>python
Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> exit()

If from bs4 import BeautifulSoup raises no error, the installation succeeded. Otherwise you will see an error similar to the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named bs4
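
The same check can be scripted instead of typed into the interpreter. A minimal sketch using only the standard library's importlib; the helper name module_available is hypothetical:

```python
import importlib


def module_available(name):
    """Return True if the module ``name`` imports cleanly, False on ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Mirrors the interactive check: a clean import means bs4 is installed.
if module_available("bs4"):
    print("bs4 is installed")
else:
    print("bs4 is not importable by this interpreter")
```

Run it with the interpreter that pip installed into; otherwise it checks a different site-packages directory.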
