Example: Parsing 115 Netdisk Links from Web Page Source Code with Python

2025-09-18 22:04:37

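This article shows, by example, how to use Python to extract 115 netdisk links from the source code of a web page. It is shared here for reference. The approach is as follows: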

The file 1.txt used here is the web page http://bbs.pediy.com/showthread.php?t=144788 saved locally as 1.txt.
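The article saves the page by hand. If you would rather script that step, the following is a minimal Python 3 sketch using only the standard library. The function name `save_page` and the timeout value are illustrative assumptions, not part of the original example. It writes the raw bytes unchanged because the article does not state the page's encoding.

```python
import shutil
from urllib.request import urlopen


def save_page(url, path, timeout=30):
    """Download `url` and write the response body to `path` as raw bytes.

    `save_page` is a hypothetical helper; the original article saves the
    page manually from the browser.
    """
    with urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)


# Example call (needs network access, so it is left commented out):
# save_page("http://bbs.pediy.com/showthread.php?t=144788", "1.txt")
```

Since the bytes are stored as received, you may need to pass an explicit `encoding` argument to `open()` when reading the file back in Python 3.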

The parsing code is as follows:

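import re

if __name__ == "__main__":
  # 1.txt is the saved copy of the forum page; close the file once it is read
  with open("c:\\1.txt") as fp:
    text = fp.read()

  # Match from "http://u" to the end of the line ("." does not match a newline)
  https = re.compile(r"(http://u.*)")
  for url in https.findall(text):
    print(url)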

Output:

http://u.115.com/file/f61cb107c8
http://u.115.com/file/f6806f45b8
http://u.115.com/file/f6ec42d4d3
http://u.115.com/file/f6deb05ec4
http://u.115.com/file/f6e51f6838
http://u.115.com/file/f66edaf8d3
http://u.115.com/file/f6d07e07b9
http://u.115.com/file/f6d7f585a8
http://u.115.com/file/f639d8b3cf
http://u.115.com/file/f6dcadbde6
http://u.115.com/file/f6ea3f01c1
http://u.115.com/file/f65b96a06f
http://u.115.com/file/f682da085a
http://u.115.com/file/f6486e698
http://u.115.com/file/f6b7491d9f
http://u.115.com/file/f622b7f9a7
http://u.115.com/file/f64e2424b9
http://u.115.com/file/f6e5132d4d
http://u.115.com/file/f655c10e86
http://u.115.com/file/f6b22e64e6
http://u.115.com/file/f6812126a4
http://u.115.com/file/f6523e625c
http://u.115.com/file/f63e0ccb28
http://u.115.com/file/f611e07b8a#
http://u.115.com/file/f6e047bccc#
http://u.115.com/file/f6d348d781#
http://u.115.com/file/f6ada24153#
http://u.115.com/file/f64f97518b#
http://u.115.com/file/f6f9ba96f8#
http://u.115.com/file/f650e06f38#
http://u.115.com/file/f683ee5b2a#
http://u.115.com/file/f69009bfc2#
http://u.115.com/file/f6ea427646#
http://u.115.com/file/f6acdc6b7f#
http://u.115.com/file/f6c85745d0#
http://u.115.com/file/f61a26cf12#
http://u.115.com/file/f631edf5c6#
http://u.115.com/file/f6b0fa6fb8#
http://u.115.com/file/f6f5fe8962#
http://u.115.com/file/f6bf975e0#
http://u.115.com/file/f6d522784c#
http://u.115.com/file/f6b5ac9991#
http://u.115.com/file/f62e80ced5#
http://u.115.com/file/f6bff09c0c#
http://u.115.com/file/f663fc4a54#
http://u.115.com/file/blpk4pv1
http://u.115.com/file/c4rjotdz
http://u.115.com/file/f6a960aca8#
http://u.115.com/file/efnn38jr
http://u.115.com/file/c4leomjd
http://u.115.com/file/dlpw9s6i
http://u.115.com/file/f6d3cbebe0#
http://u.115.com/file/f6de8062b2#
http://u.115.com/file/ef8og8la
http://u.115.com/file/f6f6391ac6#
http://u.115.com/file/f628d256ae#
http://u.115.com/file/f66a049dc9#
http://u.115.com/file/f62bf1750a#
http://u.115.com/file/f642e47260#
http://u.115.com/file/f693eb7c89#
http://u.115.com/file/f6ed68ba9b#
http://u.115.com/file/f6f099c3f9#
http://u.115.com/file/f61ac19339#
http://u.115.com/file/f6f3c78d2c#
http://u.115.com/file/f6696f6348#
http://u.115.com/file/f6e88eeefb#
http://u.115.com/file/f66471e4eb#
http://u.115.com/file/f672da54ae#
http://u.115.com/file/dnasw0kp#
http://u.115.com/file/dnagnndx#
http://u.115.com/file/clwr2xxg#
http://u.115.com/file/bhbcnnwe#
http://u.115.com/file/aq2rp9ga#
http://u.115.com/file/e601turs#
http://u.115.com/file/dn46qs7x#
http://u.115.com/file/clwonrwg#
http://u.115.com/file/dn43i7jf#
http://u.115.com/file/bhbgrnfz#
http://u.115.com/file/dnsl0kxp#
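Several of the links above end with a stray `#`, and because `.*` runs to the end of the line, the original pattern would also capture any markup that follows a link on the same line. As a hedged alternative, here is a minimal Python 3 sketch with a stricter pattern that also drops repeated links. It assumes every link has the form `http://u.115.com/file/` followed by lowercase letters and digits, which holds for the output shown here. The names `LINK_RE`, `extract_links`, and `sample` are illustrative.

```python
import re

# Assumed link shape: "http://u.115.com/file/" plus lowercase letters/digits.
# The dots are escaped so they only match a literal ".".
LINK_RE = re.compile(r"http://u\.115\.com/file/[0-9a-z]+")


def extract_links(text):
    """Return the 115 links found in `text`, unique and in first-seen order.

    The pattern stops at the first character that is not a lowercase letter
    or digit, so a trailing "#" or HTML markup is not captured.
    """
    seen = set()
    links = []
    for url in LINK_RE.findall(text):
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Small made-up input that mimics the cases seen in the output above.
sample = (
    '<a href="http://u.115.com/file/f61cb107c8">a</a>\n'
    "http://u.115.com/file/f611e07b8a#\n"
    "http://u.115.com/file/f61cb107c8\n"
)
print(extract_links(sample))
# → ['http://u.115.com/file/f61cb107c8', 'http://u.115.com/file/f611e07b8a']
```

The trade-off is that a stricter pattern misses any link that does not fit the assumed shape, whereas the original `http://u.*` pattern is broader but noisier.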

I hope this article is helpful for your Python programming.
