
requests

requests is a powerful third-party HTTP library for Python. It is built on urllib3, which in turn sits on the standard library's httplib, and it offers a clean, easy-to-use interface.

###1. Installation
Run `pip install requests` or `easy_install requests`.

###2. Basic usage
In IPython, tab completion shows some of the attributes of the response object that a requests call returns:

    In [1]: import requests
    In [2]: r = requests.get('https://api.github.com')
    In [3]: r.
    r.apparent_encoding  r.history            r.raw
    r.close              r.is_redirect        r.reason
    r.connection         r.iter_content       r.request
    r.content            r.iter_lines         r.status_code
    r.cookies            r.json               r.text
    r.elapsed            r.links              r.url
    r.encoding           r.ok
    r.headers            r.raise_for_status
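A few of these attributes cover most everyday needs. The sketch below gathers them into a dictionary; `summarize_response` is a hypothetical helper name introduced here for illustration, not part of requests.

```python
import requests


def summarize_response(r):
    """Return a few commonly used attributes of a requests Response."""
    return {
        "status_code": r.status_code,  # numeric HTTP status, e.g. 200
        "ok": r.ok,                    # True when status_code is below 400
        "url": r.url,                  # URL of the response
        "encoding": r.encoding,        # encoding used to decode r.text
        "content_type": r.headers.get("Content-Type"),
    }
```

With network access, it could be called as `summarize_response(requests.get('https://api.github.com', timeout=10))`.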

Quickstart: http://requests-docs-cn.readthedocs.io/zh_CN/latest/user/quickstart.html

Advanced usage: http://requests-docs-cn.readthedocs.io/zh_CN/latest/user/advanced.html

lxml

For installation instructions (many readers run into problems at this step), see my previous blog post.

Once requests has fetched the page content, parse it with lxml; other tools such as BeautifulSoup also work.
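As a rough sketch of the parsing half of that flow, the function below takes an HTML string and returns the text and `href` of each link. The name `parse_links` and the sample markup are assumptions made for illustration only.

```python
from lxml import etree


def parse_links(html_text):
    """Return (text, href) pairs for every <a> element in an HTML string."""
    page = etree.HTML(html_text)
    return [(a.text, a.get("href")) for a in page.xpath("//a")]


# Hypothetical sample markup standing in for a downloaded page.
sample = '<html><body><a href="/docs">Docs</a> <a href="/blog">Blog</a></body></html>'
links = parse_links(sample)
print(links)
```

For a live page, the same function could be fed `requests.get(url, timeout=10).text`, where `url` is whichever page you want to scrape.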

lxml is a Pythonic binding for the C libraries libxml2 and libxslt. It handles XML (and HTML) well, is compatible with Python's ElementTree API, and supports both XPath and BeautifulSoup-based parsing, which makes it very convenient to use.
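To make the ElementTree compatibility concrete, here is a small sketch that reads the same document twice: once with ElementTree-style calls and once with XPath. The XML snippet is made up for the example.

```python
from lxml import etree

# Hypothetical XML document used only for illustration.
xml = "<feed><item id='1'>alpha</item><item id='2'>beta</item></feed>"
root = etree.fromstring(xml)

# ElementTree-style access: iterate over elements, read attributes and text.
et_texts = [item.text for item in root.iter("item")]
first_id = root.find("item").get("id")

# XPath access, which lxml offers on top of the ElementTree API.
xpath_texts = root.xpath("//item/text()")
second = root.xpath("//item[@id='2']")[0].text

print(root.tag, et_texts, first_id, xpath_texts, second)
```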

Official documentation: http://lxml.de/

Below is an example of parsing HTML with lxml under Python 3.5 on Windows; lxml uses XPath expressions to extract the data

(for details, see: http://www.cnblogs.com/descusr/archive/2012/06/20/2557075.html):

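    from lxml import etree

    # The tag names and href="#" values in this string are placeholders:
    # only the text content of the sample markup is known.
    html = '''
    <h1>Top News</h1>
    <p>World News only on this page</p>
    Ah, and here's some more text, by the way.
    <p>... and this is a parsed fragment ...</p>
    <a href="#">青少年发展基金会</a>
    <a href="#">洛克王国</a>
    <a href="#">奥拉星</a>
    <a href="#">手机游戏</a>
    <a href="#">手机壁纸</a>
    <a href="#">4399小游戏</a>
    <a href="#">91wan游戏</a>
    '''

    # Lowercase the markup, then parse it into an element tree.
    page = etree.HTML(html.lower())
    # Select every <a> element in the document.
    hrefs = page.xpath(u"//a")
    for href in hrefs:
        # print(href.attrib)
        print(href.text)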
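XPath can also select attribute values or filter elements directly, instead of looping over every `<a>`. The following is a minimal sketch with made-up markup; the `example.com` URLs are placeholders, not real links.

```python
from lxml import etree

# Hypothetical markup; the URLs are placeholders.
html = '''
<div>
  <p class="intro">Hello</p>
  <a href="http://example.com/a" target="_blank">First</a>
  <a href="http://example.com/b">Second</a>
</div>
'''
page = etree.HTML(html)

# Attribute values of every link.
all_hrefs = page.xpath("//a/@href")
# Text of links that carry a specific attribute value.
blank_links = page.xpath("//a[@target='_blank']/text()")
# Text of a paragraph selected by class.
intro = page.xpath("//p[@class='intro']/text()")

print(all_hrefs, blank_links, intro)
```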