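I'm trying to fetch the headers from a website and write them to a file, encoded as JSON. I have tried two different approaches without success.
First, with urllib2 and json: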
    import urllib2
    import json
    host = "https://www.python.org/"
    header = urllib2.urlopen(host).info()
    json_header = json.dumps(header)
    print json_header
This gives me the error:
    TypeError: is not JSON serializable
So I tried to work around the problem by converting the object to a string, json_header = str(header), so that I could then call json_header = json.dumps(header), but the output is strange:
    "Date: Wed, 02 Jul 2014 13:33:37 GMT\r\nServer: nginx\r\nContent-Type: text/html; charset=utf-8\r\nX-Frame-Options: SAMEORIGIN\r\nContent-Length: 45682\r\nAccept-Ranges: bytes\r\nVia: 1.1 varnish\r\nAge: 1263\r\nX-Served-By: cache-fra1220-FRA\r\nX-Cache: HIT\r\nX-Cache-Hits: 2\r\nVary: Cookie\r\nStrict-Transport-Security: max-age=63072000; includeSubDomains\r\nConnection: close\r\n"
Second, with requests:
    import requests
    r = requests.get("https://www.python.org/")
    rh = r.headers
    print rh
    {'content-length': '45682', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'x-served-by': 'cache-fra1226-FRA', 'x-cache-hits': '14', 'date': 'Wed, 02 Jul 2014 13:39:33 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'text/html; charset=utf-8', 'age': '1619'}
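This way the output looks more like JSON, but it is still not right (note the '' instead of "", and other things such as = and ;). Clearly there is something (or a lot) that I am not doing the right way. I tried reading the documentation for both modules but could not work out how to solve this. Thanks for your help.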
If you are only interested in the headers, make a HEAD request. Convert the CaseInsensitiveDict to a dict object, then convert that to JSON.
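    import requests
    import json
    # A HEAD request returns the headers without the response body.
    r = requests.head('https://www.python.org/')
    # Copy the CaseInsensitiveDict into a plain dict so json can serialize it.
    rh = dict(r.headers)
    json.dumps(rh)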
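The question also asks for the result to be written to a file. A minimal sketch of that step follows; the helper name write_headers_json, the file name headers.json, and the sample mapping are assumptions made for illustration, with the sample values copied from the output in the question. In real use you would pass r.headers from the HEAD request instead of the sample.

```python
import json
import os
import tempfile


def write_headers_json(headers, path):
    """Write a header mapping to `path` as a JSON object."""
    # dict() copies a mapping such as requests' CaseInsensitiveDict
    # into a plain dictionary that json can serialize.
    with open(path, "w") as fh:
        json.dump(dict(headers), fh, indent=2, sort_keys=True)


# Hypothetical stand-in for r.headers, so the sketch runs offline.
sample_headers = {
    "server": "nginx",
    "content-type": "text/html; charset=utf-8",
    "content-length": "45682",
}

out_path = os.path.join(tempfile.mkdtemp(), "headers.json")
write_headers_json(sample_headers, out_path)

# Read the file back to confirm it holds valid JSON.
with open(out_path) as fh:
    loaded = json.load(fh)
```

json.dump writes straight to the file object, so there is no need to build the JSON string first.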
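There are several ways to encode the headers as JSON, but the first one that came to mind is to convert the headers attribute to an actual dictionary instead of accessing it as a requests.structures.CaseInsensitiveDict: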
    import requests
    import json
    r = requests.get("https://www.python.org/")
    # _store is the private dict that backs CaseInsensitiveDict.
    rh = json.dumps(r.headers.__dict__['_store'])
    print rh
    {'content-length': ('content-length', '45474'), 'via': ('via', '1.1 varnish'), 'x-cache': ('x-cache', 'HIT'), 'accept-ranges': ('accept-ranges', 'bytes'), 'strict-transport-security': ('strict-transport-security', 'max-age=63072000; includeSubDomains'), 'vary': ('vary', 'Cookie'), 'server': ('server', 'nginx'), 'x-served-by': ('x-served-by', 'cache-iad2132-IAD'), 'x-cache-hits': ('x-cache-hits', '1'), 'date': ('date', 'Wed, 02 Jul 2014 14:13:37 GMT'), 'x-frame-options': ('x-frame-options', 'SAMEORIGIN'), 'content-type': ('content-type', 'text/html; charset=utf-8'), 'age': ('age', '1483')}
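Depending on exactly what you need from the headers, you can access them individually after this, but this gives you all the information contained in the headers, albeit in a slightly different format.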
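If the (name, value) pairs get in the way, they can be flattened before encoding. The sketch below assumes the structure shown above, a lowercased key mapped to a (name, value) pair; the variable names store and flat are made up for illustration, and the three entries are copied from that output. Because _store is a private attribute, its layout may differ between requests versions.

```python
import json

# Sample entries in the shape shown above: lowercased key -> (name, value).
store = {
    "server": ("server", "nginx"),
    "vary": ("vary", "Cookie"),
    "x-cache": ("x-cache", "HIT"),
}

# Keep each stored header name with its value, dropping the duplicated key.
flat = {name: value for name, value in store.values()}

flat_json = json.dumps(flat, sort_keys=True)
print(flat_json)
# {"server": "nginx", "vary": "Cookie", "x-cache": "HIT"}
```

Without the flattening step, json.dumps would encode each pair as a two-element JSON array.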
If you prefer a different format, you can also convert the headers to a dictionary:
    import requests
    import json
    r = requests.get("https://www.python.org/")
    # dict() yields plain string keys and values, which json.dumps can encode.
    print json.dumps(dict(r.headers))
    {"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-at50-ATL", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:15 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "951"}
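The same idea should carry over to the urllib2 attempt in the question. Converting the header object with str() produces one block of text, which json.dumps then encodes as a single JSON string; that is why the first output looks odd. The sketch below uses email.message.Message as an offline stand-in for the object returned by .info(), which is an assumption: it relies on that object exposing items(), and the two header values are copied from the question.

```python
import json
from email.message import Message

# Stand-in for the object returned by urlopen(host).info().
header = Message()
header["Server"] = "nginx"
header["Content-Type"] = "text/html; charset=utf-8"

# str(header) is one block of text, so json.dumps encodes it as a
# single JSON string rather than as key/value pairs.
as_string = json.dumps(str(header))

# items() yields (name, value) pairs; dict() turns them into a mapping
# that json.dumps encodes as a JSON object.
as_object = json.dumps(dict(header.items()), sort_keys=True)
print(as_object)
```

Note that dict() keeps only one value per header name, so a header that appears more than once in the response would lose all but one of its values.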