I recently started working with MongoDB and ran into a few problems handling JSON and Unicode strings. Here is a summary.
Environment: Python 2
OS: macOS
# -*- coding: utf-8 -*-
import json
from pymongo import MongoClient
from bson import json_util, ObjectId

client = MongoClient('localhost', 27017)
db = client['tumblelog']
collection = db.user
# find_one returns the user document as a dict
user = collection.find_one({"_id": ObjectId('5f3cdfac77f96de114e1e989')})
# Serialize user with Python's built-in json module
print json.dumps(user)
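This raises the following error:
TypeError: ObjectId('5f3cdfac77f96de114e1e989') is not JSON serializable
Python's built-in json module does not know how to serialize an ObjectId. The bson package that ships with PyMongo provides json_util for exactly this purpose: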
# -*- coding: utf-8 -*-
from pymongo import MongoClient
from bson import json_util, ObjectId

client = MongoClient('localhost', 27017)
db = client['tumblelog']
collection = db.user
# find_one returns the user document as a dict
user = collection.find_one({"_id": ObjectId('5f3cdfac77f96de114e1e989')})
# json_util.dumps understands BSON types such as ObjectId
user_str = json_util.dumps(user)
print (type(user_str), user_str)
Serialization succeeds:
(<type 'str'>, '{"last_name": "Lawley6", "first_name": "Ross6", "_id": {"$oid": "5f3cdfac77f96de114e1e989"}, "email": "ross@example.com6"}')
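The {"$oid": ...} wrapper in that output is how json_util represents an ObjectId. To illustrate the idea, here is a minimal sketch that gets a similar shape from the standard json module through its default hook. FakeObjectId and encode_extra are hypothetical names made up for this example so it runs without MongoDB. This is not how json_util is implemented, and in real code json_util.dumps is the simpler choice.

```python
import json


class FakeObjectId(object):
    """Hypothetical stand-in for bson.ObjectId so the sketch runs without MongoDB."""

    def __init__(self, hex_id):
        self.hex_id = hex_id

    def __str__(self):
        return self.hex_id


def encode_extra(obj):
    # json.dumps calls this hook only for objects it cannot serialize itself.
    if isinstance(obj, FakeObjectId):
        # Mirror the {"$oid": ...} shape seen in the json_util output.
        return {"$oid": str(obj)}
    raise TypeError("%r is not JSON serializable" % (obj,))


user = {"_id": FakeObjectId("5f3cdfac77f96de114e1e989"), "first_name": "Ross6"}
# sort_keys=True only makes the key order predictable for this demo.
user_str = json.dumps(user, default=encode_extra, sort_keys=True)
print(user_str)
```

Without the default argument, json.dumps(user) would fail with the same kind of TypeError shown earlier.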
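Another problem:
The strings in documents returned by MongoDB queries are all of type unicode, but I want plain str. This is a Python 2 issue. Under Python 3 the problem does not arise.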
# -*- coding: utf-8 -*-
import json
from pymongo import MongoClient
from bson import json_util, ObjectId

client = MongoClient('localhost', 27017)
db = client['tumblelog']
collection = db.user
user = collection.find_one({"_id": ObjectId('5f3cdfac77f96de114e1e989')})
print(type(user), user)
The output shows that every key and value is unicode:
(<type 'dict'>, {u'last_name': u'Lawley6', u'first_name': u'Ross6', u'_id': ObjectId('5f3cdfac77f96de114e1e989'), u'email': u'ross@example.com6'})
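If switching to Python 3 is not an option, the yaml library's safe_load method can solve this: serialize the document with json_util.dumps, then parse the resulting JSON string with yaml.safe_load.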
# -*- coding: utf-8 -*-
import json
import yaml
from pymongo import MongoClient
from bson import json_util, ObjectId

client = MongoClient('localhost', 27017)
db = client['tumblelog']
collection = db.user
user = collection.find_one({"_id": ObjectId('5f3cdfac77f96de114e1e989')})
# Dump to a JSON string, then let yaml.safe_load parse it back into a dict
user_dict_str = yaml.safe_load(json_util.dumps(user))
print (type(user_dict_str), user_dict_str)
After the safe_load round trip, the keys and values in the output are all str:
(<type 'dict'>, {'first_name': 'Ross6', 'last_name': 'Lawley6', 'email': 'ross@example.com6', '_id': {'$oid': '5f3cdfac77f96de114e1e989'}})
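The safe_load trick needs an extra library and a round trip through JSON, and as far as I know PyYAML on Python 2 only returns str for ASCII-only values. As an alternative, here is a rough sketch of a recursive walk that converts unicode to str directly. The name to_native_str and the sample dict are made up for this example, and encoding to UTF-8 is an assumption about what you want the resulting str to contain. On Python 3 the function changes nothing, since str is already text.

```python
import sys

PY2 = sys.version_info[0] == 2
# On Python 2 the text type is `unicode`; on Python 3 every str is already text.
text_type = unicode if PY2 else str  # noqa: F821


def to_native_str(obj):
    """Recursively convert unicode keys and values to the native str type."""
    if isinstance(obj, dict):
        return dict((to_native_str(k), to_native_str(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [to_native_str(item) for item in obj]
    if PY2 and isinstance(obj, text_type):
        # Python 2 only: encode unicode to a UTF-8 byte string (str).
        return obj.encode("utf-8")
    # Everything else (numbers, None, ObjectId, ...) is returned unchanged.
    return obj


# Hypothetical sample data shaped like a query result.
user = {u"first_name": u"Ross6", u"tags": [u"a", u"b"], u"age": 30}
converted = to_native_str(user)
print(type(converted["first_name"]), converted)
```

Unlike the safe_load approach, this leaves an ObjectId value as an ObjectId instead of turning it into a {'$oid': ...} dict.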
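One more note, on how Python 2's json module handles Chinese text. By default, dumps escapes Chinese characters as \uXXXX sequences, which looks garbled when printed. Passing ensure_ascii=False keeps the characters as they are.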
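# -*- coding: utf-8 -*-
import json

l = ['a', 'b', '中文']
print (l)
# Default: non-ASCII characters are escaped
l_str = json.dumps(l)
print (l_str)
# ensure_ascii=False: non-ASCII characters are kept as-is
l_str = json.dumps(l, ensure_ascii=False)
print (l_str)
Output: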
['a', 'b', '\xe4\xb8\xad\xe6\x96\x87']
["a", "b", "\u4e2d\u6587"]
["a", "b", "中文"]
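It is worth stressing that the escaped form is not corrupted data: it is valid JSON and decodes back to the original characters. Here is a small sketch to check this. It uses unicode literals so it behaves the same on Python 2 and Python 3, and u"\u4e2d\u6587" is simply the string 中文 written with escapes.

```python
import json

# u"\u4e2d\u6587" is the two-character Chinese string from the example above,
# written with escapes so this file stays ASCII-only.
data = [u"a", u"b", u"\u4e2d\u6587"]

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)
# Both strings decode back to the same list, so nothing was lost by escaping.
print(json.loads(escaped) == json.loads(readable) == data)
```

So ensure_ascii=False mainly matters when a person has to read the JSON, for example in logs or in a file you open by hand.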