python - A question about using parent in BeautifulSoup

 adfa3sd5f6a posted on 2022-10-27 20:50

I am reading the book 《Python网络数据采集》 (Web Scraping with Python), which contains this piece of code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html=urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj=BeautifulSoup(html,"html.parser")
print(bsObj.find("img",{"src:":"../img/gifts/img1.jpg"}).parent.previous_siblingget_text())

But running it fails with this error:

AttributeError: 'NoneType' object has no attribute 'parent'

The page at the scraped URL renders as follows:

Totally Normal Gifts

Here is a collection of totally normal, totally reasonable gifts that your friends are sure to love! Our collection is hand-curated by well-paid, free-range Tibetan monks.

We haven't figured out how to make online shopping carts yet, but you can send us a check to:
123 Main St.
Abuja, Nigeria
We will then send your totally amazing gift, pronto! Please include an extra $5.00 for gift wrapping.

Item Title | Description | Cost | Image
Vegetable Basket | This vegetable basket is the perfect gift for your health conscious (or overweight) friends! Now with super-colorful bell peppers! | $15.00 | (image not shown)
Russian Nesting Dolls | Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents! | $10,000.52 | (image not shown)
Fish Painting | If something seems fishy about this painting, it's because it's a fish! Also hand-painted by trained monkeys! | $10,005.00 | (image not shown)
Dead Parrot | This is an ex-parrot! Or maybe he's only resting? | $0.50 | (image not shown)
Mystery Box | If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. Keep your friends guessing! | $1.50 | (image not shown)

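To me the logic of the book's code looks correct, so why does the IDE report a 'NoneType' object? Am I using the find function incorrectly? How should this code be written correctly?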

3 answers
  • Setting aside the missing . in the middle of previous_siblingget_text,
    there are two ways to pass the attribute to find. Keyword style:
    bsObj.find("img",src="../img/gifts/img1.jpg").parent.previous_sibling.get_text()
    attrs style:
    bsObj.find("img",attrs={"src":"../img/gifts/img1.jpg"}).parent.previous_sibling.get_text()

    Reference: http://beautifulsoup.readthedocs.io/zh_CN/latest/#find

    Answered 2022-11-12 01:42
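
To check that the keyword style and the attrs style really select the same tag, here is a minimal offline sketch. SAMPLE_HTML is a hypothetical, simplified stand-in for one row of the page (it is not the real page source), written without whitespace between cells so that previous_sibling lands directly on the price cell.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one table row of the page
# (an assumption for illustration, not the real markup).
SAMPLE_HTML = (
    '<table><tr>'
    '<td>Vegetable Basket</td>'
    '<td>$15.00</td>'
    '<td><img src="../img/gifts/img1.jpg"></td>'
    '</tr></table>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keyword style: the attribute name is passed as a keyword argument.
by_keyword = soup.find("img", src="../img/gifts/img1.jpg")
# attrs style: the attribute name is a dictionary key.
by_attrs = soup.find("img", attrs={"src": "../img/gifts/img1.jpg"})

# Both lookups should hand back the very same Tag object from the tree.
same_tag = by_keyword is by_attrs

# img -> enclosing <td> -> the <td> just before it (the price cell).
price = by_keyword.parent.previous_sibling.get_text()
print(same_tag, price)
```

Either style works here; the attrs dictionary is mainly useful when the attribute name cannot be written as a Python keyword argument.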
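  • {"src:":"../img/gifts/img1.jpg"} should be changed to {"src":"../img/gifts/img1.jpg"}. find is looking for the src attribute, and the attribute name does not contain a : character. Also, the end should be .previous_sibling.get_text(). Be more careful...

    Answered 2022-11-12 01:42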
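
The reason for the AttributeError can be reproduced offline: with the key "src:" no tag matches, so find returns None, and None has no parent. The sketch below uses a hypothetical one-line HTML snippet, and parent_name is an illustrative helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal markup, assumed only for this illustration.
SAMPLE_HTML = '<div><img src="../img/gifts/img1.jpg"></div>'
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No tag has an attribute literally named "src:", so nothing matches.
wrong = soup.find("img", {"src:": "../img/gifts/img1.jpg"})
# With the correct attribute name the <img> tag is found.
right = soup.find("img", {"src": "../img/gifts/img1.jpg"})


def parent_name(tag):
    """Return the parent tag's name, or None when the lookup found nothing."""
    return tag.parent.name if tag is not None else None


print(wrong, parent_name(wrong))
print(right, parent_name(right))
```

Checking the result of find against None before chaining .parent turns a confusing AttributeError into an explicit "not found" case.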
  • from urllib.request import urlopen
    from bs4 import BeautifulSoup
    
    html=urlopen("http://www.pythonscraping.com/pages/page3.html")
    bsObj=BeautifulSoup(html,"html.parser")      
    print(bsObj.find("img",{"src":"../img/gifts/img1.jpg"}).parent.previous_sibling.get_text())

    Your code... has quite a few mistakes...

    1. "src:" should be changed to "src"

    2. parent.previous_siblingget_text() should be changed to parent.previous_sibling.get_text()


    Questions I have answered: Python-QA

    Answered 2022-11-12 01:42
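
One related caveat, as a sketch under assumptions: if the markup has line breaks or indentation between the cells, previous_sibling can be a whitespace text node rather than the price cell. In that case find_previous_sibling("td") skips over the text node. The indented SAMPLE_HTML below is hypothetical and is not claimed to match how the real page is laid out.

```python
from bs4 import BeautifulSoup

# Hypothetical row with whitespace between cells (an assumption for
# illustration; the real page may be formatted differently).
SAMPLE_HTML = """
<table>
  <tr>
    <td>Vegetable Basket</td>
    <td>$15.00</td>
    <td><img src="../img/gifts/img1.jpg"></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
img = soup.find("img", {"src": "../img/gifts/img1.jpg"})
cell = img.parent  # the <td> that contains the image

# With html.parser the whitespace between tags is kept as a text node,
# so the immediate previous sibling here is just a newline plus spaces.
raw_sibling = cell.previous_sibling

# find_previous_sibling("td") walks back to the nearest preceding <td> tag.
price_cell = cell.find_previous_sibling("td")
price = price_cell.get_text(strip=True)
print(repr(str(raw_sibling)), price)
```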