Author: 筱杰丶Jevon_879 | Source: Internet | 2023-05-18 01:32
I've been using Jsoup to scrape HTML data from a website, but there is one section of XML inside a script tag that I need to get, because it contains a bunch of image URLs that I need to pull out and download. Here is what it looks like:
var xmlTxt = '';'
That is followed by a whole bunch of JavaScript code inside the script tag. What is the best way to extract those URLs from the page if I have a Jsoup Document? If I can't do it with Jsoup, how else can I do it? The problem is that the images are held in a carousel, so the HTML on the page only shows the sources of the images currently displayed in the carousel.
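To make the goal concrete, here is a minimal sketch of the kind of extraction being asked about. It is not a confirmed solution. It assumes the script body has already been read from the Jsoup Document (for example by looping over `doc.select("script")` and calling `data()` on each element), that the XML is a single-quoted literal with no escaped single quotes inside it, and that the URLs are absolute `http`/`https` URLs. The class name `ScriptXmlUrls`, its method names, and the sample XML in `main` are hypothetical, since the real XML is not shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls URLs out of the xmlTxt string literal in a script body.
public class ScriptXmlUrls {

    // Matches: var xmlTxt = '...';
    // Assumption: the literal is single-quoted and contains no escaped single quotes.
    private static final Pattern XML_LITERAL =
            Pattern.compile("var\\s+xmlTxt\\s*=\\s*'([^']*)'");

    // Assumption: the URLs are absolute http/https URLs.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Returns the contents of the xmlTxt literal, or null if it is not present.
    public static String extractXml(String scriptBody) {
        Matcher m = XML_LITERAL.matcher(scriptBody);
        return m.find() ? m.group(1) : null;
    }

    // Returns every URL found inside the xmlTxt literal, in document order.
    public static List<String> extractUrls(String scriptBody) {
        List<String> urls = new ArrayList<>();
        String xml = extractXml(scriptBody);
        if (xml == null) {
            return urls;
        }
        Matcher m = URL.matcher(xml);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // With Jsoup, the script body would come from something like:
        //   for (Element script : doc.select("script")) { String body = script.data(); ... }
        // The XML below is made up for illustration only.
        String script = "var xmlTxt = '<images><image src=\"http://example.com/a.jpg\"/>"
                + "<image src=\"http://example.com/b.jpg\"/></images>';\n"
                + "var other = 1;";
        for (String url : extractUrls(script)) {
            System.out.println(url);
        }
    }
}
```

If the XML structure matters (for example, only certain elements hold image URLs), the string returned by `extractXml` could instead be handed to an XML parser, such as Jsoup's own `Parser.xmlParser()`, rather than scanned with a URL regex.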
2 Solutions