Use selector-syntax to find elements
Problem
You want to find or manipulate elements using a CSS or jQuery-like selector syntax.
Solution
Use the Element.select(String selector) and Elements.select(String selector) methods:
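import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/"); // throws IOException

Elements links = doc.select("a[href]"); // a with href
Elements pngs = doc.select("img[src$=.png]"); // img with src ending .png
Element masthead = doc.select("div.masthead").first(); // div with class=masthead (null if none)
Elements resultLinks = doc.select("h3.r > a"); // a that is a direct child of h3 with class=r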
Description
jsoup elements support a CSS-like (or jQuery-like) selector syntax for finding matching elements, which allows very powerful and robust queries.
The select method is available in a Document, Element, or in Elements. It is contextual, so you can filter by selecting from a specific element, or by chaining select calls.
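As a minimal sketch of what "contextual" means here, the example below selects from one specific element and then does the same thing by chaining select calls. The sample markup, the class name ContextualSelectDemo, and the helper names selectWithin and selectChained are assumptions made up for this illustration; only Jsoup.parse, select, and first come from jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ContextualSelectDemo {
    // Hypothetical sample markup, made up for this illustration.
    static final String HTML =
        "<div class='masthead'><a href='/home'>Home</a></div>"
      + "<div class='content'><a href='/a'>A</a><a href='/b'>B</a><p>No link</p></div>";

    // Select from one specific element: the first match of containerQuery.
    static Elements selectWithin(Document doc, String containerQuery, String query) {
        Element container = doc.select(containerQuery).first();
        // first() returns null when nothing matched, so guard before selecting.
        return container == null ? new Elements() : container.select(query);
    }

    // Chain select calls: the second select searches within the first result set.
    static Elements selectChained(Document doc, String firstQuery, String secondQuery) {
        return doc.select(firstQuery).select(secondQuery);
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.select("a[href]").size());                         // 3
        System.out.println(selectWithin(doc, "div.content", "a[href]").size());   // 2
        System.out.println(selectChained(doc, "div.content", "a[href]").size());  // 2
    }
}
```

Both approaches narrow the search to the links under div.content; the whole-document query also picks up the masthead link.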
Select returns a list of elements (as an Elements object), which provides a range of methods to extract and manipulate the results.
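The following sketch shows a few of those methods in use: reading an attribute from each match, then changing every match in one call. The markup, the class name ElementsResultDemo, and the helpers hrefs and markSeen are hypothetical names chosen for this example; the jsoup calls used are attr, addClass, and select.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ElementsResultDemo {
    // Hypothetical sample markup, made up for this illustration.
    static final String HTML = "<p><a href='/a'>First</a> <a href='/b'>Second</a></p>";

    // Extract: Elements is a list of Element, so it can be iterated directly.
    static List<String> hrefs(Elements links) {
        List<String> out = new ArrayList<>();
        for (Element link : links) {
            out.add(link.attr("href"));
        }
        return out;
    }

    // Manipulate: these calls apply to every element in the result set.
    static int markSeen(Document doc) {
        Elements links = doc.select("a[href]");
        links.addClass("seen");
        links.attr("rel", "nofollow");
        // Re-query to confirm the document itself was changed.
        return doc.select("a.seen[rel=nofollow]").size();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(hrefs(doc.select("a[href]"))); // [/a, /b]
        System.out.println(markSeen(doc));                // 2
    }
}
```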
Selector overview
- tagname: find elements by tag, e.g. a
- ns|tag: find elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
- #id: find elements by ID, e.g. #logo
- .class: find elements by class name, e.g. .masthead
- [attribute]: elements with attribute, e.g. [href]
- [^attr]: elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
- [attr=value]: elements with attribute value, e.g. [width=500]
- [attr^=value], [attr$=value], [attr*=value]: elements with attributes that start with, end with, or contain the value, e.g. [href*=/path/]
- [attr~=regex]: elements with attribute values that match the regular expression; e.g. img[src~=(?i)\.(png|jpe?g)]
- *: all elements, e.g. *
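To make the basic selectors above concrete, here is a small sketch that runs several of them against one document and prints how many elements each one matches. The markup, the class name SelectorOverviewDemo, and the count helper are assumptions for this example. Note that the backslash in the regular expression has to be doubled inside a Java string literal.

```java
import org.jsoup.Jsoup;

public class SelectorOverviewDemo {
    // Hypothetical sample markup, made up for this illustration.
    static final String HTML =
        "<div id='logo' class='masthead'><img src='logo.PNG' width='500'></div>"
      + "<a href='/path/to/page' data-track='1'>In path</a>"
      + "<a href='/other'>Other</a>"
      + "<img src='photo.jpeg'><img src='anim.gif'>";

    // Parse the markup and return the number of elements matching the query.
    static int count(String html, String query) {
        return Jsoup.parse(html).select(query).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "a",                              // by tag
            "#logo",                          // by ID
            ".masthead",                      // by class name
            "[href]",                         // has attribute
            "[^data-]",                       // attribute name prefix
            "[width=500]",                    // attribute value
            "[href*=/path/]",                 // attribute value contains
            "img[src~=(?i)\\.(png|jpe?g)]"    // attribute value matches regex
        };
        for (String query : queries) {
            System.out.println(query + " -> " + count(HTML, query));
        }
    }
}
```

With this markup the regex query matches both logo.PNG and photo.jpeg, because (?i) makes the pattern case-insensitive, and it skips anim.gif.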
Selector combinations
- el#id: elements with ID, e.g. div#logo
- el.class: elements with class, e.g. div.masthead
- el[attr]: elements with attribute, e.g. a[href]
- Any combination, e.g. a[href].highlight
- ancestor child: child elements that descend from ancestor, e.g. .body p finds p elements anywhere under a block with class "body"
- parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
- siblingA + siblingB: finds sibling B element immediately preceded by sibling A, e.g. div.head + div
- siblingA ~ siblingX: finds sibling X element preceded by sibling A, e.g. h1 ~ p
- el, el, el: group multiple selectors, find unique elements that match any of the selectors; e.g. div.masthead, div.logo
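The difference between the combinators is easiest to see side by side. The sketch below contrasts descendant with direct-child selection, and adjacent with general sibling selection. The markup, the class name CombinatorDemo, and the count helper are hypothetical and exist only for this example.

```java
import org.jsoup.Jsoup;

public class CombinatorDemo {
    // Hypothetical sample markup, made up for this illustration.
    static final String HTML =
        "<div class='content'><p>direct</p><div><p>nested</p></div></div>"
      + "<div class='article'><h1>Title</h1><p>first</p><p>second</p></div>"
      + "<div class='masthead'></div><div class='logo'></div>"
      + "<a href='/x' class='highlight'>x</a><a href='/y'>y</a>";

    // Parse the markup and return the number of elements matching the query.
    static int count(String html, String query) {
        return Jsoup.parse(html).select(query).size();
    }

    public static void main(String[] args) {
        System.out.println(count(HTML, ".content p"));              // 2: any depth
        System.out.println(count(HTML, ".content > p"));            // 1: direct children only
        System.out.println(count(HTML, "h1 + p"));                  // 1: immediately after h1
        System.out.println(count(HTML, "h1 ~ p"));                  // 2: any later sibling of h1
        System.out.println(count(HTML, "div.masthead, div.logo"));  // 2: either selector
        System.out.println(count(HTML, "a[href].highlight"));       // 1: attribute and class
    }
}
```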
Pseudo selectors
- :lt(n): find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. td:lt(3)
- :gt(n): find elements whose sibling index is greater than n; e.g. div p:gt(2)
- :eq(n): find elements whose sibling index is equal to n; e.g. form input:eq(1)
- :has(selector): find elements that contain elements matching the selector; e.g. div:has(p)
- :not(selector): find elements that do not match the selector; e.g. div:not(.logo)
- :contains(text): find elements that contain the given text. The search is case-insensitive; e.g. p:contains(jsoup)
- :containsOwn(text): find elements that directly contain the given text
- :matches(regex): find elements whose text matches the specified regular expression; e.g. div:matches((?i)login)
- :matchesOwn(regex): find elements whose own text matches the specified regular expression
- Note that the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0, the second at 1, etc.
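A short sketch of the pseudo selectors follows, including the 0-based indexing and the difference between an element's full text and its own text. The markup, the class name PseudoSelectorDemo, and the count helper are assumptions made for this example.

```java
import org.jsoup.Jsoup;

public class PseudoSelectorDemo {
    // Hypothetical sample markup, made up for this illustration.
    static final String HTML =
        "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>"
      + "<div><p>Uses jsoup here</p></div>"
      + "<div class='logo'>Logo</div>"
      + "<div><span>Please LOGIN</span></div>";

    // Parse the markup and return the number of elements matching the query.
    static int count(String html, String query) {
        return Jsoup.parse(html).select(query).size();
    }

    public static void main(String[] args) {
        System.out.println(count(HTML, "td:lt(3)"));                    // 3: indexes 0, 1, 2
        System.out.println(count(HTML, "td:gt(2)"));                    // 1: index 3
        System.out.println(Jsoup.parse(HTML).select("td:eq(1)").text()); // b: second cell
        System.out.println(count(HTML, "div:has(p)"));                  // 1
        System.out.println(count(HTML, "div:not(.logo)"));              // 2
        System.out.println(count(HTML, "p:contains(JSOUP)"));           // 1: case-insensitive
        System.out.println(count(HTML, "div:matches((?i)login)"));      // 1: text of descendants counts
        System.out.println(count(HTML, "div:matchesOwn((?i)login)"));   // 0: the text belongs to the span
        System.out.println(count(HTML, "span:matchesOwn((?i)login)"));  // 1
    }
}
```

The last three queries show why the "Own" variants exist: the div matches on its combined text, but only the span holds that text directly.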
See the Selector API reference for the full supported list and details.