- <?xml version="1.0" encoding="UTF-8"?>
- <!-- Reviewed: no -->
- <sect1 id="zend.dom.query">
- <title>Zend_Dom_Query</title>
- <para>
- <classname>Zend_Dom_Query</classname> provides mechanisms for querying
- <acronym>XML</acronym> and (X)<acronym>HTML</acronym> documents using either XPath or
- <acronym>CSS</acronym> selectors. It was developed to aid functional testing of
- <acronym>MVC</acronym> applications, but it can also be used to build screen
- scrapers quickly.
- </para>
- <para>
- <acronym>CSS</acronym> selector notation is provided as a simpler and more familiar
- notation for web developers to use when querying documents with <acronym>XML</acronym>
- structures. The notation should be familiar to anybody who has written
- Cascading Style Sheets or who uses JavaScript toolkits that select nodes
- with <acronym>CSS</acronym> selectors
- (<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
- $$()</ulink> and
- <ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
- dojo.query</ulink> were both inspirations for the component).
- </para>
- <sect2 id="zend.dom.query.operation">
- <title>Theory of Operation</title>
- <para>
- To use <classname>Zend_Dom_Query</classname>, instantiate a
- <classname>Zend_Dom_Query</classname> object, optionally passing the document to
- query as a string. Once a document is set, call either the
- <methodname>query()</methodname> or the <methodname>queryXpath()</methodname> method; each
- returns a <classname>Zend_Dom_Query_Result</classname> object containing
- any matching nodes.
- </para>
- <para>
- The primary difference between <classname>Zend_Dom_Query</classname> and using
- DOMDocument and DOMXPath directly is the ability to select nodes with <acronym>CSS</acronym>
- selectors. You can use any of the following, in any combination:
- </para>
- <itemizedlist>
- <listitem>
- <para>
- <emphasis>element types</emphasis>: provide an element type to
- match: 'div', 'a', 'span', 'h2', etc.
- </para>
- </listitem>
- <listitem>
- <para>
- <emphasis>style attributes</emphasis>: <acronym>CSS</acronym> class names
- to match: '<command>.error</command>', '<command>div.error</command>',
- '<command>label.required</command>', etc. If an
- element declares more than one class, the selector matches as long as
- the named class is present anywhere in the element's class attribute.
- </para>
- </listitem>
- <listitem>
- <para>
- <emphasis>id attributes</emphasis>: element ID attributes to
- match: '#content', 'div#nav', etc.
- </para>
- </listitem>
- <listitem>
- <para>
- <emphasis>arbitrary attributes</emphasis>: arbitrary element
- attributes to match. Three different types of matching are
- provided:
- </para>
- <itemizedlist>
- <listitem>
- <para>
- <emphasis>exact match</emphasis>: the attribute exactly
- matches the string: 'div[bar="baz"]' would match a div
- element with a "bar" attribute that exactly matches the
- value "baz".
- </para>
- </listitem>
- <listitem>
- <para>
- <emphasis>word match</emphasis>: the attribute contains
- a word matching the string: 'div[bar~="baz"]' would match a div
- element with a "bar" attribute that contains the
- word "baz". '&lt;div bar="foo baz"&gt;' would match, but '&lt;div
- bar="foo bazbat"&gt;' would not.
- </para>
- </listitem>
- <listitem>
- <para>
- <emphasis>substring match</emphasis>: the attribute contains
- the string: 'div[bar*="baz"]' would match a div
- element with a "bar" attribute that contains the
- string "baz" anywhere within it.
- </para>
- </listitem>
- </itemizedlist>
- </listitem>
- <listitem>
- <para>
- <emphasis>direct descendants</emphasis>: use '>' between
- selectors to denote direct descendants. 'div > span' would
- select only 'span' elements that are direct descendants (children) of a
- 'div'. This can also be combined with any of the selectors above.
- </para>
- </listitem>
- <listitem>
- <para>
- <emphasis>descendants</emphasis>: string together
- multiple selectors to indicate a hierarchy along which
- to search. '<command>div .foo span #one</command>' would select an element
- with the id 'one' that is a descendant of arbitrary depth
- beneath a 'span' element, which is in turn a descendant
- of arbitrary depth beneath an element with a class of
- 'foo', which is a descendant of arbitrary depth beneath
- a 'div' element. For example, it would match the link
- containing the word 'One' in the listing below:
- </para>
- <programlisting language="html"><![CDATA[
- <div>
- <table>
- <tr>
- <td class="foo">
- <div>
- Lorem ipsum <span class="bar">
- <a href="/foo/bar" id="one">One</a>
- <a href="/foo/baz" id="two">Two</a>
- <a href="/foo/bat" id="three">Three</a>
- <a href="/foo/bla" id="four">Four</a>
- </span>
- </div>
- </td>
- </tr>
- </table>
- </div>
- ]]></programlisting>
- </listitem>
- </itemizedlist>
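- <para>
- The three attribute matching modes described above can be compared side by
- side. The following is a minimal sketch rather than a definitive reference: it
- assumes Zend Framework 1 is available on the include path, and the sample
- markup and the selector labels are made up for illustration.
- </para>
- <programlisting language="php"><![CDATA[
- <?php
- require_once 'Zend/Dom/Query.php';
- // Hypothetical sample markup: three divs that differ only in their "bar" attribute.
- $html = '<html><body>'
-       . '<div bar="baz">first</div>'
-       . '<div bar="foo baz">second</div>'
-       . '<div bar="foo bazbat">third</div>'
-       . '</body></html>';
- $dom = new Zend_Dom_Query($html);
- // One selector per matching mode (labels are illustrative only).
- $selectors = array(
-     'exact'     => 'div[bar="baz"]',
-     'word'      => 'div[bar~="baz"]',
-     'substring' => 'div[bar*="baz"]',
- );
- foreach ($selectors as $label => $selector) {
-     // count() works because the result object implements Countable.
-     printf("%-9s %-16s %d match(es)\n", $label, $selector, count($dom->query($selector)));
- }
- ]]></programlisting>
- <para>
- Going by the definitions above, the exact selector should match only the first
- div, the word selector the first two, and the substring selector all three.
- </para>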
- <para>
- Once you have performed your query, you can work with the result
- object to inspect the matched nodes and to pull
- them or their content out for examination and manipulation.
- <classname>Zend_Dom_Query_Result</classname> implements <classname>Countable</classname>
- and <classname>Iterator</classname>, and stores the results internally as
- DOMNodes and DOMElements. As an example, consider the following call,
- which selects against the <acronym>HTML</acronym> above:
- </para>
- <programlisting language="php"><![CDATA[
- // $html is assumed to hold the HTML listing above as a string
- $dom = new Zend_Dom_Query($html);
- $results = $dom->query('.foo .bar a');
- $count = count($results); // get number of matches: 4
- foreach ($results as $result) {
-     // $result is a DOMElement
- }
- ]]></programlisting>
- <para>
- <classname>Zend_Dom_Query</classname> also allows direct XPath queries
- through the <methodname>queryXpath()</methodname> method; you can pass any
- valid XPath query to this method, and it will return a
- <classname>Zend_Dom_Query_Result</classname> object.
- </para>
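- <para>
- As a rough illustration of an XPath query, the sketch below selects links by
- their parent element and reads attributes from each match. It assumes Zend
- Framework 1 is on the include path; the markup is a hypothetical fragment
- modelled on the links in the earlier listing.
- </para>
- <programlisting language="php"><![CDATA[
- <?php
- require_once 'Zend/Dom/Query.php';
- // Hypothetical fragment for illustration.
- $html = '<div><span class="bar">'
-       . '<a href="/foo/bar" id="one">One</a>'
-       . '<a href="/foo/baz" id="two">Two</a>'
-       . '</span></div>';
- $dom = new Zend_Dom_Query($html);
- // Select every anchor that is a child of a span whose class attribute is exactly "bar".
- $results = $dom->queryXpath('//span[@class="bar"]/a');
- foreach ($results as $node) {
-     // Each $node is a DOMElement, so the standard DOM API applies.
-     echo $node->getAttribute('id'), ' => ', $node->getAttribute('href'), "\n";
- }
- ]]></programlisting>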
- </sect2>
- <sect2 id="zend.dom.query.methods">
- <title>Methods Available</title>
- <para>
- The <classname>Zend_Dom_Query</classname> family of classes has the following
- methods available.
- </para>
- <sect3 id="zend.dom.query.methods.zenddomquery">
- <title>Zend_Dom_Query</title>
- <para>
- The following methods are available on
- <classname>Zend_Dom_Query</classname>:
- </para>
- <itemizedlist>
- <listitem>
- <para>
- <methodname>setDocumentXml($document)</methodname>: specify an
- <acronym>XML</acronym> string to query against.
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>setDocumentXhtml($document)</methodname>: specify an
- <acronym>XHTML</acronym> string to query against.
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>setDocumentHtml($document)</methodname>: specify an
- <acronym>HTML</acronym> string to query against.
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>setDocument($document)</methodname>: specify a
- string to query against; <classname>Zend_Dom_Query</classname> will
- then attempt to autodetect the document type.
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>getDocument()</methodname>: retrieve the original document
- string provided to the object.
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>getDocumentType()</methodname>: retrieve the document
- type of the document provided to the object; will be one of
- the <constant>DOC_XML</constant>, <constant>DOC_XHTML</constant>, or
- <constant>DOC_HTML</constant> class constants.
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>query($query)</methodname>: query the document using
- <acronym>CSS</acronym> selector notation.
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>queryXpath($xPathQuery)</methodname>: query the document
- using XPath notation.
- </para>
- </listitem>
- </itemizedlist>
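- <para>
- The document-related methods listed above might be combined as in the
- following sketch. It assumes Zend Framework 1 is on the include path; the
- catalog document is hypothetical and exists only for this illustration.
- </para>
- <programlisting language="php"><![CDATA[
- <?php
- require_once 'Zend/Dom/Query.php';
- // Hypothetical XML document used only for this illustration.
- $xml = '<?xml version="1.0" encoding="UTF-8"?>'
-      . '<catalog><book id="b1"><title>First</title></book>'
-      . '<book id="b2"><title>Second</title></book></catalog>';
- $dom = new Zend_Dom_Query();
- // Declare the document type explicitly instead of relying on autodetection.
- $dom->setDocumentXml($xml);
- if ($dom->getDocumentType() === Zend_Dom_Query::DOC_XML) {
-     echo "Document is being treated as XML\n";
- }
- // getDocument() returns the original string that was supplied.
- var_dump($dom->getDocument() === $xml);
- // CSS selector notation also works against XML element names.
- foreach ($dom->query('book title') as $title) {
-     echo $title->nodeValue, "\n";
- }
- ]]></programlisting>
- <para>
- Calling <methodname>setDocument()</methodname> instead leaves the choice of
- type to autodetection, which <methodname>getDocumentType()</methodname> can
- then report.
- </para>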
- </sect3>
- <sect3 id="zend.dom.query.methods.zenddomqueryresult">
- <title>Zend_Dom_Query_Result</title>
- <para>
- As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
- implements both <classname>Iterator</classname> and
- <classname>Countable</classname>, and as such can be used in a
- <methodname>foreach()</methodname> loop as well as with the
- <methodname>count()</methodname> function. Additionally, it exposes the
- following methods:
- </para>
- <itemizedlist>
- <listitem>
- <para>
- <methodname>getCssQuery()</methodname>: return the <acronym>CSS</acronym>
- selector query used to produce the result (if any).
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>getXpathQuery()</methodname>: return the XPath query
- used to produce the result. Internally,
- <classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym>
- selector queries to XPath, so this value will always be populated.
- </para>
- </listitem>
- <listitem>
- <para>
- <methodname>getDocument()</methodname>: retrieve the DOMDocument the
- selection was made against.
- </para>
- </listitem>
- </itemizedlist>
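- <para>
- The result accessors can be exercised as in the sketch below, which assumes
- Zend Framework 1 is on the include path and uses hypothetical markup. The
- generated XPath is printed rather than compared against a fixed string,
- because its exact form is an implementation detail.
- </para>
- <programlisting language="php"><![CDATA[
- <?php
- require_once 'Zend/Dom/Query.php';
- // Hypothetical markup for illustration.
- $html = '<ul id="nav"><li class="active">Home</li><li>About</li></ul>';
- $dom     = new Zend_Dom_Query($html);
- $results = $dom->query('#nav li');
- // The CSS selector that produced this result.
- echo 'CSS:   ', $results->getCssQuery(), "\n";
- // The XPath expression the selector was converted to.
- echo 'XPath: ', $results->getXpathQuery(), "\n";
- // getDocument() returns the parsed DOMDocument, which can be inspected further.
- $document = $results->getDocument();
- echo get_class($document), "\n";
- echo count($results), " node(s) matched\n";
- ]]></programlisting>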
- </sect3>
- </sect2>
- </sect1>
- <!--
- vim:se ts=4 sw=4 et:
- -->