<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.dom.query">
<title>Zend_Dom_Query</title>
<para>
<classname>Zend_Dom_Query</classname> provides mechanisms for querying XML and
(X)HTML documents using either XPath or CSS selectors. It was
developed to aid with functional testing of MVC applications, but can
also be used for rapid development of screen scrapers.
</para>
<para>
CSS selector notation is provided as a simpler and more familiar
notation for web developers to use when querying documents with XML
structures. The notation should be familiar to anybody who has developed
Cascading Style Sheets or who uses JavaScript toolkits that provide
functionality for selecting nodes with CSS selectors
(<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
$$()</ulink> and
<ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
dojo.query</ulink> were both inspirations for the component).
</para>
<sect2 id="zend.dom.query.operation">
<title>Theory of Operation</title>
<para>
To use <classname>Zend_Dom_Query</classname>, you instantiate a
<classname>Zend_Dom_Query</classname> object, optionally passing a document to
query (as a string). Once you have a document, you can use either the
<code>query()</code> or the <code>queryXpath()</code> method; each
method returns a <classname>Zend_Dom_Query_Result</classname> object containing
any matching nodes.
</para>
<para>
The primary difference between <classname>Zend_Dom_Query</classname> and using
DOMDocument with DOMXPath directly is the ability to select against CSS
selectors. You can use any of the following, in any combination:
</para>
<itemizedlist>
<listitem><para>
<emphasis>element types</emphasis>: provide an element type to
match: 'div', 'a', 'span', 'h2', etc.
</para></listitem>
<listitem><para>
<emphasis>style attributes</emphasis>: CSS class names to
match: '.error', 'div.error', 'label.required', etc. If an
element declares more than one class, the selector matches as long as
the named class appears anywhere in the element's class attribute.
</para></listitem>
<listitem><para>
<emphasis>id attributes</emphasis>: element ID attributes to
match: '#content', 'div#nav', etc.
</para></listitem>
<listitem>
<para>
<emphasis>arbitrary attributes</emphasis>: arbitrary element
attributes to match. Three different types of matching are
provided:
</para>
<itemizedlist>
<listitem><para>
<emphasis>exact match</emphasis>: the attribute exactly
matches the string: 'div[bar="baz"]' would match a div
element with a "bar" attribute that exactly matches the
value "baz".
</para></listitem>
<listitem><para>
<emphasis>word match</emphasis>: the attribute contains
a word matching the string: 'div[bar~="baz"]' would match a div
element with a "bar" attribute that contains the
word "baz". '&lt;div bar="foo baz"&gt;' would match, but '&lt;div
bar="foo bazbat"&gt;' would not.
</para></listitem>
<listitem><para>
<emphasis>substring match</emphasis>: the attribute contains
the string: 'div[bar*="baz"]' would match a div
element with a "bar" attribute that contains the
string "baz" anywhere within it.
</para></listitem>
</itemizedlist>
</listitem>
<listitem><para>
<emphasis>direct descendants</emphasis>: use '&gt;' between
selectors to denote direct descendants. 'div &gt; span' would
select only 'span' elements that are direct descendants of a
'div'. This can also be combined with any of the selectors above.
</para></listitem>
<listitem>
<para>
<emphasis>descendants</emphasis>: string together
multiple selectors to indicate a hierarchy along which
to search. 'div .foo span #one' would select an element
with the id 'one' that is a descendant of arbitrary depth
beneath a 'span' element, which is in turn a descendant
of arbitrary depth beneath an element with a class of
'foo', which is itself a descendant of arbitrary depth beneath
a 'div' element. For example, it would match the link
containing the word 'One' in the listing below:
</para>
<programlisting language="html"><![CDATA[
<div>
    <table>
        <tr>
            <td class="foo">
                <div>
                    Lorem ipsum <span class="bar">
                        <a href="/foo/bar" id="one">One</a>
                        <a href="/foo/baz" id="two">Two</a>
                        <a href="/foo/bat" id="three">Three</a>
                        <a href="/foo/bla" id="four">Four</a>
                    </span>
                </div>
            </td>
        </tr>
    </table>
</div>
]]></programlisting>
</listitem>
</itemizedlist>
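<para>
To make the difference between the three attribute match types concrete,
the following sketch runs hand-written XPath expressions through PHP's
DOMDocument and DOMXPath. The sample markup and the XPath expressions are
assumptions made for illustration; they approximate the semantics described
above and are not necessarily the expressions that
<classname>Zend_Dom_Query</classname> generates internally.
</para>
<programlisting language="php"><![CDATA[
// Hypothetical sample markup, chosen to contrast the three match types.
$html = '<div bar="baz">exact</div>'
      . '<div bar="foo baz">word</div>'
      . '<div bar="foo bazbat">substring</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hand-written XPath approximations of each CSS attribute selector.
$expressions = array(
    'div[bar="baz"]'  => '//div[@bar="baz"]',
    'div[bar~="baz"]' => '//div[contains(concat(" ", normalize-space(@bar), " "), " baz ")]',
    'div[bar*="baz"]' => '//div[contains(@bar, "baz")]',
);

$counts = array();
foreach ($expressions as $css => $expression) {
    // DOMXPath::query() returns a DOMNodeList; length is the match count.
    $counts[$css] = $xpath->query($expression)->length;
    printf("%-16s matches %d element(s)\n", $css, $counts[$css]);
}
]]></programlisting>
<para>
With this markup, the exact match selects one element, the word match
selects two, and the substring match selects all three.
</para>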
<para>
Once you've performed your query, you can work with the result
object to determine information about the nodes, as well as to pull
them and/or their content directly for examination and manipulation.
<classname>Zend_Dom_Query_Result</classname> implements <code>Countable</code>
and <code>Iterator</code>, and stores the results internally as
DOMNodes/DOMElements. As an example, consider the following call,
which selects against the HTML above:
</para>
<programlisting language="php"><![CDATA[
// $html is assumed to hold the markup from the listing above
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');
$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
}
]]></programlisting>
<para>
<classname>Zend_Dom_Query</classname> also allows straight XPath queries
utilizing the <code>queryXpath()</code> method; you can pass any
valid XPath query to this method, and it will return a
<classname>Zend_Dom_Query_Result</classname> object.
</para>
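<para>
As a minimal sketch of an XPath query, the following selects a single link
by its id attribute. It assumes that Zend Framework 1 is available on the
<code>include_path</code>; the markup is a shortened, hypothetical variant
of the listing above.
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php';

// Shortened, hypothetical markup for illustration only.
$html = '<div><span class="bar">'
      . '<a href="/foo/bar" id="one">One</a>'
      . '<a href="/foo/baz" id="two">Two</a>'
      . '</span></div>';

$dom     = new Zend_Dom_Query($html);
$results = $dom->queryXpath('//a[@id="two"]');

foreach ($results as $link) {
    // Each item is a DOMElement, so the usual DOM accessors apply.
    echo $link->getAttribute('href'), ' => ', $link->nodeValue, "\n";
}
]]></programlisting>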
</sect2>
<sect2 id="zend.dom.query.methods">
<title>Methods Available</title>
<para>
The <classname>Zend_Dom_Query</classname> family of classes provides the following
methods.
</para>
<sect3 id="zend.dom.query.methods.zenddomquery">
<title>Zend_Dom_Query</title>
<para>
The following methods are available on
<classname>Zend_Dom_Query</classname>:
</para>
<itemizedlist>
<listitem><para>
<code>setDocumentXml($document)</code>: specify an XML
string to query against.
</para></listitem>
<listitem><para>
<code>setDocumentXhtml($document)</code>: specify an XHTML
string to query against.
</para></listitem>
<listitem><para>
<code>setDocumentHtml($document)</code>: specify an HTML
string to query against.
</para></listitem>
<listitem><para>
<code>setDocument($document)</code>: specify a
string to query against; <classname>Zend_Dom_Query</classname> will
then attempt to autodetect the document type.
</para></listitem>
<listitem><para>
<code>getDocument()</code>: retrieve the original document
string provided to the object.
</para></listitem>
<listitem><para>
<code>getDocumentType()</code>: retrieve the document
type of the document provided to the object; will be one of
the <code>DOC_XML</code>, <code>DOC_XHTML</code>, or
<code>DOC_HTML</code> class constants.
</para></listitem>
<listitem><para>
<code>query($query)</code>: query the document using CSS
selector notation.
</para></listitem>
<listitem><para>
<code>queryXpath($xPathQuery)</code>: query the document
using XPath notation.
</para></listitem>
</itemizedlist>
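<para>
The setter and getter methods above might be combined as in the following
sketch, which forces XML handling instead of relying on autodetection. It
assumes that Zend Framework 1 is available on the
<code>include_path</code>; the XML payload and its element names are
hypothetical.
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php';

// Hypothetical XML payload used only for this illustration.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<feed><entry id="a1"><title>First</title></entry>'
     . '<entry id="a2"><title>Second</title></entry></feed>';

$dom = new Zend_Dom_Query();
$dom->setDocumentXml($xml); // treat the string as XML, skipping autodetection

if ($dom->getDocumentType() === Zend_Dom_Query::DOC_XML) {
    // Descendant selector: every title beneath an entry element.
    $titles = $dom->query('entry title');
    echo count($titles), " title element(s) found\n";
}
]]></programlisting>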
</sect3>
<sect3 id="zend.dom.query.methods.zenddomqueryresult">
<title>Zend_Dom_Query_Result</title>
<para>
As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
implements both <code>Iterator</code> and
<code>Countable</code>, and as such can be used in a
<code>foreach</code> loop as well as with the
<code>count()</code> function. Additionally, it exposes the
following methods:
</para>
<itemizedlist>
<listitem><para>
<code>getCssQuery()</code>: return the CSS selector query
used to produce the result (if any).
</para></listitem>
<listitem><para>
<code>getXpathQuery()</code>: return the XPath query
used to produce the result. Internally,
<classname>Zend_Dom_Query</classname> converts CSS selector queries to
XPath, so this value will always be populated.
</para></listitem>
<listitem><para>
<code>getDocument()</code>: retrieve the DOMDocument the
selection was made against.
</para></listitem>
</itemizedlist>
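<para>
The following sketch shows how these accessors might be used to inspect a
result, for instance when debugging a selector that matches nothing. It
assumes that Zend Framework 1 is available on the
<code>include_path</code>; the markup is hypothetical. The exact XPath
string depends on the internal CSS-to-XPath conversion, so it is printed
here rather than assumed.
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Dom/Query.php';

// Hypothetical markup for illustration only.
$dom     = new Zend_Dom_Query('<div id="nav"><a href="/">Home</a></div>');
$results = $dom->query('div#nav a');

// The selector as written, and the XPath it was converted to.
echo 'CSS:   ', $results->getCssQuery(), "\n";
echo 'XPath: ', $results->getXpathQuery(), "\n";

// getDocument() returns the DOMDocument the selection was made against.
$document = $results->getDocument();
echo get_class($document), "\n";
]]></programlisting>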
</sect3>
</sect2>
</sect1>
<!--
vim:se ts=4 sw=4 et:
-->