<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.dom.query">
<title>Zend_Dom_Query</title>
<para>
<classname>Zend_Dom_Query</classname> provides mechanisms for querying <acronym>XML</acronym> and
(X)HTML documents using either XPath or <acronym>CSS</acronym> selectors. It was
developed to aid with functional testing of <acronym>MVC</acronym> applications, but can
also be used for rapid development of screen scrapers.
</para>
<para>
<acronym>CSS</acronym> selector notation is provided as a simpler and more familiar
notation for web developers to use when querying documents with <acronym>XML</acronym>
structures. The notation should be familiar to anybody who has developed
Cascading Style Sheets or who uses JavaScript toolkits that provide
functionality for selecting nodes with <acronym>CSS</acronym> selectors
(<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
$$()</ulink> and
<ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
dojo.query</ulink> were both inspirations for the component).
</para>
<sect2 id="zend.dom.query.operation">
<title>Theory of Operation</title>
<para>
To use <classname>Zend_Dom_Query</classname>, you instantiate a
<classname>Zend_Dom_Query</classname> object, optionally passing a document to
query (a string). Once you have a document, you can use either the
<methodname>query()</methodname> or the <methodname>queryXpath()</methodname> method; each
method will return a <classname>Zend_Dom_Query_Result</classname> object with
any matching nodes.
</para>
<para>
The primary difference between <classname>Zend_Dom_Query</classname> and using
<classname>DOMDocument</classname> with <classname>DOMXPath</classname> directly is the ability to select against <acronym>CSS</acronym>
selectors. You can use any of the following, in any combination:
</para>
<itemizedlist>
<listitem><para>
<emphasis>element types</emphasis>: provide an element type to
match: 'div', 'a', 'span', 'h2', etc.
</para></listitem>
<listitem><para>
<emphasis>class attributes</emphasis>: <acronym>CSS</acronym> class names to
match: '.error', 'div.error', 'label.required', etc. If an
element declares more than one class, this will match as long as
the named class is present anywhere in the class attribute.
</para></listitem>
<listitem><para>
<emphasis>id attributes</emphasis>: element ID attributes to
match: '#content', 'div#nav', etc.
</para></listitem>
<listitem>
<para>
<emphasis>arbitrary attributes</emphasis>: arbitrary element
attributes to match. Three different types of matching are
provided:
</para>
<itemizedlist>
<listitem><para>
<emphasis>exact match</emphasis>: the attribute exactly
matches the string: 'div[bar="baz"]' would match a div
element with a "bar" attribute that exactly matches the
value "baz".
</para></listitem>
<listitem><para>
<emphasis>word match</emphasis>: the attribute contains
a word matching the string: 'div[bar~="baz"]' would match a div
element with a "bar" attribute that contains the
word "baz". '&lt;div bar="foo baz"&gt;' would match, but '&lt;div
bar="foo bazbat"&gt;' would not.
</para></listitem>
<listitem><para>
<emphasis>substring match</emphasis>: the attribute contains
the string: 'div[bar*="baz"]' would match a div
element with a "bar" attribute that contains the
string "baz" anywhere within it.
</para></listitem>
</itemizedlist>
</listitem>
<listitem><para>
<emphasis>direct descendants</emphasis>: use '&gt;' between
selectors to denote direct descendants (children). 'div &gt; span' would
select only 'span' elements that are direct descendants of a
'div'. Can also be used with any of the selectors above.
</para></listitem>
<listitem>
<para>
<emphasis>descendants</emphasis>: string together
multiple selectors to indicate a hierarchy along which
to search. 'div .foo span #one' would select an element
with the id 'one' that is a descendant of arbitrary depth
beneath a 'span' element, which is in turn a descendant
of arbitrary depth beneath an element with a class of
'foo', which is a descendant of arbitrary depth beneath
a 'div' element. For example, it would match the link
containing the word 'One' in the listing below:
</para>
<programlisting language="html"><![CDATA[
<div>
    <table>
        <tr>
            <td class="foo">
                <div>
                    Lorem ipsum <span class="bar">
                        <a href="/foo/bar" id="one">One</a>
                        <a href="/foo/baz" id="two">Two</a>
                        <a href="/foo/bat" id="three">Three</a>
                        <a href="/foo/bla" id="four">Four</a>
                    </span>
                </div>
            </td>
        </tr>
    </table>
</div>
]]></programlisting>
</listitem>
</itemizedlist>
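<para>
As a rough illustration of the notations listed above, the following sketch
runs one selector of each kind against a small document. The sample markup
and the <varname>$selectors</varname> list are invented for this example, and
the snippet assumes Zend Framework 1 is available on the include path; it
only reports how many nodes each selector matched.
</para>
<programlisting language="php"><![CDATA[
// Assumption: Zend Framework 1 is on the include path.
require_once 'Zend/Dom/Query.php';

// Hypothetical sample markup, used only for this illustration.
$html = '<html><body>'
      . '<div id="content">'
      . '<p class="error required">First</p>'
      . '<p class="note">Second</p>'
      . '<span><a href="/foo/bar" rel="external nofollow">Link</a></span>'
      . '</div>'
      . '</body></html>';

$dom = new Zend_Dom_Query($html);

// One selector per notation described above.
$selectors = array(
    'p',                   // element type
    '.error',              // class name (one of several on the element)
    'div#content',         // element type combined with an id
    'a[href="/foo/bar"]',  // exact attribute match
    'a[rel~="nofollow"]',  // word match within the attribute value
    'a[href*="foo"]',      // substring match
    'div > span',          // direct descendant
    'div span a',          // descendant at arbitrary depth
);

foreach ($selectors as $selector) {
    $results = $dom->query($selector);
    printf("%-20s matched %d node(s)\n", $selector, count($results));
}
]]></programlisting>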
<para>
Once you've performed your query, you can then work with the result
object to determine information about the nodes, as well as to pull
them and/or their content directly for examination and manipulation.
<classname>Zend_Dom_Query_Result</classname> implements <code>Countable</code>
and <code>Iterator</code>, and stores the results internally as
DOMNodes/DOMElements. As an example, consider the following call,
which selects against the HTML above:
</para>
<programlisting language="php"><![CDATA[
// $html is assumed to hold the HTML listing above as a string
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');

$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
}
]]></programlisting>
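<para>
Because each result is a regular DOMElement, the standard <acronym>PHP</acronym>
DOM methods can be used to pull out attributes and text. The sketch below is
one possible way to do this; the condensed markup and the
<varname>$links</varname> array are assumptions made for the example, not part
of the component.
</para>
<programlisting language="php"><![CDATA[
// Assumption: Zend Framework 1 is on the include path.
require_once 'Zend/Dom/Query.php';

// A condensed, two-link version of the HTML listing above.
$html = '<div><table><tr><td class="foo"><div>Lorem ipsum <span class="bar">'
      . '<a href="/foo/bar" id="one">One</a>'
      . '<a href="/foo/baz" id="two">Two</a>'
      . '</span></div></td></tr></table></div>';

$dom     = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');

// Collect the href and link text of every match, keyed by element id.
$links = array();
foreach ($results as $result) {
    // getAttribute() and textContent belong to PHP's DOM extension
    $id = $result->getAttribute('id');
    $links[$id] = array(
        'href' => $result->getAttribute('href'),
        'text' => trim($result->textContent),
    );
}

print_r($links);
]]></programlisting>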
<para>
<classname>Zend_Dom_Query</classname> also allows straight XPath queries
using the <methodname>queryXpath()</methodname> method; you can pass any
valid XPath query to this method, and it will return a
<classname>Zend_Dom_Query_Result</classname> object.
</para>
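<para>
A direct XPath query can be useful for expressions that go beyond the
<acronym>CSS</acronym> notations described earlier, such as positional
predicates. The following is a minimal sketch; the navigation markup is
hypothetical and exists only to give the query something to match.
</para>
<programlisting language="php"><![CDATA[
// Assumption: Zend Framework 1 is on the include path.
require_once 'Zend/Dom/Query.php';

// Hypothetical markup for illustration.
$html = '<html><body><ul id="nav">'
      . '<li><a href="/home">Home</a></li>'
      . '<li><a href="/about">About</a></li>'
      . '</ul></body></html>';

$dom = new Zend_Dom_Query($html);

// Select the link inside the last list item of the "nav" list.
$results = $dom->queryXpath('//ul[@id="nav"]/li[last()]/a');

foreach ($results as $link) {
    // Each $link is a DOMElement, so the usual DOM API applies.
    echo $link->getAttribute('href'), ' => ', $link->nodeValue, "\n";
}
]]></programlisting>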
</sect2>
<sect2 id="zend.dom.query.methods">
<title>Methods Available</title>
<para>
The <classname>Zend_Dom_Query</classname> family of classes has the following
methods available.
</para>
<sect3 id="zend.dom.query.methods.zenddomquery">
<title>Zend_Dom_Query</title>
<para>
The following methods are available to
<classname>Zend_Dom_Query</classname>:
</para>
<itemizedlist>
<listitem><para>
<methodname>setDocumentXml($document)</methodname>: specify an <acronym>XML</acronym>
string to query against.
</para></listitem>
<listitem><para>
<methodname>setDocumentXhtml($document)</methodname>: specify an <acronym>XHTML</acronym>
string to query against.
</para></listitem>
<listitem><para>
<methodname>setDocumentHtml($document)</methodname>: specify an <acronym>HTML</acronym>
string to query against.
</para></listitem>
<listitem><para>
<methodname>setDocument($document)</methodname>: specify a
string to query against; <classname>Zend_Dom_Query</classname> will
then attempt to autodetect the document type.
</para></listitem>
<listitem><para>
<methodname>getDocument()</methodname>: retrieve the original document
string provided to the object.
</para></listitem>
<listitem><para>
<methodname>getDocumentType()</methodname>: retrieve the document
type of the document provided to the object; will be one of
the <constant>DOC_XML</constant>, <constant>DOC_XHTML</constant>, or
<constant>DOC_HTML</constant> class constants.
</para></listitem>
<listitem><para>
<methodname>query($query)</methodname>: query the document using <acronym>CSS</acronym>
selector notation.
</para></listitem>
<listitem><para>
<methodname>queryXpath($xPathQuery)</methodname>: query the document
using XPath notation.
</para></listitem>
</itemizedlist>
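<para>
The sketch below shows how the document setters and
<methodname>getDocumentType()</methodname> might be combined. The two sample
documents are invented for the example, and the type that
<methodname>setDocument()</methodname> settles on depends on its autodetection,
so the code branches on the reported type rather than assuming one.
</para>
<programlisting language="php"><![CDATA[
// Assumption: Zend Framework 1 is on the include path.
require_once 'Zend/Dom/Query.php';

$dom = new Zend_Dom_Query();

// Declare the document type explicitly...
$dom->setDocumentHtml('<html><body><p class="intro">Hello</p></body></html>');
echo count($dom->query('p.intro')), " intro paragraph(s)\n";

// ...or let setDocument() attempt to detect it (hypothetical XML sample).
$xml = '<?xml version="1.0"?><feed><entry>One</entry><entry>Two</entry></feed>';
$dom->setDocument($xml);

switch ($dom->getDocumentType()) {
    case Zend_Dom_Query::DOC_XML:
        echo "Detected as XML\n";
        break;
    case Zend_Dom_Query::DOC_XHTML:
        echo "Detected as XHTML\n";
        break;
    case Zend_Dom_Query::DOC_HTML:
        echo "Detected as HTML\n";
        break;
}

// getDocument() hands back the string that was provided.
var_dump($dom->getDocument() === $xml);

// CSS selector and XPath queries work against the same document.
echo count($dom->query('entry')), " via query()\n";
echo count($dom->queryXpath('//entry')), " via queryXpath()\n";
]]></programlisting>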
</sect3>
<sect3 id="zend.dom.query.methods.zenddomqueryresult">
<title>Zend_Dom_Query_Result</title>
<para>
As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
implements both <code>Iterator</code> and
<code>Countable</code>, and as such can be used in a
<code>foreach</code> loop as well as with the
<methodname>count()</methodname> function. Additionally, it exposes the
following methods:
</para>
<itemizedlist>
<listitem><para>
<methodname>getCssQuery()</methodname>: return the <acronym>CSS</acronym> selector query
used to produce the result (if any).
</para></listitem>
<listitem><para>
<methodname>getXpathQuery()</methodname>: return the XPath query
used to produce the result. Internally,
<classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym> selector queries to
XPath, so this value will always be populated.
</para></listitem>
<listitem><para>
<methodname>getDocument()</methodname>: retrieve the DOMDocument the
selection was made against.
</para></listitem>
</itemizedlist>
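<para>
These accessors can be handy when debugging a selector that does not match
what you expect. The following sketch, using made-up markup, prints the
selector alongside the XPath expression it was converted to; the exact XPath
string is left to the component and is not reproduced here.
</para>
<programlisting language="php"><![CDATA[
// Assumption: Zend Framework 1 is on the include path.
require_once 'Zend/Dom/Query.php';

// Hypothetical markup for illustration.
$html = '<html><body><div class="foo"><a href="/bar">Bar</a></div></body></html>';

$dom     = new Zend_Dom_Query($html);
$results = $dom->query('.foo a');

echo 'CSS:   ', $results->getCssQuery(), "\n";   // the selector passed to query()
echo 'XPath: ', $results->getXpathQuery(), "\n"; // the XPath it was converted to

// The underlying DOMDocument is available for further processing.
$document = $results->getDocument();
echo $document->saveHTML();

// A result produced by queryXpath() had no CSS selector involved,
// so only the XPath query is meaningful here.
$xpathResults = $dom->queryXpath('//a');
echo 'XPath: ', $xpathResults->getXpathQuery(), "\n";
]]></programlisting>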
</sect3>
</sect2>
</sect1>
<!--
vim:se ts=4 sw=4 et:
-->