<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.dom.query">
    <title>Zend_Dom_Query</title>
    <para>
        <classname>Zend_Dom_Query</classname> provides mechanisms for querying
        <acronym>XML</acronym> and (X)<acronym>HTML</acronym> documents using either XPath or
        <acronym>CSS</acronym> selectors. It was developed to aid with functional testing of
        <acronym>MVC</acronym> applications, but it can also be used to rapidly develop screen
        scrapers.
    </para>
    <para>
        <acronym>CSS</acronym> selector notation is offered as a simpler and more familiar
        notation for web developers querying documents that have an <acronym>XML</acronym>
        structure. It will be familiar to anyone who has written Cascading Style Sheets
        or used JavaScript toolkits that select nodes with <acronym>CSS</acronym> selectors
        (<ulink url="http://prototypejs.org/api/utility/dollar-dollar">Prototype's
        $$()</ulink> and
        <ulink url="http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query">Dojo's
        dojo.query</ulink> were both inspirations for the component).
    </para>
    <sect2 id="zend.dom.query.operation">
        <title>Theory of Operation</title>
        <para>
            To use <classname>Zend_Dom_Query</classname>, instantiate a
            <classname>Zend_Dom_Query</classname> object, optionally passing it the document
            to query as a string. Once a document is set, you can use either the
            <methodname>query()</methodname> or <methodname>queryXpath()</methodname> method;
            each returns a <classname>Zend_Dom_Query_Result</classname> object containing
            any matching nodes.
        </para>
        <para>
            The primary difference between <classname>Zend_Dom_Query</classname> and using
            DOMDocument and DOMXPath directly is the ability to select against
            <acronym>CSS</acronym> selectors. You can use any of the following, in any
            combination:
        </para>
        <itemizedlist>
            <listitem>
                <para>
                    <emphasis>element types</emphasis>: provide an element type to
                    match: 'div', 'a', 'span', 'h2', etc.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>class attributes</emphasis>: <acronym>CSS</acronym> class names
                    to match: '<command>.error</command>', '<command>div.error</command>',
                    '<command>label.required</command>', etc. If an
                    element defines more than one class, this will match as long as
                    the named class is present anywhere in its class attribute.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>id attributes</emphasis>: element ID attributes to
                    match: '#content', 'div#nav', etc.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>arbitrary attributes</emphasis>: arbitrary element
                    attributes to match. Three different types of matching are
                    provided:
                </para>
                <itemizedlist>
                    <listitem>
                        <para>
                            <emphasis>exact match</emphasis>: the attribute exactly
                            matches the string: 'div[bar="baz"]' would match a div
                            element with a "bar" attribute that exactly matches the
                            value "baz".
                        </para>
                    </listitem>
                    <listitem>
                        <para>
                            <emphasis>word match</emphasis>: the attribute contains
                            a word matching the string: 'div[bar~="baz"]' would match a div
                            element with a "bar" attribute that contains the
                            word "baz". '&lt;div bar="foo baz"&gt;' would match, but '&lt;div
                            bar="foo bazbat"&gt;' would not.
                        </para>
                    </listitem>
                    <listitem>
                        <para>
                            <emphasis>substring match</emphasis>: the attribute contains
                            the string: 'div[bar*="baz"]' would match a div
                            element with a "bar" attribute that contains the
                            string "baz" anywhere within it.
                        </para>
                    </listitem>
                </itemizedlist>
            </listitem>
            <listitem>
                <para>
                    <emphasis>direct descendants</emphasis>: use '&gt;' between
                    selectors to denote direct descendants. 'div &gt; span' would
                    select only 'span' elements that are direct children of a
                    'div'. This can be combined with any of the selectors above.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>descendants</emphasis>: string together
                    multiple selectors to indicate a hierarchy along which
                    to search. '<command>div .foo span #one</command>' would select an element
                    with id 'one' that is a descendant, at any depth, of a 'span'
                    element, which is in turn a descendant at any depth of an element
                    with a class of 'foo', which is itself a descendant at any depth
                    of a 'div' element. For example, it would match the link with
                    the text 'One' in the listing below:
                </para>
                <programlisting language="html"><![CDATA[
<div>
    <table>
        <tr>
            <td class="foo">
                <div>
                    Lorem ipsum <span class="bar">
                        <a href="/foo/bar" id="one">One</a>
                        <a href="/foo/baz" id="two">Two</a>
                        <a href="/foo/bat" id="three">Three</a>
                        <a href="/foo/bla" id="four">Four</a>
                    </span>
                </div>
            </td>
        </tr>
    </table>
</div>
]]></programlisting>
            </listitem>
        </itemizedlist>
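        <para>
            To make the selector semantics above concrete, the following sketch uses only
            PHP's bundled DOMDocument and DOMXPath classes, so it runs without Zend Framework.
            The XPath expressions are hand-written approximations of each
            <acronym>CSS</acronym> selector, chosen for illustration; they are not necessarily
            the exact expressions <classname>Zend_Dom_Query</classname> generates internally.
            The second half runs against a trimmed copy of the listing above.
        </para>
        <programlisting language="php"><![CDATA[
// Plain ext/dom sketch: the XPath strings below are illustrative
// approximations of the CSS selectors, not Zend_Dom_Query's own output.

// Attribute matching: exact, word, and substring
$doc = new DOMDocument();
$doc->loadXML(
    '<root><div bar="baz"/><div bar="foo baz"/><div bar="foo bazbat"/></root>'
);
$xpath = new DOMXPath($doc);

// div[bar="baz"]: the attribute value equals "baz"
$exact = $xpath->query("//div[@bar='baz']");

// div[bar~="baz"]: "baz" appears as a whitespace-separated word
$word = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@bar), ' '), ' baz ')]"
);

// div[bar*="baz"]: "baz" appears anywhere in the attribute value
$substring = $xpath->query("//div[contains(@bar, 'baz')]");

printf("exact=%d word=%d substring=%d\n",
    $exact->length, $word->length, $substring->length);
// exact=1 word=2 substring=3

// Descendant matching against a trimmed copy of the listing above
$html = '<div><table><tr><td class="foo"><div>Lorem ipsum '
      . '<span class="bar"><a href="/foo/bar" id="one">One</a> '
      . '<a href="/foo/baz" id="two">Two</a></span></div></td></tr></table></div>';

$page = new DOMDocument();
$page->loadHTML($html);
$pageXpath = new DOMXPath($page);

// div > span: only span elements whose parent is a div
$direct = $pageXpath->query('//div/span');

// div .foo span #one: each step may sit at any depth below the previous one
$deep = $pageXpath->query(
    "//div//*[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]"
    . "//span//*[@id='one']"
);

printf("direct=%d deep=%d (%s)\n",
    $direct->length, $deep->length, $deep->item(0)->textContent);
// direct=1 deep=1 (One)
]]></programlisting>
        <para>
            The word match pads the attribute value with spaces before searching, so "baz"
            only matches as a whole word; that is why "foo bazbat" is rejected by the word
            match but accepted by the substring match.
        </para>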
        <para>
            Once you've performed your query, you can work with the result
            object to determine information about the nodes, as well as to pull
            them and/or their content directly for examination and manipulation.
            <classname>Zend_Dom_Query_Result</classname> implements <classname>Countable</classname>
            and <classname>Iterator</classname>, and stores the results internally as
            DOMNodes and DOMElements. As an example, consider the following call,
            which selects against the <acronym>HTML</acronym> above:
        </para>
        <programlisting language="php"><![CDATA[
// $html contains the markup from the listing above
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');
$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
}
]]></programlisting>
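        <para>
            Because each match is an ordinary DOMElement, the standard ext/dom API is
            available for examining and manipulating results. The sketch below runs against
            plain DOMXPath rather than <classname>Zend_Dom_Query</classname> so that it needs
            no framework; the XPath string is an assumed approximation of
            '.foo .bar a', and the <code>rel</code> attribute it sets is purely illustrative.
        </para>
        <programlisting language="php"><![CDATA[
// Plain ext/dom stand-in for $dom->query('.foo .bar a'); the XPath
// below is a hand-written approximation, not Zend_Dom_Query's output.
$html = '<div><table><tr><td class="foo"><div>Lorem ipsum <span class="bar">'
      . '<a href="/foo/bar" id="one">One</a><a href="/foo/baz" id="two">Two</a>'
      . '<a href="/foo/bat" id="three">Three</a><a href="/foo/bla" id="four">Four</a>'
      . '</span></div></td></tr></table></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$results = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]"
    . "//*[contains(concat(' ', normalize-space(@class), ' '), ' bar ')]//a"
);

$links = array();
foreach ($results as $result) {
    // Examine: read attribute values and text content
    $links[$result->getAttribute('id')] = array(
        'href' => $result->getAttribute('href'),
        'text' => $result->textContent,
    );
    // Manipulate: changes are applied to the owning DOMDocument
    $result->setAttribute('rel', 'nofollow');
}

echo $results->length, "\n";                               // 4
echo $links['three']['href'], "\n";                        // /foo/bat
echo $xpath->query('//a[@rel="nofollow"]')->length, "\n";  // 4
]]></programlisting>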
        <para>
            <classname>Zend_Dom_Query</classname> also supports direct XPath queries
            via the <methodname>queryXpath()</methodname> method; you can pass any
            valid XPath query to this method, and it will return a
            <classname>Zend_Dom_Query_Result</classname> object.
        </para>
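        <para>
            A minimal <methodname>queryXpath()</methodname> sketch, assuming Zend Framework 1
            is installed and on the include_path (the <code>require_once</code> path reflects
            that assumption). The markup and the XPath expression are hypothetical examples.
        </para>
        <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

$html = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>';

$dom = new Zend_Dom_Query($html);

// The XPath expression is used as given; no CSS conversion takes place
$results = $dom->queryXpath('//li/a[@href="/b"]');

$texts = array();
foreach ($results as $link) {
    // Each match is a DOMElement, just as with query()
    $texts[] = $link->textContent;
}
echo count($results), ' match: ', implode(', ', $texts), "\n"; // 1 match: B
]]></programlisting>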
    </sect2>
    <sect2 id="zend.dom.query.methods">
        <title>Methods Available</title>
        <para>
            The <classname>Zend_Dom_Query</classname> family of classes has the following
            methods available.
        </para>
        <sect3 id="zend.dom.query.methods.zenddomquery">
            <title>Zend_Dom_Query</title>
            <para>
                The following methods are available to
                <classname>Zend_Dom_Query</classname>:
            </para>
            <itemizedlist>
                <listitem>
                    <para>
                        <methodname>setDocumentXml($document)</methodname>: specify an
                        <acronym>XML</acronym> string to query against.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>setDocumentXhtml($document)</methodname>: specify an
                        <acronym>XHTML</acronym> string to query against.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>setDocumentHtml($document)</methodname>: specify an
                        <acronym>HTML</acronym> string to query against.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>setDocument($document)</methodname>: specify a
                        string to query against; <classname>Zend_Dom_Query</classname> will
                        then attempt to autodetect the document type.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>getDocument()</methodname>: retrieve the original document
                        string provided to the object.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>getDocumentType()</methodname>: retrieve the document
                        type of the document provided to the object; the value is one of
                        the <constant>DOC_XML</constant>, <constant>DOC_XHTML</constant>, or
                        <constant>DOC_HTML</constant> class constants.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>query($query)</methodname>: query the document using
                        <acronym>CSS</acronym> selector notation.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>queryXpath($xPathQuery)</methodname>: query the document
                        using XPath notation.
                    </para>
                </listitem>
            </itemizedlist>
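            <para>
                The following sketch exercises the document-setting methods, assuming Zend
                Framework 1 is on the include_path. It relies on
                <methodname>setDocument()</methodname> treating a string that begins with an
                <acronym>XML</acronym> declaration as <acronym>XML</acronym>; that detection
                rule is an assumption about the autodetection described above, so prefer
                <methodname>setDocumentXml()</methodname> when you need certainty.
            </para>
            <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

$dom = new Zend_Dom_Query();

// Declare the document type explicitly...
$dom->setDocumentHtml('<p class="note">Hello</p>');
$htmlType = $dom->getDocumentType();   // Zend_Dom_Query::DOC_HTML

// ...or let setDocument() detect it; the leading XML declaration is
// assumed to make autodetection treat this string as XML
$xml = '<?xml version="1.0"?><items><item id="1"/><item id="2"/></items>';
$dom->setDocument($xml);
$xmlType = $dom->getDocumentType();    // expected: Zend_Dom_Query::DOC_XML

// CSS selectors work against XML documents as well
$items = $dom->query('item');
echo count($items), "\n";              // 2

// getDocument() hands back the original string, not a DOMDocument
var_dump($dom->getDocument() === $xml);
]]></programlisting>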
        </sect3>
        <sect3 id="zend.dom.query.methods.zenddomqueryresult">
            <title>Zend_Dom_Query_Result</title>
            <para>
                As mentioned previously, <classname>Zend_Dom_Query_Result</classname>
                implements both <classname>Iterator</classname> and
                <classname>Countable</classname>, and as such can be used in a
                <methodname>foreach()</methodname> loop as well as with the
                <methodname>count()</methodname> function. Additionally, it exposes the
                following methods:
            </para>
            <itemizedlist>
                <listitem>
                    <para>
                        <methodname>getCssQuery()</methodname>: return the <acronym>CSS</acronym>
                        selector query used to produce the result (if any).
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>getXpathQuery()</methodname>: return the XPath query
                        used to produce the result. Internally,
                        <classname>Zend_Dom_Query</classname> converts <acronym>CSS</acronym>
                        selector queries to XPath, so this value will always be populated.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <methodname>getDocument()</methodname>: retrieve the DOMDocument the
                        selection was made against.
                    </para>
                </listitem>
            </itemizedlist>
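            <para>
                A short sketch of the result accessors, again assuming Zend Framework 1 is on
                the include_path. The exact XPath string returned by
                <methodname>getXpathQuery()</methodname> depends on the framework's
                <acronym>CSS</acronym>-to-XPath conversion, so the example prints it rather
                than predicting it.
            </para>
            <programlisting language="php"><![CDATA[
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Dom/Query.php';

$dom    = new Zend_Dom_Query('<div id="nav"><a href="/">Home</a></div>');
$result = $dom->query('div#nav a');

echo $result->getCssQuery(), "\n";     // div#nav a

// The generated XPath; its exact text depends on the ZF version
echo $result->getXpathQuery(), "\n";

// The DOMDocument the query ran against can be reused directly
$xpath = new DOMXPath($result->getDocument());
echo $xpath->query('//a')->length, "\n";  // 1
]]></programlisting>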
        </sect3>
    </sect2>
</sect1>
<!--
vim:se ts=4 sw=4 et:
-->