<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.search.lucene.overview">
<title>Overview</title>
<sect2 id="zend.search.lucene.introduction">
<title>Introduction</title>
<para>
<classname>Zend_Search_Lucene</classname> is a general purpose text search engine
written entirely in <acronym>PHP</acronym> 5. Since it stores its index on the
filesystem and does not require a database server, it can add search capabilities to
almost any <acronym>PHP</acronym>-driven website.
<classname>Zend_Search_Lucene</classname> supports the following features:
<itemizedlist>
<listitem>
<para>Ranked searching - best results returned first</para>
</listitem>
<listitem>
<para>
Many powerful query types: phrase queries, boolean queries, wildcard queries,
proximity queries, range queries and many others.
</para>
</listitem>
<listitem>
<para>Search by specific field (e.g., title, author, contents)</para>
</listitem>
</itemizedlist>
<classname>Zend_Search_Lucene</classname> was derived from the Apache Lucene project.
Starting from Zend Framework 1.6, the supported Lucene index format versions are
1.4 - 2.3. For more information on Lucene, visit <ulink
url="http://lucene.apache.org/java/docs/"/>.
</para>
<note>
<title/>
<para>
Previous <classname>Zend_Search_Lucene</classname> implementations supported the
Lucene 1.4 (1.9) - 2.1 index formats.
</para>
<para>
Starting from Zend Framework 1.5, any index created using a pre-2.1 index format is
automatically upgraded to the Lucene 2.1 format after the
<classname>Zend_Search_Lucene</classname> update and will not be compatible with the
<classname>Zend_Search_Lucene</classname> implementations included in Zend
Framework 1.0.x.
</para>
</note>
</sect2>
<sect2 id="zend.search.lucene.index-creation.documents-and-fields">
<title>Document and Field Objects</title>
<para>
<classname>Zend_Search_Lucene</classname> operates with documents as atomic objects for
indexing. A document is divided into named fields, and fields have content that can be
searched.
</para>
<para>
A document is represented by the <classname>Zend_Search_Lucene_Document</classname>
class, and objects of this class contain instances of
<classname>Zend_Search_Lucene_Field</classname> that represent the fields of the
document.
</para>
<para>
It is important to note that any information can be added to the index.
Application-specific information or metadata can be stored in the document
fields, and later retrieved with the document during search.
</para>
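<para>
As a minimal sketch of this idea, the listing below stores two application-specific
values next to a searchable field and reads them back from a search hit. It assumes
Zend Framework 1.x is on the include path; the index location and the
<code>articleId</code> and <code>url</code> field names are hypothetical and chosen
only for illustration:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

// Hypothetical throwaway location for the example index.
$indexPath = sys_get_temp_dir() . '/zsl-metadata-example-' . uniqid();
$index     = Zend_Search_Lucene::create($indexPath);

$doc = new Zend_Search_Lucene_Document();
// Searchable text: tokenized and indexed, but not stored.
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
                                                  'Full text search with Lucene'));
// Application metadata: stored, but not searchable.
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('articleId', '42'));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('url', '/articles/42'));
$index->addDocument($doc);
$index->commit();

// Stored fields can be read as properties of each hit.
$hits = $index->find('contents:lucene');
foreach ($hits as $hit) {
    echo $hit->articleId, ' => ', $hit->url, PHP_EOL;
}
]]></programlisting>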
<para>
It is the responsibility of your application to control the indexer.
This means that data can be indexed from any source
that is accessible by your application. For example, this could be the
filesystem, a database, an <acronym>HTML</acronym> form, etc.
</para>
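<para>
The following sketch shows one way an application might drive the indexer from its own
data. The <code>buildDocumentFromRow()</code> helper and the row layout
(<code>id</code>, <code>title</code>, <code>body</code>) are assumptions made for this
example; the array stands in for rows fetched from a database, read from files, or
submitted through a form:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

/**
 * Builds an index document from one data row (hypothetical helper).
 */
function buildDocumentFromRow(array $row)
{
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Keyword('id', (string) $row['id']));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $row['title'], 'utf-8'));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $row['body'], 'utf-8'));
    return $doc;
}

// Stand-in for data coming from any source the application can reach.
$rows = array(
    array('id' => 1, 'title' => 'First post',  'body' => 'Hello from the blog'),
    array('id' => 2, 'title' => 'Second post', 'body' => 'More text to search'),
);

$documents = array();
foreach ($rows as $row) {
    $documents[] = buildDocumentFromRow($row);
}
]]></programlisting>
<para>
Each resulting document can then be passed to <methodname>addDocument()</methodname>
on an open index.
</para>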
<para>
The <classname>Zend_Search_Lucene_Field</classname> class provides several static
methods to create fields with different characteristics:
</para>
<programlisting language="php"><![CDATA[
$doc = new Zend_Search_Lucene_Document();

// Field is not tokenized, but is indexed and stored within the index.
// Stored fields can be retrieved from the index.
$doc->addField(Zend_Search_Lucene_Field::Keyword('doctype',
                                                 'autogenerated'));

// Field is not tokenized nor indexed, but is stored in the index.
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('created',
                                                   time()));

// Binary String valued Field that is not tokenized nor indexed,
// but is stored in the index.
$doc->addField(Zend_Search_Lucene_Field::Binary('icon',
                                                $iconData));

// Field is tokenized and indexed, and is stored in the index.
$doc->addField(Zend_Search_Lucene_Field::Text('annotation',
                                              'Document annotation text'));

// Field is tokenized and indexed, but is not stored in the index.
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
                                                  'My document content'));
]]></programlisting>
<para>
Each of these methods (excluding the
<methodname>Zend_Search_Lucene_Field::Binary()</methodname> method) has an optional
<varname>$encoding</varname> parameter for specifying the input data encoding.
</para>
<para>
Encoding may differ for different documents as well as for different fields within one
document:
</para>
<programlisting language="php"><![CDATA[
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('title',
                                              $title,
                                              'iso-8859-1'));
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
                                                  $contents,
                                                  'utf-8'));
]]></programlisting>
<para>
If the encoding parameter is omitted, then the current locale is used at processing
time. For example:
</para>
<programlisting language="php"><![CDATA[
setlocale(LC_ALL, 'de_DE.iso-8859-1');
// ...
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $contents));
]]></programlisting>
<para>
Fields are always stored and returned from the index in UTF-8 encoding. Any required
conversion to UTF-8 happens automatically.
</para>
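<para>
The conversion can be observed on a single field without building an index. This is a
small sketch, assuming Zend Framework 1.x is on the include path; the sample string is
"café" written as ISO-8859-1 bytes, and <methodname>getUtf8Value()</methodname> is
used to read the field value in UTF-8:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

// "café" encoded as ISO-8859-1: the accented letter is the single byte 0xE9.
$latin1Title = "caf\xE9";

$field = Zend_Search_Lucene_Field::Text('title', $latin1Title, 'iso-8859-1');

// The field object keeps the original bytes together with the declared encoding.
echo bin2hex($field->value), ' (', $field->encoding, ')', PHP_EOL;

// getUtf8Value() returns the UTF-8 form, where the accented letter
// becomes the two bytes 0xC3 0xA9.
echo bin2hex($field->getUtf8Value()), ' (utf-8)', PHP_EOL;
]]></programlisting>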
<para>
Text analyzers (<link linkend="zend.search.lucene.extending.analysis">see below</link>)
may also convert text to some other encoding. In fact, the default analyzer converts
text to 'ASCII//TRANSLIT' encoding. Be careful, however; this translation may depend on
the current locale.
</para>
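<para>
The locale dependence can be explored with a plain <code>iconv()</code> call. The
sketch below is not the analyzer's own code; it only applies the same
'ASCII//TRANSLIT' target encoding to a sample string under two locales. The locale
names are assumptions and may not be installed on every system, and the exact result
depends on the iconv implementation in use:
</para>
<programlisting language="php"><![CDATA[
// Sample UTF-8 input containing non-ASCII letters ("Zürich café").
$text = "Z\xC3\xBCrich caf\xC3\xA9";

foreach (array('C', 'de_DE.UTF-8') as $locale) {
    // setlocale() returns false when the requested locale is not installed.
    if (setlocale(LC_ALL, $locale) === false) {
        echo $locale, ': locale not available', PHP_EOL;
        continue;
    }

    // Transliterate to ASCII; iconv() returns false if the conversion fails.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    echo $locale, ': ', var_export($ascii, true), PHP_EOL;
}
]]></programlisting>
<para>
If the two lines of output differ, text indexed under one locale may not match queries
analyzed under another, so it is safest to index and search with the same locale.
</para>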
<para>
Field names are defined at your discretion in the <methodname>addField()</methodname>
method.
</para>
<para>
Java Lucene uses the 'contents' field as a default field to search.
<classname>Zend_Search_Lucene</classname> searches through all fields by default, but
the behavior is configurable. See the <link
linkend="zend.search.lucene.query-language.fields">"Default search field"</link>
chapter for details.
</para>
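<para>
As a brief sketch, assuming Zend Framework 1.x is on the include path, the default
search field can be switched with the static
<methodname>Zend_Search_Lucene::setDefaultSearchField()</methodname> method:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

// NULL is the initial value and means "search through all fields".
var_dump(Zend_Search_Lucene::getDefaultSearchField());

// Mimic Java Lucene: terms without a field prefix search only 'contents'.
Zend_Search_Lucene::setDefaultSearchField('contents');
echo Zend_Search_Lucene::getDefaultSearchField(), PHP_EOL;

// Switch back to searching through all fields.
Zend_Search_Lucene::setDefaultSearchField(null);
]]></programlisting>
<para>
The setting is static, so it applies to every index used in the current request.
</para>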
</sect2>
<sect2 id="zend.search.lucene.index-creation.understanding-field-types">
<title>Understanding Field Types</title>
<itemizedlist>
<listitem>
<para>
<code>Keyword</code> fields are stored and indexed, meaning that they can be
searched as well as displayed in search results. They are not split up into
separate words by tokenization. Enumerated database fields usually translate
well to Keyword fields in <classname>Zend_Search_Lucene</classname>.
</para>
</listitem>
<listitem>
<para>
<code>UnIndexed</code> fields are not searchable, but they are returned with
search hits. Database timestamps, primary keys, file system paths, and other
external identifiers are good candidates for UnIndexed fields.
</para>
</listitem>
<listitem>
<para>
<code>Binary</code> fields are not tokenized or indexed, but are stored for
retrieval with search hits. They can be used to store any data encoded as a
binary string, such as an image icon.
</para>
</listitem>
<listitem>
<para>
<code>Text</code> fields are stored, indexed, and tokenized. Text fields are
appropriate for storing information like subjects and titles that need to be
searchable as well as returned with search results.
</para>
</listitem>
<listitem>
<para>
<code>UnStored</code> fields are tokenized and indexed, but not stored in the
index. Large amounts of text are best indexed using this type of field. Storing
data creates a larger index on disk, so if you need to search but not redisplay
the data, use an UnStored field. UnStored fields are practical when using a
<classname>Zend_Search_Lucene</classname> index in combination with a relational
database. You can index large data fields with UnStored fields for searching,
and retrieve them from your relational database by using a separate field as an
identifier.
</para>
<table id="zend.search.lucene.index-creation.understanding-field-types.table">
<title>Zend_Search_Lucene_Field Types</title>
<tgroup cols="5">
<thead>
<row>
<entry>Field Type</entry>
<entry>Stored</entry>
<entry>Indexed</entry>
<entry>Tokenized</entry>
<entry>Binary</entry>
</row>
</thead>
<tbody>
<row>
<entry>Keyword</entry>
<entry>Yes</entry>
<entry>Yes</entry>
<entry>No</entry>
<entry>No</entry>
</row>
<row>
<entry>UnIndexed</entry>
<entry>Yes</entry>
<entry>No</entry>
<entry>No</entry>
<entry>No</entry>
</row>
<row>
<entry>Binary</entry>
<entry>Yes</entry>
<entry>No</entry>
<entry>No</entry>
<entry>Yes</entry>
</row>
<row>
<entry>Text</entry>
<entry>Yes</entry>
<entry>Yes</entry>
<entry>Yes</entry>
<entry>No</entry>
</row>
<row>
<entry>UnStored</entry>
<entry>No</entry>
<entry>Yes</entry>
<entry>Yes</entry>
<entry>No</entry>
</row>
</tbody>
</tgroup>
</table>
</listitem>
</itemizedlist>
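<para>
The characteristics listed above can also be read from the field objects themselves.
The sketch below, which assumes Zend Framework 1.x is on the include path, creates one
field of each type and prints its <code>isStored</code>, <code>isIndexed</code>,
<code>isTokenized</code> and <code>isBinary</code> flags; the field name and value are
placeholders:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

// One placeholder field per factory method.
$fields = array(
    'Keyword'   => Zend_Search_Lucene_Field::Keyword('f', 'value'),
    'UnIndexed' => Zend_Search_Lucene_Field::UnIndexed('f', 'value'),
    'Binary'    => Zend_Search_Lucene_Field::Binary('f', 'value'),
    'Text'      => Zend_Search_Lucene_Field::Text('f', 'value'),
    'UnStored'  => Zend_Search_Lucene_Field::UnStored('f', 'value'),
);

$format = "%-10s %-7s %-8s %-10s %-7s\n";
printf($format, 'Type', 'Stored', 'Indexed', 'Tokenized', 'Binary');

foreach ($fields as $type => $field) {
    printf($format,
           $type,
           $field->isStored    ? 'Yes' : 'No',
           $field->isIndexed   ? 'Yes' : 'No',
           $field->isTokenized ? 'Yes' : 'No',
           $field->isBinary    ? 'Yes' : 'No');
}
]]></programlisting>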
</sect2>
<sect2 id="zend.search.lucene.index-creation.html-documents">
<title>HTML documents</title>
<para>
<classname>Zend_Search_Lucene</classname> offers an <acronym>HTML</acronym> parsing
feature. Documents can be created directly from an <acronym>HTML</acronym> file or
string:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Html::loadHTMLFile($filename);
$index->addDocument($doc);
// ...
$doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);
$index->addDocument($doc);
]]></programlisting>
<para>
The <classname>Zend_Search_Lucene_Document_Html</classname> class uses the
<methodname>DOMDocument::loadHTML()</methodname> and
<methodname>DOMDocument::loadHTMLFile()</methodname> methods to parse the source
<acronym>HTML</acronym>, so it doesn't need <acronym>HTML</acronym> to be well formed or
to be <acronym>XHTML</acronym>. On the other hand, it's sensitive to the encoding
specified by the "meta http-equiv" header tag.
</para>
<para>
The <classname>Zend_Search_Lucene_Document_Html</classname> class recognizes the
document title, body and document header meta tags.
</para>
<para>
The 'title' field is actually the /html/head/title value. It's stored within the index,
tokenized and available for search.
</para>
<para>
The 'body' field is the actual body content of the <acronym>HTML</acronym> file or
string. It doesn't include scripts, comments or attributes.
</para>
<para>
The <methodname>loadHTML()</methodname> and <methodname>loadHTMLFile()</methodname>
methods of the <classname>Zend_Search_Lucene_Document_Html</classname> class also have
an optional second argument. If it's set to <constant>TRUE</constant>, then the body
content is also stored within the index and can be retrieved from it. By default, the
body is tokenized and indexed, but not stored.
</para>
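<para>
A short sketch of the difference, assuming Zend Framework 1.x is on the include path
and using a made-up <acronym>HTML</acronym> string. It loads the same markup with and
without the second argument and inspects the <code>isStored</code> flag of the
resulting 'body' field:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

// Made-up sample page.
$html = '<html><head><title>Sample page</title></head>'
      . '<body><p>Body text to index</p></body></html>';

// Default: the body is tokenized and indexed, but not stored.
$searchOnly = Zend_Search_Lucene_Document_Html::loadHTML($html);

// Second argument TRUE: the body is stored as well.
$withBody = Zend_Search_Lucene_Document_Html::loadHTML($html, true);

var_dump($searchOnly->getField('body')->isStored);
var_dump($withBody->getField('body')->isStored);
]]></programlisting>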
<para>
The third parameter of the <methodname>loadHTML()</methodname> and
<methodname>loadHTMLFile()</methodname> methods optionally specifies the source
<acronym>HTML</acronym> document encoding. It's used if the encoding is not specified
using the Content-type HTTP-EQUIV meta tag.
</para>
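<para>
For instance, a page written in ISO-8859-1 that carries no Content-type meta tag could
be loaded as sketched below. The markup is invented for this example, and the exact
text that comes back may vary with the installed libxml version, so no output is
shown:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

// ISO-8859-1 bytes, with no meta tag announcing the encoding.
$html = "<html><head><title>Caf\xE9 menu</title></head>"
      . "<body><p>Cr\xE8me br\xFBl\xE9e</p></body></html>";

// Second argument: do not store the body.
// Third argument: encoding to assume when the document does not declare one.
$doc = Zend_Search_Lucene_Document_Html::loadHTML($html, false, 'iso-8859-1');

// Field values are read back in UTF-8.
echo $doc->getFieldUtf8Value('title'), PHP_EOL;
]]></programlisting>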
<para>
Other document header meta tags produce additional document fields. The field name is
taken from the 'name' attribute, and the 'content' attribute supplies the field value.
These fields are tokenized, indexed and stored, so documents may be searched by their
meta tags (for example, by keywords).
</para>
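<para>
The mapping from meta tags to fields can be checked with a small sketch. It assumes
Zend Framework 1.x is on the include path; the page and its 'keywords' and 'author'
meta tags are invented for this example:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

// Made-up page with two named meta tags.
$html = '<html><head>'
      . '<meta http-equiv="Content-type" content="text/html; charset=UTF-8"/>'
      . '<title>Sample page</title>'
      . '<meta name="keywords" content="php, search, lucene"/>'
      . '<meta name="author" content="Jane Doe"/>'
      . '</head><body><p>Body text</p></body></html>';

$doc = Zend_Search_Lucene_Document_Html::loadHTML($html);

// Lists every field created by the parser, including the meta tag fields.
print_r($doc->getFieldNames());

// Each named meta tag is available under its 'name' attribute.
echo $doc->getFieldValue('keywords'), PHP_EOL;
echo $doc->getFieldValue('author'), PHP_EOL;
]]></programlisting>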
<para>
Parsed documents may be augmented by the programmer with any other field:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('created',
                                                   time()));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('updated',
                                                   time()));
$doc->addField(Zend_Search_Lucene_Field::Text('annotation',
                                              'Document annotation text'));
$index->addDocument($doc);
]]></programlisting>
<para>
Document links are not included in the generated document, but may be retrieved with
the <methodname>Zend_Search_Lucene_Document_Html::getLinks()</methodname> and
<methodname>Zend_Search_Lucene_Document_Html::getHeaderLinks()</methodname> methods:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);
$linksArray = $doc->getLinks();
$headerLinksArray = $doc->getHeaderLinks();
]]></programlisting>
<para>
Starting from Zend Framework 1.6, it's also possible to exclude links with the
<code>rel</code> attribute set to <code>'nofollow'</code>. Use
<methodname>Zend_Search_Lucene_Document_Html::setExcludeNoFollowLinks(true)</methodname>
to turn on this option.
</para>
<para>
The <methodname>Zend_Search_Lucene_Document_Html::getExcludeNoFollowLinks()</methodname>
method returns the current state of the "Exclude nofollow links" flag.
</para>
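<para>
A minimal sketch of the option in use, assuming Zend Framework 1.x is on the include
path and using invented example.com links. The flag is set before the document is
loaded, on the assumption that links are collected while the markup is parsed, and the
previous setting is restored afterwards because the flag is static:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

// Made-up page with one ordinary link and one 'nofollow' link.
$html = '<html><head><title>Links</title></head><body>'
      . '<a href="http://example.com/followed">followed</a>'
      . '<a href="http://example.com/skipped" rel="nofollow">skipped</a>'
      . '</body></html>';

// Remember the current setting so it can be restored afterwards.
$previous = Zend_Search_Lucene_Document_Html::getExcludeNoFollowLinks();

Zend_Search_Lucene_Document_Html::setExcludeNoFollowLinks(true);
$doc   = Zend_Search_Lucene_Document_Html::loadHTML($html);
$links = $doc->getLinks();
print_r($links);

Zend_Search_Lucene_Document_Html::setExcludeNoFollowLinks($previous);
]]></programlisting>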
</sect2>
<sect2 id="zend.search.lucene.index-creation.docx-documents">
<title>Word 2007 documents</title>
<para>
<classname>Zend_Search_Lucene</classname> offers a Word 2007 parsing feature. Documents
can be created directly from a Word 2007 file:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($filename);
$index->addDocument($doc);
]]></programlisting>
<para>
The <classname>Zend_Search_Lucene_Document_Docx</classname> class uses the
<code>ZipArchive</code> class and <code>simplexml</code> functions to parse the source
document. If the <code>ZipArchive</code> class (from the php_zip module) is not
available, <classname>Zend_Search_Lucene_Document_Docx</classname> cannot be used
either.
</para>
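<para>
An application can test for this dependency before it tries to index Office documents.
The <code>canParseOpenXml()</code> helper below is a hypothetical name used for this
sketch; it relies only on the standard <code>class_exists()</code> function:
</para>
<programlisting language="php"><![CDATA[
/**
 * Reports whether the ZipArchive class needed for Office 2007 parsing
 * is available (hypothetical helper, not part of Zend Framework).
 */
function canParseOpenXml()
{
    // ZipArchive is provided by the zip extension.
    return class_exists('ZipArchive');
}

if (canParseOpenXml()) {
    echo 'Word 2007 documents can be indexed.', PHP_EOL;
} else {
    echo 'The zip extension is missing; skipping Word 2007 documents.', PHP_EOL;
}
]]></programlisting>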
<para>
The <classname>Zend_Search_Lucene_Document_Docx</classname> class recognizes document
meta data and document text. Depending on the document contents, the meta data consists
of filename, title, subject, creator, keywords, description, lastModifiedBy, revision,
modified and created.
</para>
<para>
The 'filename' field is the actual Word 2007 file name.
</para>
<para>
The 'title' field is the actual document title.
</para>
<para>
The 'subject' field is the actual document subject.
</para>
<para>
The 'creator' field is the actual document creator.
</para>
<para>
The 'keywords' field contains the actual document keywords.
</para>
<para>
The 'description' field is the actual document description.
</para>
<para>
The 'lastModifiedBy' field is the name of the user who last modified the document.
</para>
<para>
The 'revision' field is the actual document revision number.
</para>
<para>
The 'modified' field is the date and time of the last document modification.
</para>
<para>
The 'created' field is the document creation date and time.
</para>
<para>
The 'body' field is the actual body content of the Word 2007 document. It only includes
normal text; comments and revisions are not included.
</para>
<para>
The <methodname>loadDocxFile()</methodname> method of the
<classname>Zend_Search_Lucene_Document_Docx</classname> class also has an optional
second argument. If it's set to <constant>TRUE</constant>, then the body content is also
stored within the index and can be retrieved from it. By default, the body is tokenized
and indexed, but not stored.
</para>
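<para>
To see which of these fields a given file actually produced, the stored fields of the
parsed document can be listed. This is a sketch under stated assumptions: Zend
Framework 1.x is on the include path, <code>storedFieldValues()</code> is a
hypothetical helper, and the file path is a placeholder:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

/**
 * Collects the stored fields of a document as name => value pairs
 * (hypothetical helper, not part of Zend Framework).
 */
function storedFieldValues(Zend_Search_Lucene_Document $doc)
{
    $values = array();
    foreach ($doc->getFieldNames() as $name) {
        $field = $doc->getField($name);
        if ($field->isStored) {
            $values[$name] = $field->value;
        }
    }
    return $values;
}

// Placeholder path; replace it with a real Word 2007 file.
$filename = '/path/to/report.docx';
if (is_readable($filename)) {
    // TRUE as the second argument also stores the body text.
    $doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($filename, true);
    print_r(storedFieldValues($doc));
}
]]></programlisting>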
<para>
Parsed documents may be augmented by the programmer with any other field:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($filename);
$doc->addField(Zend_Search_Lucene_Field::UnIndexed(
    'indexTime',
    time())
);
$doc->addField(Zend_Search_Lucene_Field::Text(
    'annotation',
    'Document annotation text')
);
$index->addDocument($doc);
]]></programlisting>
</sect2>
<sect2 id="zend.search.lucene.index-creation.pptx-documents">
<title>PowerPoint 2007 documents</title>
<para>
<classname>Zend_Search_Lucene</classname> offers a PowerPoint 2007 parsing feature.
Documents can be created directly from a PowerPoint 2007 file:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Pptx::loadPptxFile($filename);
$index->addDocument($doc);
]]></programlisting>
<para>
The <classname>Zend_Search_Lucene_Document_Pptx</classname> class uses the
<code>ZipArchive</code> class and <code>simplexml</code> functions to parse the source
document. If the <code>ZipArchive</code> class (from the php_zip module) is not
available, <classname>Zend_Search_Lucene_Document_Pptx</classname> cannot be used
either.
</para>
<para>
The <classname>Zend_Search_Lucene_Document_Pptx</classname> class recognizes document
meta data and document text. Depending on the document contents, the meta data consists
of filename, title, subject, creator, keywords, description, lastModifiedBy, revision,
modified and created.
</para>
<para>
The 'filename' field is the actual PowerPoint 2007 file name.
</para>
<para>
The 'title' field is the actual document title.
</para>
<para>
The 'subject' field is the actual document subject.
</para>
<para>
The 'creator' field is the actual document creator.
</para>
<para>
The 'keywords' field contains the actual document keywords.
</para>
<para>
The 'description' field is the actual document description.
</para>
<para>
The 'lastModifiedBy' field is the name of the user who last modified the document.
</para>
<para>
The 'revision' field is the actual document revision number.
</para>
<para>
The 'modified' field is the date and time of the last document modification.
</para>
<para>
The 'created' field is the document creation date and time.
</para>
<para>
The 'body' field is the actual content of all slides and slide notes in the PowerPoint
2007 document.
</para>
<para>
The <methodname>loadPptxFile()</methodname> method of the
<classname>Zend_Search_Lucene_Document_Pptx</classname> class also has an optional
second argument. If it's set to <constant>TRUE</constant>, then the body content is also
stored within the index and can be retrieved from it. By default, the body is tokenized
and indexed, but not stored.
</para>
<para>
Parsed documents may be augmented by the programmer with any other field:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Pptx::loadPptxFile($filename);
$doc->addField(Zend_Search_Lucene_Field::UnIndexed(
    'indexTime',
    time()));
$doc->addField(Zend_Search_Lucene_Field::Text(
    'annotation',
    'Document annotation text'));
$index->addDocument($doc);
]]></programlisting>
</sect2>
<sect2 id="zend.search.lucene.index-creation.xlsx-documents">
<title>Excel 2007 documents</title>
<para>
<classname>Zend_Search_Lucene</classname> offers an Excel 2007 parsing feature. Documents
can be created directly from an Excel 2007 file:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename);
$index->addDocument($doc);
]]></programlisting>
<para>
The <classname>Zend_Search_Lucene_Document_Xlsx</classname> class uses the
<code>ZipArchive</code> class and <code>simplexml</code> functions to parse the source
document. If the <code>ZipArchive</code> class (from the php_zip module) is not
available, <classname>Zend_Search_Lucene_Document_Xlsx</classname> cannot be used
either.
</para>
<para>
The <classname>Zend_Search_Lucene_Document_Xlsx</classname> class recognizes document
meta data and document text. Depending on the document contents, the meta data consists
of filename, title, subject, creator, keywords, description, lastModifiedBy, revision,
modified and created.
</para>
<para>
The 'filename' field is the actual Excel 2007 file name.
</para>
<para>
The 'title' field is the actual document title.
</para>
<para>
The 'subject' field is the actual document subject.
</para>
<para>
The 'creator' field is the actual document creator.
</para>
<para>
The 'keywords' field contains the actual document keywords.
</para>
<para>
The 'description' field is the actual document description.
</para>
<para>
The 'lastModifiedBy' field is the name of the user who last modified the document.
</para>
<para>
The 'revision' field is the actual document revision number.
</para>
<para>
The 'modified' field is the date and time of the last document modification.
</para>
<para>
The 'created' field is the document creation date and time.
</para>
<para>
The 'body' field is the actual content of all cells in all worksheets of the Excel 2007
document.
</para>
<para>
The <methodname>loadXlsxFile()</methodname> method of the
<classname>Zend_Search_Lucene_Document_Xlsx</classname> class also has an optional
second argument. If it's set to <constant>TRUE</constant>, then the body content is also
stored within the index and can be retrieved from it. By default, the body is tokenized
and indexed, but not stored.
</para>
<para>
Parsed documents may be augmented by the programmer with any other field:
</para>
<programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename);
$doc->addField(Zend_Search_Lucene_Field::UnIndexed(
    'indexTime',
    time()));
$doc->addField(Zend_Search_Lucene_Field::Text(
    'annotation',
    'Document annotation text'));
$index->addDocument($doc);
]]></programlisting>
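<para>
Since each document format has its own loader class, an application that indexes mixed
content may want a single entry point. The
<code>loadDocumentFromFile()</code> function below is a hypothetical helper sketched
for illustration, assuming Zend Framework 1.x is on the include path; it chooses a
loader by file extension and passes the optional second argument through:
</para>
<programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';

/**
 * Picks a document loader from the file extension
 * (hypothetical helper, not part of Zend Framework).
 *
 * @throws InvalidArgumentException for unsupported extensions
 */
function loadDocumentFromFile($filename, $storeContent = false)
{
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    switch ($extension) {
        case 'htm':
        case 'html':
            return Zend_Search_Lucene_Document_Html::loadHTMLFile($filename, $storeContent);
        case 'docx':
            return Zend_Search_Lucene_Document_Docx::loadDocxFile($filename, $storeContent);
        case 'pptx':
            return Zend_Search_Lucene_Document_Pptx::loadPptxFile($filename, $storeContent);
        case 'xlsx':
            return Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename, $storeContent);
        default:
            throw new InvalidArgumentException("Unsupported file type: $filename");
    }
}
]]></programlisting>
<para>
A caller would typically wrap the call in a <code>try</code> block, skip files whose
type is not supported, and pass every returned document to
<methodname>addDocument()</methodname>.
</para>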
</sect2>
</sect1>