<?xml version="1.0" encoding="UTF-8"?>
<!-- Reviewed: no -->
<sect1 id="zend.search.lucene.java-lucene">
    <title>Interoperating with Java Lucene</title>
    <sect2 id="zend.search.lucene.index-creation.file-formats">
        <title>File Formats</title>
        <para>
            <classname>Zend_Search_Lucene</classname> index file formats are binary compatible with
            Java Lucene version 1.4 and greater.
        </para>
        <para>
            A detailed description of this format is available here:
            <ulink url="http://lucene.apache.org/java/2_3_0/fileformats.html"/>
            <footnote>
                <para>
                    The currently supported Lucene index file format version is 2.3 (starting from
                    Zend Framework 1.6).
                </para>
            </footnote>.
        </para>
    </sect2>
    <sect2 id="zend.search.lucene.index-creation.index-directory">
        <title>Index Directory</title>
        <para>
            After index creation, the index directory will contain several files:
        </para>
        <itemizedlist>
            <listitem>
                <para>
                    The <filename>segments</filename> file is a list of index segments.
                </para>
            </listitem>
            <listitem>
                <para>
                    The <filename>*.cfs</filename> files contain index segments.
                    Note that an optimized index always has only one segment.
                </para>
            </listitem>
            <listitem>
                <para>
                    The <filename>deletable</filename> file is a list of files that are no longer
                    used by the index, but which could not be deleted.
                </para>
            </listitem>
        </itemizedlist>
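        <para>
            As a rough illustration of the list above, the following sketch walks an index
            directory and reports the role of each file using only the Java standard library.
            The class name <classname>IndexFileRole</classname>, the
            <methodname>describe()</methodname> helper and the default path are hypothetical and
            not part of Java Lucene or <classname>Zend_Search_Lucene</classname>. Only the three
            file names listed above are recognized; everything else is reported as "other".
        </para>
        <programlisting language="java"><![CDATA[
import java.io.File;

public class IndexFileRole {

    /** Maps a file name to the role listed above (hypothetical helper). */
    static String describe(String fileName) {
        if (fileName.equals("segments")) {
            return "list of index segments";
        }
        if (fileName.endsWith(".cfs")) {
            return "index segment";
        }
        if (fileName.equals("deletable")) {
            return "list of files awaiting deletion";
        }
        return "other";
    }

    public static void main(String[] args) {
        // Placeholder path; pass the real index directory as the first argument.
        File indexDir = new File(args.length > 0 ? args[0] : "/data/my_index");
        String[] names = indexDir.list(); // null if the path is not a readable directory
        if (names == null) {
            System.err.println("Not a readable directory: " + indexDir);
            return;
        }
        int segmentFiles = 0;
        for (String name : names) {
            String role = describe(name);
            if (role.equals("index segment")) {
                segmentFiles++;
            }
            System.out.println(name + ": " + role);
        }
        // Going by the note above, an optimized index should show exactly one.
        System.out.println(segmentFiles + " *.cfs segment file(s) found");
    }
}
]]></programlisting>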
    </sect2>
    <sect2 id="zend.search.lucene.java-lucene.source-code">
        <title>Java Source Code</title>
        <para>
            The Java program listing below provides an example of how to index a file
            using Java Lucene:
        </para>
        <programlisting language="java"><![CDATA[
/**
 * Index creation:
 */
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.*;
import java.io.*;
...
// Open a writer on the index directory; "true" creates a new index there
IndexWriter indexWriter = new IndexWriter("/data/my_index",
                                          new SimpleAnalyzer(), true);
...
String filename = "/path/to/file-to-index.txt";
File f = new File(filename);
// Build a document describing the file
Document doc = new Document();
doc.add(Field.Text("path", filename));
doc.add(Field.Keyword("modified", DateField.timeToString(f.lastModified())));
doc.add(Field.Text("author", "unknown"));
// Read the file contents through a Reader and add them as the "contents" field
FileInputStream is = new FileInputStream(f);
Reader reader = new BufferedReader(new InputStreamReader(is));
doc.add(Field.Text("contents", reader));
indexWriter.addDocument(doc);
]]></programlisting>
    </sect2>
</sect1>
<!--
vim:se ts=4 sw=4 et:
-->