Indexing
Indexing is performed by adding a new document to an existing or new index:
$index->addDocument($doc);
There are two ways to create a document object. The first is to construct it manually.
Manual Document Construction
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('url', $docUrl));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $docTitle));
$doc->addField(Zend_Search_Lucene_Field::unStored('contents', $docBody));
$doc->addField(Zend_Search_Lucene_Field::binary('avatar', $avatarData));
The second method is to load it from HTML or Microsoft Office 2007 files:
Document loading
If a document is loaded from one of the supported formats, it can still be extended manually
with new user-defined fields.
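As a rough sketch of the loading approach, the listing below picks a loader by file extension and then attaches one extra field. The loader classes and their static methods ship with Zend_Search_Lucene; the helper name loadDocumentForIndexing() and the indexTime field are assumptions made for illustration, and the code presumes Zend Framework 1 is available on the include path.

```php
<?php
/**
 * Hypothetical helper: build a Zend_Search_Lucene document from a file,
 * choosing the loader by file extension.
 */
function loadDocumentForIndexing($path)
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    switch ($extension) {
        case 'html':
        case 'htm':
            $doc = Zend_Search_Lucene_Document_Html::loadHTMLFile($path);
            break;
        case 'docx':
            $doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($path);
            break;
        case 'pptx':
            $doc = Zend_Search_Lucene_Document_Pptx::loadPptxFile($path);
            break;
        case 'xlsx':
            $doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($path);
            break;
        default:
            throw new InvalidArgumentException("Unsupported file type: $extension");
    }

    // A loaded document can still be extended with user-defined fields.
    // 'indexTime' is an illustrative field name, not part of the library.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('indexTime', (string) time()));

    return $doc;
}
```

The returned object can be passed to addDocument() like a manually constructed one.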
Indexing Policy
You should define an indexing policy as part of your application's architectural design.
If your application indexes on demand (similar to an OLTP system), you usually add one
document per user request. In that case, the MaxBufferedDocs option has no effect.
MaxMergeDocs, on the other hand, is very helpful because it lets you limit the maximum
script execution time. MergeFactor should be set to a value that balances average indexing
time (which is also affected by average auto-optimization time) against search performance
(which depends on the number of segments in the index).
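The on-demand setup described above might be expressed as follows. This is only a sketch: setMaxMergeDocs() and setMergeFactor() are Zend_Search_Lucene methods, but the helper name configureOnDemandIndex() and both numeric values are placeholders that would need tuning against a real workload.

```php
<?php
/**
 * Hypothetical helper for an on-demand (one document per request) setup.
 * The numeric values are placeholders, not recommendations.
 */
function configureOnDemandIndex($index)
{
    // Cap the number of documents merged during addDocument() so that
    // a single request cannot trigger an unbounded auto-optimization.
    $index->setMaxMergeDocs(10000);

    // Lower values keep the segment count small (faster searches)
    // at the cost of more frequent merges while indexing.
    $index->setMergeFactor(10);

    return $index;
}
```

A typical call would wrap an opened index, for example configureOnDemandIndex(Zend_Search_Lucene::open($indexPath)), where $indexPath is whatever directory your application uses.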
If you will primarily perform batch index updates, set MaxBufferedDocs to the largest
value the available memory allows. Set MaxMergeDocs and MergeFactor to values that
reduce auto-optimization involvement as much as possible. An additional limit is the
maximum number of file handles the operating system supports for concurrently open
files. Full index optimization should be applied after indexing.
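A batch run along these lines could look like the sketch below. The helper name runBatchIndexing() and the merge factor of 50 are assumptions; MaxMergeDocs is left untouched here because a suitable value depends on the index, and a large merge factor keeps more segment files open at once, so it has to stay within the file handle limit mentioned above.

```php
<?php
/**
 * Hypothetical batch routine: configure buffering, add every document,
 * then optimize once at the end.
 */
function runBatchIndexing($index, array $documents, $maxBufferedDocs)
{
    // Buffer as many documents in memory as the caller says is safe.
    $index->setMaxBufferedDocs($maxBufferedDocs);

    // A larger merge factor means fewer automatic merges during the run.
    // The value 50 is a placeholder.
    $index->setMergeFactor(50);

    foreach ($documents as $doc) {
        $index->addDocument($doc);
    }

    // Merge all segments into one after the whole batch is indexed.
    $index->optimize();

    return count($documents);
}
```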
Index optimization
$index->optimize();
In some configurations, it's more effective to serialize index updates by organizing
update requests into a queue and processing several update requests in a single script
execution. This reduces index-opening overhead and lets the index take advantage of
document buffering.
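One way to picture the queued approach is a worker that opens the index once and applies every pending update before committing. In this sketch, processUpdateQueue(), the shape of $pendingUpdates, and the $buildDocument callback are all hypothetical; only addDocument() and commit() come from Zend_Search_Lucene.

```php
<?php
/**
 * Hypothetical worker: apply several queued updates in one script run.
 *
 * $pendingUpdates is an array of queued jobs (any shape).
 * $buildDocument is a callback that turns one job into a document object.
 */
function processUpdateQueue($index, array $pendingUpdates, $buildDocument)
{
    $processed = 0;

    foreach ($pendingUpdates as $update) {
        // Documents accumulate in the index buffer between calls.
        $index->addDocument(call_user_func($buildDocument, $update));
        $processed++;
    }

    // Write the buffered changes once for the whole batch.
    $index->commit();

    return $processed;
}
```

How the queue itself is stored (database table, message broker, flat file) is outside the scope of this sketch.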