boarspring
/
repo_zf1
mirrorاز https://github.com/bluescreen/zf1-php7.2.git


			
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690
							<sect1 id="zend.search.lucene.extending">
    <title>Расширяемость<!--Extensibility--></title>

    <sect2 id="zend.search.lucene.extending.analysis">
        <title>Анализ текста<!--Text Analysis--></title>
        <para>
            Класс <code>Zend_Search_Lucene_Analysis_Analyzer</code> используется
            индексатором для разбиения текстовых полей документа на лексемы.
<!--
            <code>Zend_Search_Lucene_Analysis_Analyzer</code> class is used by indexer to tokenize document
            text fields.
-->
        </para>

        <para>
            Методы
            <code>Zend_Search_Lucene_Analysis_Analyzer::getDefault()</code> и
            <code>Zend_Search_Lucene_Analysis_Analyzer::setDefault()</code>
            используются для получения и установки анализатора по умолчанию.
<!--
            <code>Zend_Search_Lucene_Analysis_Analyzer::getDefault()</code> and <code>
            Zend_Search_Lucene_Analysis_Analyzer::setDefault()</code> methods are used
            to get and set default analyzer.
-->
        </para>

        <para>
            Таким образом, вы можете устанавливать собственный анализатор текста
            или выбирать его из ряда предопределенных анализаторов:
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common_Text</code> и
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive</code>
            (по умолчанию). Оба интерпретируют лексему как последовательность
            букв. <code>Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive</code>
            приводит лексемы к нижнему регистру.
<!--
            Thus you can assign your own text analyzer or choose it from the set of predefined analyzers:
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common_Text</code> and
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive</code> (default).
            Both of them interpret token as a sequence of letters.
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive</code>
            converts tokens to lower case.
-->
        </para>

        <para>
            Переключение между анализаторами:
<!--
            To switch between analyzers:
-->
        </para>

        <programlisting language="php"><![CDATA[<?php
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Text());
...
$index->addDocument($doc);]]></programlisting>

        <para>
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common</code>
            спроектирован для того, чтобы быть родительским классом для всех
            анализаторов, определяемых пользователем. В наследующем классе
            достаточно определить методы <code>reset()</code> и
            <code>nextToken()</code>, которые берут строку из свойства $_input
            и возвращают лексемы шаг за шагом (<constant>NULL</constant> означает конец
            потока).
<!--
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common</code> is designed to be a parent for all user
            defined analyzers. User should only define the <code>reset()</code> and <code>nextToken()</code> methods,
            which take string from $_input member and returns tokens step by step
            (<constant>NULL</constant> indicates end of stream).
-->
        </para>

        <para>
            Метод <code>nextToken()</code> должен использовать метод
            <code>normalize()</code> для каждой лексемы. Это позволит
            использовать фильтры лексем с вашим анализатором.
<!--
            The <code>nextToken()</code> method should apply the <code>normalize()</code> method to each
            token. It will allow you to use token filters with your analyzer.
-->
        </para>

        <para>
            Ниже приведен пример пользовательского анализатора, котрорый
            принимает слова с цифрами как элементы:
<!--
            Here is an example of a custom Analyzer, which takes words with digits as terms:
-->

            <example>
                <title>Собственный анализатор текста<!--Custom text Analyzer.--></title>
                <programlisting language="php"><![CDATA[<?php
/** Это созданный пользователем текстовый анализатор, который интерпретирует слова с цифрами как один элемент. */

/** Zend_Search_Lucene_Analysis_Analyzer hierarchy */
require_once 'Zend/Search/Lucene/Analysis/Analyzer.php';

class My_Analyzer extends Zend_Search_Lucene_Analysis_Analyzer_Common
{
    private $_position;

    /**
     * Установка позиции в начальное состояние
     */
    public function reset()
    {
        $this->_position = 0;
    }

    /**
     * API для разбиения на лексемы
     * Получение следующей лексемы
     * Возвращает null, если достигнут конец потока
     *
     * @return Zend_Search_Lucene_Analysis_Token|null
     */
    public function nextToken()
    {
        if ($this->_input === null) {
            return null;
        }

        while ($this->_position < strlen($this->_input)) {
            // пропуск пробела
            while ($this->_position < strlen($this->_input) &&
                   !ctype_alnum( $this->_input[$this->_position] )) {
                $this->_position++;
            }

            $termStartPosition = $this->_position;

            // чтение лексемы
            while ($this->_position < strlen($this->_input) &&
                   ctype_alnum( $this->_input[$this->_position] )) {
                $this->_position++;
            }

            // Пустая лексема, конец потока
            if ($this->_position == $termStartPosition) {
                return null;
            }

            $token = new Zend_Search_Lucene_Analysis_Token(
                                      substr($this->_input,
                                             $termStartPosition,
                                             $this->_position - $termStartPosition),
                                      $termStartPosition,
                                      $this->_position);
            $token = $this->normalize($token);
            if ($token !== null) {
                return $token;
            }
            // Продолжение, если лексема пропущена
        }

        return null;
    }
}

Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new My_Analyzer());]]></programlisting>
            </example>
        </para>
    </sect2>

    <sect2 id="zend.search.lucene.extending.filters">
        <title>Фильтрация лексем<!--Tokens Filtering--></title>
        <para>
            Анализатор <code>Zend_Search_Lucene_Analysis_Analyzer_Common</code>
            также предоставляет механизм фильтрации лексем.
<!--
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common</code> analyzer also offers tokens filtering
            mechanism.
-->
        </para>

        <para>
            Класс <code>Zend_Search_Lucene_Analysis_TokenFilter</code> является
            уровнем абстракции для таких фильтров. Он должен использоваться как
            предок для ваших собственных фильтров.
<!--
            <code>Zend_Search_Lucene_Analysis_TokenFilter</code> class is an abstract level for such filters.
            It should be used as an ancestor for your own filters.
-->
        </para>

        <para>
            Пользовательские фильтры должны реализовать метод
            <code>normalize()</code>, который может преобразовывать лексему или
            сигнализировать, что лексема должна быть пропущена.
<!--
            Custom filter must implement <code>normalize()</code> method which may transform input token or signal that
            token should be skipped.
-->
        </para>

        <para>
            В предоставляемом анализаторе уже определены три фильтра:
<!--
            There are three filters already defined in Analysis subpackage:
-->
            <itemizedlist>
                <listitem>
                    <para>
                        <code>Zend_Search_Lucene_Analysis_TokenFilter_LowerCase</code>
<!--
                        <code>Zend_Search_Lucene_Analysis_TokenFilter_LowerCase</code> filter.
-->
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <code>Zend_Search_Lucene_Analysis_TokenFilter_ShortWords</code>
<!--
                        <code>Zend_Search_Lucene_Analysis_TokenFilter_ShortWords</code> filter.
-->
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <code>Zend_Search_Lucene_Analysis_TokenFilter_StopWords</code>
<!--
                        <code>Zend_Search_Lucene_Analysis_TokenFilter_StopWords</code> filter.
-->
                    </para>
                </listitem>
            </itemizedlist>
        </para>

        <para>
            Фильтр <code>LowerCase</code> уже используется для анализатора
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive</code>,
            который применяется по умолчанию.
<!--
            <code>LowerCase</code> filter is already used for
            <code>Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive</code> analyzer
            which is default.
-->
        </para>

        <para>
            <code>ShortWords</code> и <code>StopWords</code> могут
            использоваться с уже включенными анализаторами или вашими
            собственными:
<!--
            <code>ShortWords</code> and <code>StopWords</code> may be used with already defined or your own
            analyzers like this:
-->
        </para>

        <programlisting language="php"><![CDATA[<?php
$stopWords = array('a', 'an', 'at', 'the', 'and', 'or', 'is', 'am');
$stopWordsFilter = new Zend_Search_Lucene_Analysis_TokenFilter_StopWords($stopWords);

$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();
$analyzer->addFilter($stopWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);]]></programlisting>

        <programlisting language="php"><![CDATA[<?php
$shortWordsFilter = new Zend_Search_Lucene_Analysis_TokenFilter_ShortWords();

$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();
$analyzer->addFilter($shortWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);]]></programlisting>

        <para>
            Конструктор
            <code>Zend_Search_Lucene_Analysis_TokenFilter_StopWords</code>
            принимает массив стоп-слов в качестве аргумента. Но стоп-слова можно
            также загружать и из файла:
<!--
            <code>Zend_Search_Lucene_Analysis_TokenFilter_StopWords</code> constructor takes an array of stop-words
            as an input. But stop-words may be also loaded from a file:
-->
        </para>

        <programlisting language="php"><![CDATA[<?php
$stopWordsFilter = new Zend_Search_Lucene_Analysis_TokenFilter_StopWords();
$stopWordsFilter->loadFromFile($my_stopwords_file);

$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();
$analyzer->addFilter($stopWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);]]></programlisting>

        <para>
            Файл должен быть текстовым с одним словом в каждой строке. Символом
            '#' помечаются строки с комментариями.
<!--
            File should be a common text file with one word in each string. '#' marks string as a comment.
-->
        </para>

        <para>
            Конструктор
            <code>Zend_Search_Lucene_Analysis_TokenFilter_ShortWords</code>
            имеет один необязательный параметр, это ограничение длины слова.
            Его значение по умолчанию равно 2.
<!--
            <code>Zend_Search_Lucene_Analysis_TokenFilter_ShortWords</code> constructor has one optional argument.
            It's a words length limit. Default value is 2.
-->
        </para>

    </sect2>

    <sect2 id="zend.search.lucene.extending.scoring">
        <title>Алгоритмы ранжирования<!--Scoring Algorithms--></title>
        <para>
            Ранг <literal>q</literal> документа
            <literal>d</literal> определяется следующим образом:
<!--
            The score of query <literal>q</literal> for document <literal>d</literal>
            is defined as follows:
-->
        </para>

        <para>
            <code>score(q,d) = sum( tf(t in d) * idf(t) * getBoost(t.field in d) * lengthNorm(t.field in d)  ) *
            coord(q,d) * queryNorm(q)</code>
        </para>

        <para>
            tf(t in d) - <code>Zend_Search_Lucene_Search_Similarity::tf($freq)</code>
            - коэффициент ранга, основанный на том, насколько часто встречается
            элемент или фраза в документе.
<!--
            tf(t in d) - <code>Zend_Search_Lucene_Search_Similarity::tf($freq)</code>
            - a score factor based on a term or phrase's frequency in a document.
-->
        </para>

        <para>
            idf(t) - <code>Zend_Search_Lucene_Search_SimilaritySimilarity::tf($term, $reader)</code>
            - коэффициент ранга для простого элемента применительно к определенному
            индексу.
<!--
            idf(t) - <code>Zend_Search_Lucene_Search_SimilaritySimilarity::tf($term, $reader)</code>
            - a score factor for a simple term for the specified index.
-->
        </para>

        <para>
            getBoost(t.field in d) - коэффициент усиления для поля элемента.
<!--
            getBoost(t.field in d) - boost factor for the term field.
-->
        </para>

        <para>
            lengthNorm($term) - значение нормализации для поля, получаемое из
            общего количества элементов, содержащихся в поле. Это значение
            хранится внутри индекса. Эти значения вместе с коэффициентом усиления поля
            хранятся в индексе, результатом их умножения является
            ранг для каждого поля.
<!--
            lengthNorm($term) - the normalization value for a field given the total
            number of terms contained in a field. This value is stored within the index.
            These values, together with field boosts, are stored in an index and multiplied
            into scores for hits on each field by the search code.
-->
        </para>
        <para>
            Совпадения в длинных полях менее точны, поэтому реализации этого метода
            обычно возвращают тем меньшие значения, чем
            больше число лексем, и тем большие значения, чем меньше число лексем.
<!--
            Matches in longer fields are less precise, so implementations of this method
            usually return smaller values when numTokens is large, and larger values when numTokens is small.
-->
        </para>

        <para>
            сoord(q,d) - <code>Zend_Search_Lucene_Search_Similarity::coord($overlap, $maxOverlap)</code>
            - коэффициент ранга, основанный на относительной доле всех элементов запроса,
            найденных в документе.
<!--
            coord(q,d) - <code>Zend_Search_Lucene_Search_Similarity::coord($overlap, $maxOverlap)</code> - a
            score factor based on the fraction of all query terms that a document contains.
-->
        </para>

        <para>
            Присутствие большого количества элементов запроса означает лучшее
            соответствие запросу, поэтому реализации этого метода обычно возвращают
            бОльшие значения, когда соотношение между этими параметрами большое
            и меньшие значения, когда соотношение между ними небольшое.
<!--
            The presence of a large portion of the query terms indicates a better match
            with the query, so implementations of this method usually return larger values
            when the ratio between these parameters is large and smaller values when
            the ratio between them is small.
-->
        </para>

        <para>
            queryNorm(q) - значение нормализации для запроса, получаемое из суммы
            возведенных в квадрат весов каждого из элементов запроса. Это значение
            затем умножается в вес каждого элемента запроса.
<!--
            queryNorm(q) -  the normalization value for a query given the sum of the squared weights
            of each of the query terms. This value is then multiplied into the weight of each query
            term.
-->
        </para>

        <para>
            Это не влияет на ранжирование, цель нормализации состоит в том, чтобы
            сделать соизмеримыми ранги, полученные при различных запросах.
<!--
            This does not affect ranking, but rather just attempts to make scores from different
            queries comparable.
-->
        </para>

        <para>
            Алгоритм ранжирования может быть изменен через определение своего
            собственного класса. Для этого надо создать потомка класса
            Zend_Search_Lucene_Search_Similarity, как показано ниже, затем
            использовать метод
            <code>Zend_Search_Lucene_Search_Similarity::setDefault($similarity);</code>
            для установки объекта как используемого по умолчанию.
<!--
            Scoring algorithm can be customized by defining your own Similarity class. To do this
            extend Zend_Search_Lucene_Search_Similarity class as defined below, then use
            <code>Zend_Search_Lucene_Search_Similarity::setDefault($similarity);</code> method to set it as default.
-->
        </para>

        <programlisting language="php"><![CDATA[<?php

class MySimilarity extends Zend_Search_Lucene_Search_Similarity {
    public function lengthNorm($fieldName, $numTerms) {
        return 1.0/sqrt($numTerms);
    }

    public function queryNorm($sumOfSquaredWeights) {
        return 1.0/sqrt($sumOfSquaredWeights);
    }

    public function tf($freq) {
        return sqrt($freq);
    }

    /**
     * Сейчас не используется. Подсчитывает сумму соответствий неточной фразе,
     * основанную на изменяемом расстоянии.
     */
    public function sloppyFreq($distance) {
        return 1.0;
    }

    public function idfFreq($docFreq, $numDocs) {
        return log($numDocs/(float)($docFreq+1)) + 1.0;
    }

    public function coord($overlap, $maxOverlap) {
        return $overlap/(float)$maxOverlap;
    }
}

$mySimilarity = new MySimilarity();
Zend_Search_Lucene_Search_Similarity::setDefault($mySimilarity);

?>]]>  </programlisting>
    </sect2>

    <sect2 id="zend.search.lucene.extending.storage">
        <title>Контейнеры хранения<!--Storage Containers--></title>
        <para>
        Абстрактный класс <code>Zend_Search_Lucene_Storage_Directory</code>
        определяет функционал директории.
<!--
        An abstract class <code>Zend_Search_Lucene_Storage_Directory</code> defines
        directory functionality.
-->
        </para>

        <para>
        Конструктор <code>Zend_Search_Lucene</code> использует строку или
        объект <code>Zend_Search_Lucene_Storage_Directory</code> как входные данные.
<!--
        The <code>Zend_Search_Lucene</code> constructor uses either a string or a
        <code>Zend_Search_Lucene_Storage_Directory</code> object
        as an input.
-->
        </para>

        <para>
        <code>Zend_Search_Lucene_Storage_Directory_Filesystem</code> реализует
        функционал директории для файловой системы.
<!--
        <code>Zend_Search_Lucene_Storage_Directory_Filesystem</code> class implements directory
        functionality for file system.
-->
        </para>

        <para>
        Если для конструктора <code>Zend_Search_Lucene</code> в качестве входных
        данных испольуется строка, то считыватель индекса (объект
        <code>Zend_Search_Lucene</code>) рассматривает ее как путь в файловой
        системе и сама инстанцирует объекты
        <code>Zend_Search_Lucene_Storage_Directory_Filesystem</code>.
<!--
        If a string is used as an input for the <code>Zend_Search_Lucene</code> constructor, then the index
        reader (<code>Zend_Search_Lucene</code> object) treats it as a file system path and instantiates
        <code>Zend_Search_Lucene_Storage_Directory_Filesystem</code> objects by themselves.
-->
        </para>

        <para>
        Вы можете определить собственную реализацию директории,
        создав потомка класса <code>Zend_Search_Lucene_Storage_Directory</code>.
<!--
        You can define your own directory implementation by extending the
        <code>Zend_Search_Lucene_Storage_Directory</code> class.
-->
        </para>

        <para>
        Методы <code>Zend_Search_Lucene_Storage_Directory</code>:
<!--
        <code>Zend_Search_Lucene_Storage_Directory</code> methods:
-->
        </para>

        <programlisting><![CDATA[<?php

abstract class Zend_Search_Lucene_Storage_Directory {
/**
 * Закрывает средство хранения.
 *
 * @return void
 */
abstract function close();


/**
 * Создает новый пустой файл с данным именем в директории.
 *
 * @param string $name
 * @return void
 */
abstract function createFile($filename);


/**
 * Удаляет существующий файл в директории.
 *
 * @param string $filename
 * @return void
 */
abstract function deleteFile($filename);


/**
 * Возвращает true, если файл с данным именем существует.
 *
 * @param string $filename
 * @return boolean
 */
abstract function fileExists($filename);


/**
 * Возвращает длину файла в директории.
 *
 * @param string $filename
 * @return integer
 */
abstract function fileLength($filename);


/**
 * Возвращает время последнего изменения файла в формате UNIX.
 *
 * @param string $filename
 * @return integer
 */
abstract function fileModified($filename);


/**
 * Переименовывает существующий файл в директории.
 *
 * @param string $from
 * @param string $to
 * @return void
 */
abstract function renameFile($from, $to);


/**
 * Устанавливает время изменения файла в текущее.
 *
 * @param string $filename
 * @return void
 */
abstract function touchFile($filename);


/**
 * Возвращает объект Zend_Search_Lucene_Storage_File для данного файла в директории.
 *
 * @param string $filename
 * @return Zend_Search_Lucene_Storage_File
 */
abstract function getFileObject($filename);

}

?>]]></programlisting>

        <para>
        Метод <code>getFileObject($filename)</code> класса
        <code>Zend_Search_Lucene_Storage_Directory</code> возвращает
        объект <code>Zend_Search_Lucene_Storage_File</code>.
<!--
        <code>getFileObject($filename)</code> method of <code>Zend_Search_Lucene_Storage_Directory</code>
        class returns a <code>Zend_Search_Lucene_Storage_File</code> object.
-->
        </para>
        <para>
        Абстрактный класс <code>Zend_Search_Lucene_Storage_File</code> реализует
        абстракцию файла и примитивы чтения файла индекса.
<!--
        <code>Zend_Search_Lucene_Storage_File</code> abstract class implements file abstraction and index
        file reading primitives.
-->
        </para>
        <para>
        Вы должны создать класс, наследующий от <code>Zend_Search_Lucene_Storage_File</code>
        для своей реализации директории.
<!--
        You must also extend <code>Zend_Search_Lucene_Storage_File</code> for your Directory implementation.
-->
        </para>
        <para>
        Только два метода класса <code>Zend_Search_Lucene_Storage_File</code> должны быть
        перегружены в вашей реализации:
<!--
        Only two methods of <code>Zend_Search_Lucene_Storage_File</code> must be overloaded in your
        implementation:
-->
        </para>

        <programlisting><![CDATA[<?php

class MyFile extends Zend_Search_Lucene_Storage_File {
    /**
     * Устанавливает индикатор позиции и перемещает указатель файла.
     * Новая позиция, измеряемая в байтах от начала файла,
     * получается добавлением смещения к позиции, определяемой аргументом $whence,
     * который может принимать следующие значения:
     * SEEK_SET - Устанавливает позицию равной смещению в байтах.
     * SEEK_CUR - Устанавливает позицию равной текущей позиции плюс смещение.
     * SEEK_END - Устанавливает позицию равной концу файла плюс смещение.
     * (Для перемещения позиции относительно конца файла вы должны передать отрицательное значение смещения)
     * В случае успеха возвращает 0; иначе -1
     *
     * @param integer $offset
     * @param integer $whence
     * @return integer
     */
    public function seek($offset, $whence=SEEK_SET) {
        ...
    }

    /**
     * Считывает $length байт из файла и перемещает указатель файла.
     *
     * @param integer $length
     * @return string
     */
    protected function _fread($length=1) {
        ...
    }
}

?>]]></programlisting>

    </sect2>
</sect1>

<!--
vim:se ts=4 sw=4 et:
-->