Browse Source

[DOCUMENTATION] French: new file: Zend_Search_Lucene-QueryLanguage.xml

git-svn-id: http://framework.zend.com/svn/framework/standard/trunk@20445 44c647ce-9c0f-0410-b52a-842ac1e357ba
kara 16 years ago
parent
commit
1a142ff1b9

+ 395 - 0
documentation/manual/fr/module_specs/Zend_Search_Lucene-QueryLanguage.xml

@@ -0,0 +1,395 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!-- EN-Revision: 18536 -->
+<!-- Reviewed: no -->
+<sect1 id="zend.search.lucene.query-language">
+    <title>Langage de requêtes</title>
+    <para>
+        Java Lucene et <classname>Zend_Search_Lucene</classname> fournissent des langages de requêtes plutôt puissants.
+    </para>
+    <para>
+        Ces langages sont pratiquement pareils, exceptées les quelques différences ci-dessous.
+    </para>
+    <para>
+        La syntaxe complète du langage de requêtes Java Lucene peut être trouvée
+        <ulink url="http://lucene.apache.org/java/2_3_0/queryparsersyntax.html">ici</ulink>.
+    </para>
+    <sect2 id="zend.search.lucene.query-language.terms">
+        <title>Termes</title>
+        <para>
+            Une requête est décomposée en termes et opérateurs. Il y a 3 types de termes : le termes simples, les
+            phrases et les sous-requêtes.
+        </para>
+        <para>
+            Un terme simple est un simple mot, tel que "test" ou "hello".
+        </para>
+        <para>
+            Une phrase est un groupe de mots inclus dans des double guillemets, tel que "hello dolly".
+        </para>
+        <para>
+            Une sous-requête est une requête incluse dans des parenthèses, tel que "(hello dolly)".
+        </para>
+        <para>
+            De multiples termes peuvent être combinés ensemble avec des opérateurs booléens pour former
+            des requêtes complexes (voyez ci-dessous).
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.fields">
+        <title>Champs</title>
+        <para>
+            Lucene supporte les champs de données. Lorsque vous effectuez une recherche, vous pouvez soit
+            spécifier un champ, soit utiliser le champ par défaut. Le nom du champ dépend des données indexées
+            et le champ par défaut est défini par les paramètres courants.
+        </para>
+        <para>
+            La première différence et la plus significative avec Java Lucene est que par défaut les termes
+            sont cherchés dans <emphasis>tous les champs</emphasis>.
+        </para>
+        <para>
+            Il y a deux méthodes statiques dans la classe <classname>Zend_Search_Lucene</classname> qui
+            permettent au développeur de configurer ces paramètres :
+        </para>
+        <programlisting language="php"><![CDATA[
+$defaultSearchField = Zend_Search_Lucene::getDefaultSearchField();
+...
+Zend_Search_Lucene::setDefaultSearchField('contents');
+]]></programlisting>
+        <para>
+            La valeur <constant>NULL</constant> indique que la recherche est effectuée dans tous les champs. C'est
+            le paramétrage par défaut
+        </para>
+        <para>
+            Vous pouvez chercher dans des champs spécifiques en tapant le nom du champ suivi de ":", suivi du terme
+            que vous cherchez.
+        </para>
+        <para>
+            Par exemple, prenons un index Lucene contenant deux champs -title et text- avec text comme champ par défaut.
+            Si vous voulez trouver le document ayant pour titre "The Right Way" qui contient le text "don't go this way",
+            vous pouvez entrer :
+        </para>
+        <programlisting language="querystring"><![CDATA[
+title:"The Right Way" AND text:go
+]]></programlisting>
+        <para>
+            or
+        </para>
+        <programlisting language="querystring"><![CDATA[
+title:"Do it right" AND go
+]]></programlisting>
+        <para>
+            "text" étant le champ par défaut, l'indicateur de champ n'est pas requis.
+        </para>
+        <para>
+            Note: Le champ n'est valable que pour le terme, la phrase ou la sous-requête qu'il précède directement,
+            ainsi la requête
+            <programlisting language="querystring"><![CDATA[
+title:Do it right
+]]></programlisting>
+            ne trouvera que "Do" dans le champ 'title'. Elle trouvera "it" et "right" dans le champ par défaut (si le
+            champ par défaut est défini) ou dans tous les champs indexés (si le champ par défaut est défini à <constant>NULL</constant>).
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.wildcard">
+        <title>Jokers (Wildcards)</title>
+        <para>
+            Lucene supporte les recherches avec joker sur un ou plusieurs caractères au sein des termes simples (mais pas
+            dans les phrases).
+        </para>
+        <para>
+            Pour effectuez une recherche avec joker sur un seul caractère, utilisez le symbole "?".
+        </para>
+        <para>
+            Pour effectuez une recherche avec joker sur plusieurs caractères, utilisez le symbole "*".
+        </para>
+        <para>
+            La recherche avec un joker sur un seul caractère va faire correspondre le terme avec le "?" remplacé par n'importe quel autre caractère unique.
+            Par exemple, pour trouver "text" ou "test" vous pouvez utiliser la recherche :
+            <programlisting language="querystring"><![CDATA[
+te?t
+]]></programlisting>
+        </para>
+        <para>
+            Multiple character wildcard searches look for 0 or more characters when matching strings against terms. For example, to search for test,
+            tests or tester, you can use the search:
+            <programlisting language="querystring"><![CDATA[
+test*
+]]></programlisting>
+        </para>
+        <para>
+            You can use "?", "*" or both at any place of the term:
+            <programlisting language="querystring"><![CDATA[
+*wr?t*
+]]></programlisting>
+            It searches for "write", "wrote", "written", "rewrite", "rewrote" and so on.
+        </para>
+        <para>
+            Starting from ZF 1.7.7 wildcard patterns need some non-wildcard prefix. Default prefix length is 3 (like in Java Lucene).
+            So "*", "te?t", "*wr?t*" terms will cause an exception<footnote>
+            <para>Please note, that it's not a <code>Zend_Search_Lucene_Search_QueryParserException</code>, but a
+            <code>Zend_Search_Lucene_Exception</code>. It's thrown during query rewrite (execution) operation.</para></footnote>.
+        </para>
+        <para>
+            It can be altered using <code>Zend_Search_Lucene_Search_Query_Wildcard::getMinPrefixLength()</code> and
+            <code>Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength()</code> methods.
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.modifiers">
+        <title>Term Modifiers</title>
+        <para>
+            Lucene supports modifying query terms to provide a wide range of searching options.
+        </para>
+        <para>
+            "~" modifier can be used to specify proximity search for phrases or fuzzy search for individual terms.
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.range">
+        <title>Range Searches</title>
+        <para>
+            Range queries allow the developer or user to match documents whose field(s) values are between the lower and upper bound specified by the range query.
+            Range Queries can be inclusive or exclusive of the upper and lower bounds. Sorting is performed lexicographically.
+            <programlisting language="querystring"><![CDATA[
+mod_date:[20020101 TO 20030101]
+]]></programlisting>
+            This will find documents whose mod_date fields have values between 20020101 and 20030101, inclusive. Note that Range Queries are not
+            reserved for date fields. You could also use range queries with non-date fields:
+            <programlisting language="querystring"><![CDATA[
+title:{Aida TO Carmen}
+]]></programlisting>
+            This will find all documents whose titles would be sorted between Aida and Carmen, but not including Aida and Carmen.
+        </para>
+        <para>
+            Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by curly brackets.
+        </para>
+        <para>
+            If field is not specified then <classname>Zend_Search_Lucene</classname> searches for specified interval through all fields by default.
+            <programlisting language="querystring"><![CDATA[
+{Aida TO Carmen}
+]]></programlisting>
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.fuzzy">
+        <title>Fuzzy Searches</title>
+        <para>
+            <classname>Zend_Search_Lucene</classname> as well as Java Lucene supports fuzzy searches based on the Levenshtein Distance, or Edit Distance algorithm.
+            To do a fuzzy search use the tilde, "~", symbol at the end of a Single word Term. For example to search for a term similar
+            in spelling to "roam" use the fuzzy search:
+            <programlisting language="querystring"><![CDATA[
+roam~
+]]></programlisting>
+            This search will find terms like foam and roams.
+            Additional (optional) parameter can specify the required similarity. The value is between 0 and 1, with a value closer to 1 only terms
+            with a higher similarity will be matched. For example:
+            <programlisting language="querystring"><![CDATA[
+roam~0.8
+]]></programlisting>
+            The default that is used if the parameter is not given is 0.5.
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.matched-terms-limitations">
+        <title>Matched terms limitation</title>
+        <para>
+            Wildcard, range and fuzzy search queries may match too many terms. It may cause incredible search performance downgrade.
+        </para>
+        <para>
+            So Zend_Search_Lucene sets a limit of matching terms per query (subquery). This limit can be retrieved and set using
+            <code>Zend_Search_Lucene::getTermsPerQueryLimit()</code>/<code>Zend_Search_Lucene::setTermsPerQueryLimit($limit)</code>
+            methods.
+        </para>
+        <para>
+            Default matched terms per query limit is 1024.
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.proximity-search">
+        <title>Proximity Searches</title>
+        <para>
+            Lucene supports finding words from a phrase that are within a specified word distance in a string. To do a proximity search
+            use the tilde, "~", symbol at the end of the phrase. For example to search for a "Zend" and
+            "Framework" within 10 words of each other in a document use the search:
+            <programlisting language="querystring"><![CDATA[
+"Zend Framework"~10
+]]></programlisting>
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.boosting">
+        <title>Boosting a Term</title>
+        <para>
+            Java Lucene and <classname>Zend_Search_Lucene</classname> provide the relevance level of matching documents based
+            on the terms found. To boost the relevance of a term use the caret, "^", symbol with a boost factor (a number)
+            at the end of the term you are searching. The higher the boost factor, the more relevant
+            the term will be.
+        </para>
+        <para>
+            Boosting allows you to control the relevance of a document by boosting individual terms. For example,
+            if you are searching for
+            <programlisting language="querystring"><![CDATA[
+PHP framework
+]]></programlisting>
+            and you want the term "PHP" to be more relevant boost it using the ^ symbol along with the
+            boost factor next to the term. You would type:
+            <programlisting language="querystring"><![CDATA[
+PHP^4 framework
+]]></programlisting>
+            This will make documents with the term PHP appear more relevant. You can also boost phrase
+            terms and subqueries as in the example:
+            <programlisting language="querystring"><![CDATA[
+"PHP framework"^4 "Zend Framework"
+]]></programlisting>
+            By default, the boost factor is 1. Although the boost factor must be positive,
+            it may be less than 1 (e.g. 0.2).
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.boolean">
+        <title>Boolean Operators</title>
+        <para>
+            Boolean operators allow terms to be combined through logic operators.
+            Lucene supports AND, "+", OR, NOT and "-" as Boolean operators.
+            Java Lucene requires boolean operators to be ALL CAPS. <classname>Zend_Search_Lucene</classname> does not.
+        </para>
+        <para>
+            AND, OR, and NOT operators and "+", "-" defines two different styles to construct boolean queries.
+            Unlike Java Lucene, <classname>Zend_Search_Lucene</classname> doesn't allow these two styles to be mixed.
+        </para>
+        <para>
+            If the AND/OR/NOT style is used, then an AND or OR operator must be present between all query terms.
+            Each term may also be preceded by NOT operator. The AND operator has higher precedence than the OR operator.
+            This differs from Java Lucene behavior.
+        </para>
+        <sect3 id="zend.search.lucene.query-language.boolean.and">
+            <title>AND</title>
+            <para>
+                The AND operator means that all terms in the "AND group" must match some part of the searched field(s).
+            </para>
+            <para>
+                To search for documents that contain "PHP framework" and "Zend Framework" use the query:
+                <programlisting language="querystring"><![CDATA[
+"PHP framework" AND "Zend Framework"
+]]></programlisting>
+            </para>
+        </sect3>
+        <sect3 id="zend.search.lucene.query-language.boolean.or">
+            <title>OR</title>
+            <para>
+                The OR operator divides the query into several optional terms.
+            </para>
+            <para>
+                To search for documents that contain "PHP framework" or "Zend Framework" use the query:
+                <programlisting language="querystring"><![CDATA[
+"PHP framework" OR "Zend Framework"
+]]></programlisting>
+            </para>
+        </sect3>
+        <sect3 id="zend.search.lucene.query-language.boolean.not">
+            <title>NOT</title>
+            <para>
+                The NOT operator excludes documents that contain the term after NOT. But an "AND group" which contains
+                only terms with the NOT operator gives an empty result set instead of a full set of indexed documents.
+            </para>
+            <para>
+                To search for documents that contain "PHP framework" but not "Zend Framework" use the query:
+                <programlisting language="querystring"><![CDATA[
+"PHP framework" AND NOT "Zend Framework"
+]]></programlisting>
+            </para>
+        </sect3>
+        <sect3 id="zend.search.lucene.query-language.boolean.other-form">
+            <title>&amp;&amp;, ||, and ! operators</title>
+            <para>
+                &amp;&amp;, ||, and ! may be used instead of AND, OR, and NOT notation.
+            </para>
+        </sect3>
+        <sect3 id="zend.search.lucene.query-language.boolean.plus">
+            <title>+</title>
+            <para>
+                The "+" or required operator stipulates that the term after the "+" symbol must match the document.
+            </para>
+            <para>
+                To search for documents that must contain "Zend" and may contain "Framework" use the query:
+                <programlisting language="querystring"><![CDATA[
++Zend Framework
+]]></programlisting>
+            </para>
+        </sect3>
+        <sect3 id="zend.search.lucene.query-language.boolean.minus">
+            <title>-</title>
+            <para>
+                The "-" or prohibit operator excludes documents that match the term after the "-" symbol.
+            </para>
+            <para>
+                To search for documents that contain "PHP framework" but not "Zend Framework" use the query:
+                <programlisting language="querystring"><![CDATA[
+"PHP framework" -"Zend Framework"
+]]></programlisting>
+            </para>
+        </sect3>
+        <sect3 id="zend.search.lucene.query-language.boolean.no-operator">
+            <title>No Operator</title>
+            <para>
+                If no operator is used, then the search behavior is defined by the "default boolean operator".
+            </para>
+            <para>
+                This is set to <code>OR</code> by default.
+            </para>
+            <para>
+                That implies each term is optional by default. It may or may not be present within document, but documents with this term
+                will receive a higher score.
+            </para>
+            <para>
+                To search for documents that requires "PHP framework" and may contain "Zend Framework" use the query:
+                <programlisting language="querystring"><![CDATA[
++"PHP framework" "Zend Framework"
+]]></programlisting>
+            </para>
+            <para>
+                The default boolean operator may be set or retrieved with the
+                <classname>Zend_Search_Lucene_Search_QueryParser::setDefaultOperator($operator)</classname> and
+                <classname>Zend_Search_Lucene_Search_QueryParser::getDefaultOperator()</classname> methods, respectively.
+            </para>
+            <para>
+                These methods operate with the
+                <classname>Zend_Search_Lucene_Search_QueryParser::B_AND</classname> and
+                <classname>Zend_Search_Lucene_Search_QueryParser::B_OR</classname> constants.
+            </para>
+        </sect3>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.grouping">
+        <title>Grouping</title>
+        <para>
+            Java Lucene and <classname>Zend_Search_Lucene</classname> support using parentheses to group clauses to form sub queries. This can be
+            useful if you want to control the precedence of boolean logic operators for a query or mix different boolean query styles:
+            <programlisting language="querystring"><![CDATA[
++(framework OR library) +php
+]]></programlisting>
+            <classname>Zend_Search_Lucene</classname> supports subqueries nested to any level.
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.field-grouping">
+        <title>Field Grouping</title>
+        <para>
+            Lucene also supports using parentheses to group multiple clauses to a single field.
+        </para>
+        <para>
+            To search for a title that contains both the word "return" and the phrase "pink panther" use the query:
+            <programlisting language="querystring"><![CDATA[
+title:(+return +"pink panther")
+]]></programlisting>
+        </para>
+    </sect2>
+    <sect2 id="zend.search.lucene.query-language.escaping">
+        <title>Escaping Special Characters</title>
+        <para>
+            Lucene supports escaping special characters that are used in query syntax. The current list of special
+            characters is:
+        </para>
+        <para>
+            + - &amp;&amp; || ! ( ) { } [ ] ^ " ~ * ? : \
+        </para>
+        <para>
+            + and - inside single terms are automatically treated as common characters.
+        </para>
+        <para>
+            For other instances of these characters use the \ before each special character you'd like to escape. For example to search for (1+1):2 use the query:
+            <programlisting language="querystring"><![CDATA[
+\(1\+1\)\:2
+]]></programlisting>
+        </para>
+    </sect2>
+</sect1>