Question

Matching and highlighting documents with different queries on different fields in ElasticSearch

Posted by 伊人憔悴儡 on 2023-02-09 07:38

string

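My goal is to create a query that finds the "best" 20 documents using a regular query_string query on document fields A, B and C, and that also attempts an exact or exact-subset match on field D. For example: if field D is 'AAA.BBB.CCC.DDD', then the query "AAA.BBB" should match (and so should "BBB.CCC", "AAA.BBB.CCC", and so on). Oh, and I would also like highlighted results.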

My closest attempt uses an ngram tokenizer/analyzer on field D and leaves A, B and C indexed normally.

{
    "settings": {
        "number_of_shards": 5,
        "index": {
            "analysis": {
                "tokenizer": {
                    "customNgram": {
                        "type": "nGram",
                        "min_gram": "3",
                        "max_gram": "5"
                    }
                },
                "analyzer": {
                    "lllNgram": {
                        "type": "custom",
                        "filter": "lowercase",
                        "tokenizer": "customNgram"
                    }
                }
            }
        }
    },
    "mappings": {
        "lessons": {
            "_id": {
                "path": "id"
            },
            "properties": {
                "id": {
                    "type": "integer"
                },
                "A": {
                    "type": "string",
                    "store": "yes"
                },
                "B": {
                    "type": "string",
                    "store": "yes"
                },
                "C": {
                    "type": "string",
                    "store": "yes"
                },
                "D": {
                    "type": "string",
                    "store": "yes",
                    "analyzer": "lllNgram"
                }
            }
        }
    }
}

Then I use a query like this:

{
    "size":20,
    "query":{
        "filtered":{
            "query":{
                "match_all":{}
            },
            "filter":{
                "or":[
                    {
                        "query":{
                            "query_string":{
                                "query":"XYZZY TOP",
                                "fields":["A","B","C"]
                            }
                        }
                    },
                    {
                        "query":{
                            "match":{
                                "D": {
                                    "query":"XYZZY TOP",
                                    "operator" : "and"
                                }
                            }
                        }
                    }
                ]
            }
         }
    },
    "highlight":{
        "pre_tags":["<em>"],
        "post_tags":["<\/em>"],
        "fields":{
            "A":{},
            "B":{},
            "C":{},
            "D":{}
        }
    }
}

The problem is that field D never seems to match anything, no matter what. The result set also contains no highlighting for this query.

So please help me understand what I am doing wrong in my query.

1 Answer

There are several problems in your mapping/query:

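Wrong ngram size: you define ngram(3, 5), so the generated terms are at most 5 characters long, yet you query for AAA.BBB (length = 7). It can match with your mapping, but that is inefficient and the wrong design in this case (it is a mistake to use the ngram analyzer for both indexing and searching). You can extend it to ngram(3, 20) and use it at index time only.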

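Ineffective mapping: you do not need ngram for both indexing and searching. Instead, define index_analyzer = lllNgram and use a search_analyzer that barely modifies the data, e.g. search_analyzer = keyword_lowercase_analyzer in my example. The index_analyzer is used when indexing data, so it needs rules that generate every term that could possibly match (ngram in this case). The search_analyzer is used to parse the query before it is compared with the indexed data, so here it only needs to keep the query in its original form (just lowercased).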

Unsuitable query: why must you use a filtered query? It discards the ES score, so you cannot get the "best" 20 documents.
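The ngram-size and analyzer points can be checked with a small simulation. This is only a rough Python model of an nGram tokenizer followed by a lowercase filter, not Elasticsearch's actual implementation; the function names `ngrams` and `keyword_lowercase` are made up for illustration.

```python
def ngrams(text, min_gram, max_gram):
    """Return all lowercase character n-grams of text with length
    min_gram..max_gram (a simplified model of nGram + lowercase)."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def keyword_lowercase(text):
    """Model of a keyword tokenizer + lowercase filter: the whole
    input stays a single term, only lowercased."""
    return text.lower()


doc_d = "AAA.BBB.CCC.DDD"
search_term = keyword_lowercase("AAA.BBB")  # one 7-character term

indexed_small = ngrams(doc_d, 3, 5)   # terms are at most 5 characters
indexed_large = ngrams(doc_d, 3, 20)  # terms up to the full field length

print(search_term in indexed_small)  # False: no 7-character term was indexed
print(search_term in indexed_large)  # True: "aaa.bbb" is one of the indexed n-grams
```

Under this model, a query kept as a single lowercase term can only match if the index-time ngrams are long enough to contain it, which is why max_gram has to cover the longest query you expect.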

Here is a working mapping/query:

{
    "settings": {
        "number_of_shards": 5,
        "index": {
            "analysis": {
                "tokenizer": {
                    "customNgram": {
                        "type": "nGram",
                        "min_gram": "3",
                        "max_gram": "20"
                    }
                },
                "analyzer": {
                    "lllNgram": {
                        "type": "custom",
                        "filter": "lowercase",
                        "tokenizer": "customNgram"
                    },
                    "keyword_lowercase_analyzer": {
                        "tokenizer": "keyword",
                        "filter": ["lowercase"]
                    }
                }
            }
        }
    },
    "mappings": {
        "lessons": {
            "_id": {
                "path": "id"
            },
            "properties": {
                "id": {
                    "type": "integer"
                },
                "A": {
                    "type": "string",
                    "store": "yes"
                },
                "B": {
                    "type": "string",
                    "store": "yes"
                },
                "C": {
                    "type": "string",
                    "store": "yes"
                },
                "D": {
                    "type": "string",
                    "store": "yes",
                    "index": "analyzed",
                    "index_analyzer": "lllNgram",
                    "search_analyzer": "keyword_lowercase_analyzer",
                    "term_vector": "with_positions_offsets"
                }
            }
        }
    }
}

Query:

{
    "size": 20,
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": "AAA.BBB",
                        "fields": ["A", "B", "C"]
                    }
                },
                {
                    "match": {
                        "D": {
                            "query": "AAA.BBB",
                            "operator": "or"
                        }
                    }
                }
            ]
        }
    },
    "highlight": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fields": {
            "A": {},
            "B": {},
            "C": {},
            "D": {}
        }
    }
}
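Because the search text appears in both clauses, it can be convenient to generate the request body from one string. The following is a minimal Python sketch that only builds the JSON body; `build_query` is a hypothetical helper, not part of any Elasticsearch client, and sending the request is left out.

```python
import json


def build_query(text, size=20):
    """Build the bool/should request body for a given search text.

    Hypothetical helper: both clauses contribute to the score, so the
    top `size` hits are the best-scoring documents.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Regular query_string search on fields A, B and C
                    {"query_string": {"query": text, "fields": ["A", "B", "C"]}},
                    # Exact-subset match on the ngram-indexed field D
                    {"match": {"D": {"query": text, "operator": "or"}}},
                ]
            }
        },
        "highlight": {
            "pre_tags": ["<em>"],
            "post_tags": ["</em>"],
            "fields": {"A": {}, "B": {}, "C": {}, "D": {}},
        },
    }


body = build_query("AAA.BBB")
print(json.dumps(body, indent=4))
```

The printed JSON should have the same structure as the query above and can be pasted into whatever tool you use to send search requests.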

Notes:

I use with_positions_offsets so that terms are highlighted faster. You can find more information here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html

You can install the inquisitor plugin to test analyzers; it helps you track down problems like this one.

Answered 2023-02-09 07:43

叶治样
