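Elasticsearch-py: Talking to Elasticsearch with the Python Client

Alright, ladies and gentlemen, tech geeks and code goddesses, welcome to today's special episode: "Elasticsearch-py: Talking to Elasticsearch with the Python Client"! I'm your old friend Xiao Li, the programmer known as the "stand-up comic of the coding world". Today we're skipping the dry theory and using plain language to see how, in the world of Python, you can gracefully tame Elasticsearch, the "search beast".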

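Opening: Elasticsearch, you high-maintenance little imp!

Elasticsearch, known on the street as "ES", is a distributed search and analytics engine built on Lucene. Put simply, it behaves like a very capable data store that is far better at full-text search and analytics than a traditional database. Picture a mountain of data from which you need to pull the right information quickly: that is exactly the job ES is built for.

But owning the beast is not enough; you also need a handy whip to direct it. Elasticsearch-py is that whip. It is the official Python client for Elasticsearch, and it lets you drive ES from Python code and pull off all sorts of fancy moves.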

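Act One: Installing and connecting, getting on "close terms" with ES

To start using Elasticsearch-py, you first have to invite it into your Python environment. It's easy: before you can ask someone out, you need their contact details!

pip install elasticsearch

One command and done! (Ideally, install a client whose major version matches your Elasticsearch server.) Next we need to connect to ES. Just like saying hello, you first have to know where it lives.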

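from elasticsearch import Elasticsearch

# Connect to a local Elasticsearch instance on the default port 9200.
# A full URL works with 7.x and 8.x clients; 8.x rejects host dicts that lack a "scheme".
es = Elasticsearch("http://localhost:9200")

# ping() returns True if the cluster answered and False otherwise
if es.ping():
    print("Congratulations, you are connected to Elasticsearch!")
else:
    print("Connection failed. Check that Elasticsearch is running.")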

This snippet is a simple "hello" program that tells ES: "Hey, Python is here and wants to play!" If ES answers, the connection works and you can move on to the next step.
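In practice ES often needs a few seconds to start, so a single ping() can fail for boring reasons. Here is a minimal sketch of a retry loop around it. The helper name wait_for_cluster and its retries, delay, and sleep parameters are my own illustration, not part of Elasticsearch-py; the only client call it relies on is ping().

```python
import time


def wait_for_cluster(es, retries=5, delay=1.0, sleep=time.sleep):
    """Ping `es` until it answers; return the attempt number, or None on failure.

    `es` is assumed to be an elasticsearch.Elasticsearch client (anything with
    a ping() method that returns a bool will do). Hypothetical helper.
    """
    for attempt in range(1, retries + 1):
        if es.ping():
            return attempt
        if attempt < retries:
            sleep(delay)  # wait a moment before trying again
    return None
```

A caller could stop early with something like `if wait_for_cluster(es) is None: raise SystemExit("Elasticsearch is not reachable")`.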

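Act Two: Index operations, feeding ES its "data fodder"

With a connection in place, it's time to feed ES some data. In ES, data is organized into "indices"; you can think of an index as roughly the equivalent of a table in a database.

2.1 Creating an index: building your data "granary"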

index_name = "my_index"

# Define the index mapping, much like defining the schema of a database table
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "author": {"type": "keyword"},
            "publish_date": {"type": "date", "format": "yyyy-MM-dd"}
        }
    }
}

# Create the index; this raises an error if the index already exists
try:
    es.indices.create(index=index_name, body=mapping)
    print(f"Index '{index_name}' created!")
except Exception as e:
    print(f"Failed to create index: {e}")

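This code defines an index named my_index and specifies its mapping. A mapping is like a table schema in a database: it defines the type of each field. Here, title and content are text fields, which are analyzed for full-text search; author is a keyword field, which is stored as-is for exact matching; and publish_date is a date field that accepts dates in yyyy-MM-dd format.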
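Writing nested mapping dicts by hand gets tedious once an index has more than a few fields. The sketch below builds the same structure from a flat field-to-type dict. The function build_mapping and its date_format parameter are hypothetical names of mine; only the resulting dict layout comes from this tutorial.

```python
def build_mapping(field_types, date_format="yyyy-MM-dd"):
    """Turn {"field": "type"} into an index-creation body with a mapping.

    Hypothetical helper for illustration; it only assembles a plain dict.
    """
    properties = {}
    for name, es_type in field_types.items():
        spec = {"type": es_type}
        if es_type == "date":
            spec["format"] = date_format  # same date format as the tutorial
        properties[name] = spec
    return {"mappings": {"properties": properties}}


mapping = build_mapping({
    "title": "text",
    "content": "text",
    "author": "keyword",
    "publish_date": "date",
})
```

The resulting mapping is an ordinary dict, so it can be passed to es.indices.create in place of a hand-written one.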

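2.2 Inserting data: loading grain into the "granary"

Once the index exists, you can put data into it. In ES each piece of data is called a "document"; think of it as one row in a database table.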

document = {
    "title": "Getting Started with Elasticsearch",
    "content": "An introductory Elasticsearch tutorial that shows how to use the Elasticsearch-py client.",
    "author": "Xiao Li",
    "publish_date": "2023-10-27"
}

# Index the document, giving the index name and a document ID
try:
    response = es.index(index=index_name, id=1, document=document)
    print(f"Document indexed! Document ID: {response['_id']}")
except Exception as e:
    print(f"Failed to index document: {e}")

This code indexes one document into my_index with the document ID 1. If you leave the ID out, ES generates a unique one for you.
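To make the "with or without an ID" choice explicit, here is a small sketch of a wrapper that only sends id when you actually have one. The name index_document and its doc_id parameter are assumptions for illustration; the wrapper calls nothing but es.index with the same keyword arguments used above.

```python
def index_document(es, index, document, doc_id=None):
    """Index `document` and return the ID that Elasticsearch reports back.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    Hypothetical wrapper, not part of the library.
    """
    kwargs = {"index": index, "document": document}
    if doc_id is not None:
        # Explicit ID: indexing the same ID again replaces that document
        kwargs["id"] = doc_id
    response = es.index(**kwargs)
    return response["_id"]
```

Called without doc_id, the returned value is whatever ID the server generated, which is handy to keep if you need to fetch the document again later.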

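2.3 Bulk inserts: loading grain faster

If you need to insert a lot of data, going one document at a time is far too slow, because every document costs a network round trip. ES offers a bulk API that sends many documents in one request and is much more efficient.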

from elasticsearch import helpers

documents = [
    {
        "_index": index_name,
        "_id": 2,
        "_source": {
            "title": "Python Programming Tips",
            "content": "Sharing some practical Python programming tips.",
            "author": "Code Goddess",
            "publish_date": "2023-10-26"
        }
    },
    {
        "_index": index_name,
        "_id": 3,
        "_source": {
            "title": "Big Data Analysis in Practice",
            "content": "A hands-on guide to big data analysis.",
            "author": "Data Maniac",
            "publish_date": "2023-10-25"
        }
    }
]

# Bulk-index the documents; helpers.bulk returns (success_count, errors)
try:
    response = helpers.bulk(es, documents)
    print(f"Bulk insert succeeded! Indexed {response[0]} documents.")
except Exception as e:
    print(f"Bulk insert failed: {e}")

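This code uses elasticsearch.helpers.bulk to insert documents in bulk. Note that in this format each action carries an _index field naming the target index and a _source field holding the document body.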
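Typing out _index and _source for every document quickly gets old. Since helpers.bulk accepts any iterable of actions, a generator can wrap plain dicts on the fly. This is a sketch under that assumption; to_bulk_actions and its id_field parameter are names I made up for the example.

```python
def to_bulk_actions(index, docs, id_field=None):
    """Yield one bulk action per plain document dict.

    If `id_field` is given, that key's value becomes the document _id;
    otherwise _id is left out and Elasticsearch assigns one.
    Hypothetical helper for illustration.
    """
    for doc in docs:
        action = {"_index": index, "_source": doc}
        if id_field is not None:
            action["_id"] = doc[id_field]
        yield action
```

With that in place, a call such as `helpers.bulk(es, to_bulk_actions(index_name, rows))` streams the rows without first building the whole action list in memory.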

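Act Three: Queries, finding grain in the "granary"

Now that the data is in, it's time to get it back out. ES has powerful query features that let you search by all kinds of conditions.

3.1 Simple lookup: fetch by ID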

# Fetch a document by its ID
try:
    response = es.get(index=index_name, id=1)
    print(f"Result: {response['_source']}")
except Exception as e:
    print(f"Lookup failed: {e}")

This code fetches the document with ID 1 and prints its contents.

3.2 Match query: search by keyword

# Match query: find documents by keyword
query = {
    "query": {
        "match": {
            "content": "Elasticsearch"
        }
    }
}

# Run the query
try:
    response = es.search(index=index_name, body=query)
    print(f"Result: found {response['hits']['total']['value']} documents.")
    for hit in response['hits']['hits']:
        print(f"Document ID: {hit['_id']}, title: {hit['_source']['title']}")
except Exception as e:
    print(f"Query failed: {e}")

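This code uses a match query to look for the keyword "Elasticsearch". A match query first analyzes the search text into terms and then finds documents that contain them; by default a document matches if it contains any one of the terms.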
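When the search text has several words, "any term" versus "all terms" makes a big difference. The match query has an operator option for this. The sketch below builds the search body for either case; the function name match_query is my own, and it only produces a dict that you would then pass to es.search.

```python
def match_query(field, text, operator="or"):
    """Build a search body for a match query on `field`.

    operator="or" (the Elasticsearch default) matches documents containing
    any of the analyzed terms; operator="and" requires all of them.
    Hypothetical helper for illustration.
    """
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# Require every term of the search text to appear in the content field
query = match_query("content", "Elasticsearch client", operator="and")
```

Switching between the two operators is often the quickest way to trade recall for precision before reaching for more complex queries.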

3.3 Bool query: combining several conditions

# Bool query: combine several conditions
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"content": "Python"}},
                {"match": {"content": "tips"}}
            ],
            "must_not": [
                {"match": {"author": "Xiao Li"}}
            ]
        }
    }
}

# Run the query
try:
    response = es.search(index=index_name, body=query)
    print(f"Result: found {response['hits']['total']['value']} documents.")
    for hit in response['hits']['hits']:
        print(f"Document ID: {hit['_id']}, title: {hit['_source']['title']}")
except Exception as e:
    print(f"Query failed: {e}")

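This code uses a bool query to combine several conditions. The must clause lists conditions that have to match, and the must_not clause lists conditions that must not match. In other words: find documents whose content contains both "Python" and "tips" and whose author is not "Xiao Li".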
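Bool queries tend to be assembled from conditions that may or may not be present, for example optional filters in a search form. Here is a minimal sketch of a builder that leaves out empty clauses. The name bool_query is hypothetical, and the author value below is a placeholder rather than data from this tutorial.

```python
def bool_query(must=None, must_not=None):
    """Build a bool search body, leaving out clauses that are empty.

    Hypothetical helper for illustration; it only assembles a plain dict.
    """
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"query": {"bool": clauses}}


query = bool_query(
    must=[{"match": {"content": "Python"}}],
    must_not=[{"match": {"author": "some-author"}}],  # placeholder author name
)
```

The result has the same shape as the hand-written query above and can be passed to es.search in the same way.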

3.4 Range query: finding documents within a range

# Range query: find documents within a given range
query = {
    "query": {
        "range": {
            "publish_date": {
                "gte": "2023-10-26",
                "lte": "2023-10-27"
            }
        }
    }
}

# Run the query
try:
    response = es.search(index=index_name, body=query)
    print(f"Result: found {response['hits']['total']['value']} documents.")
    for hit in response['hits']['hits']:
        print(f"Document ID: {hit['_id']}, title: {hit['_source']['title']}")
except Exception as e:
    print(f"Query failed: {e}")

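This code uses a range query to find documents whose publish_date falls between "2023-10-26" and "2023-10-27", both ends included (gte means "greater than or equal", lte means "less than or equal").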
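In real code the bounds usually arrive as Python date objects rather than strings. The sketch below converts them with date.isoformat(), which produces the same yyyy-MM-dd layout the mapping in this tutorial expects. The function name date_range_query is an assumption of mine.

```python
from datetime import date


def date_range_query(field, start=None, end=None):
    """Build a range search body; both bounds are inclusive (gte / lte).

    Hypothetical helper for illustration; it only assembles a plain dict.
    """
    bounds = {}
    if start is not None:
        bounds["gte"] = start.isoformat()  # date.isoformat() gives YYYY-MM-DD
    if end is not None:
        bounds["lte"] = end.isoformat()
    if not bounds:
        raise ValueError("give at least one of start or end")
    return {"query": {"range": {field: bounds}}}


query = date_range_query("publish_date", date(2023, 10, 26), date(2023, 10, 27))
```

Leaving out start or end gives an open-ended range, such as "everything published up to a given day".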

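Act Four: Updating and deleting, maintaining your "granary"

Data keeps changing, so documents need to be updated and deleted from time to time.

4.1 Updating a document: changing the "grain"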

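# Partially update a document: only the fields under "doc" change
try:
    response = es.update(index=index_name, id=1, body={"doc": {"title": "Advanced Elasticsearch Tutorial"}})
    print("Document updated!")
except Exception as e:
    print(f"Failed to update document: {e}")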

This code updates the title field of the document with ID 1; the other fields of that document stay as they were.
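It helps to have a mental model of what a partial update does to the stored document. As far as I understand the update API, nested objects are merged key by key while plain values and arrays are replaced. The function below is a purely local simulation of that rule, written for this explanation; apply_partial_update is a made-up name and nothing here talks to a cluster.

```python
import copy


def apply_partial_update(source, doc):
    """Return what `source` looks like after merging the partial `doc` into it.

    Local simulation only: nested objects are merged key by key, every other
    value (strings, numbers, lists) is replaced. The inputs are not modified.
    """
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


before = {"title": "old title", "author": "someone", "meta": {"views": 1, "tags": ["x"]}}
after = apply_partial_update(before, {"title": "new title", "meta": {"tags": ["y"]}})
```

Here after keeps author and meta.views untouched, gets the new title, and has its tags list replaced rather than appended to.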

4.2 Deleting a document: clearing out "spoiled grain"

# Delete a document
try:
    response = es.delete(index=index_name, id=1)
    print("Document deleted!")
except Exception as e:
    print(f"Failed to delete document: {e}")

This code deletes the document with ID 1.

4.3 Deleting an index: emptying the "granary"

# Delete the index, along with every document in it
try:
    response = es.indices.delete(index=index_name)
    print(f"Index '{index_name}' deleted!")
except Exception as e:
    print(f"Failed to delete index: {e}")

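This code deletes the my_index index. Be careful: all documents in the index go with it.

Act Five: Advanced tricks to make your ES fly

Beyond the basics, Elasticsearch-py also gives you access to more advanced features that let you get more out of ES.

5.1 Aggregations: digging value out of your data

Aggregations are one of the key features of ES. They let you group, count, and compute over your data to surface what is hiding behind it.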

# Aggregation: count the number of articles per author
query = {
    "size": 0,  # no documents needed, only the aggregation results
    "aggs": {
        "authors": {
            "terms": {
                "field": "author"
            }
        }
    }
}

# Run the query
try:
    response = es.search(index=index_name, body=query)
    for bucket in response['aggregations']['authors']['buckets']:
        print(f"Author: {bucket['key']}, articles: {bucket['doc_count']}")
except Exception as e:
    print(f"Aggregation failed: {e}")

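This code uses a terms aggregation to count the number of articles per author. It works on the author field because that field is a keyword, so each distinct value becomes one bucket.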
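The buckets come back as a list of small dicts, which is often more convenient as a plain mapping. Below is a sketch of that conversion. The helper name buckets_to_counts is hypothetical, and sample_response is a hand-written stand-in in the shape of a terms-aggregation response, not real output from a cluster. Keep in mind that a terms aggregation returns only the top 10 buckets unless you set its size option.

```python
def buckets_to_counts(response, agg_name):
    """Map each bucket key of a terms aggregation to its doc_count.

    Hypothetical helper for illustration.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written sample in the shape of a terms-aggregation response (not real output)
sample_response = {
    "aggregations": {
        "authors": {
            "buckets": [
                {"key": "author-a", "doc_count": 2},
                {"key": "author-b", "doc_count": 1},
            ]
        }
    }
}
counts = buckets_to_counts(sample_response, "authors")
```

With a real response from es.search, counts would map each author to the number of articles they wrote.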

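5.2 Scroll queries: handling huge result sets

If you need to read a very large number of documents, fetching them all in one request is a bad idea: by default a search cannot page past 10,000 results, and pulling everything at once can exhaust memory. ES provides a scroll API that returns the results in batches.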

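# Scroll query for large result sets
scroll_id = None
try:
    # Start the scroll; this first response already contains the first batch
    response = es.search(index=index_name, scroll='1m', size=1000, query={"match_all": {}})

    while True:
        # Always use the most recent scroll ID
        scroll_id = response['_scroll_id']
        hits = response['hits']['hits']

        # Stop once a batch comes back empty
        if not hits:
            break

        # Process the batch
        for hit in hits:
            print(f"Document ID: {hit['_id']}, title: {hit['_source']['title']}")

        # Fetch the next batch
        response = es.scroll(scroll_id=scroll_id, scroll='1m')

    print("Scroll query finished!")
except Exception as e:
    print(f"Scroll query failed: {e}")
finally:
    # Release the scroll context on the server
    if scroll_id is not None:
        es.clear_scroll(scroll_id=scroll_id)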

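This code uses a scroll query that fetches up to 1000 documents per batch until everything has been read. Note that the first batch is returned by the initial search call itself, so it has to be processed before asking for the next one.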
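The scroll loop is easier to reuse when it is wrapped in a generator, so that callers can simply iterate over hits. Here is a minimal sketch of that idea. The function scroll_hits and its page_size and keep_alive parameters are my own names; it uses only the search, scroll, and clear_scroll calls already shown. The library also ships elasticsearch.helpers.scan, which wraps the same pattern and is probably the better choice in production code.

```python
def scroll_hits(es, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit for `query`, one scroll page at a time.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    Hypothetical helper for illustration.
    """
    response = es.search(index=index, scroll=keep_alive, size=page_size, query=query)
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            # The first page comes from search() itself, so yield before scrolling
            yield from response["hits"]["hits"]
            response = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response["_scroll_id"]  # the ID may change between pages
    finally:
        # Free the server-side scroll context, even if the caller stops early
        es.clear_scroll(scroll_id=scroll_id)
```

Usage then shrinks to a single loop, for example `for hit in scroll_hits(es, index_name, {"match_all": {}}): print(hit["_id"])`.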

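Closing: Elasticsearch-py, your Python search power tool!

Elasticsearch-py is a capable Python client that lets you work with Elasticsearch comfortably and cover a wide range of search and analytics needs. Once you have it under your belt, you hold a key to the data treasure house and can start digging out the value inside.

I hope today's walkthrough was helpful. Remember, the world of code is full of fun: as long as you dare to explore and are willing to try, the possibilities are endless!

Thanks for watching, and see you next time!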

Summary table:

Feature	Method / parameters	Description	Example
Connect	`Elasticsearch("http://localhost:9200")`	Connect to an Elasticsearch instance.	`es = Elasticsearch("http://localhost:9200")`
Create index	`es.indices.create(index=index_name, body=mapping)`	Create an index and define its mapping.	`es.indices.create(index="my_index", body={"mappings": {"properties": {"title": {"type": "text"}}}})`
Insert document	`es.index(index=index_name, id=1, document=document)`	Insert a document, giving the index name, document ID, and document body.	`es.index(index="my_index", id=1, document={"title": "Hello Elasticsearch"})`
Bulk insert	`helpers.bulk(es, documents)`	Insert many documents in one request for better throughput.	`helpers.bulk(es, [{"_index": "my_index", "_id": 2, "_source": {"title": "Hello Python"}}, {"_index": "my_index", "_id": 3, "_source": {"title": "Hello World"}}])`
Get document	`es.get(index=index_name, id=1)`	Fetch a document by its ID.	`es.get(index="my_index", id=1)`
Match query	`es.search(index=index_name, body={"query": {"match": {"content": "Elasticsearch"}}})`	Find documents by keyword.	`es.search(index="my_index", body={"query": {"match": {"content": "Elasticsearch"}}})`
Bool query	`es.search(index=index_name, body={"query": {"bool": {"must": [...], "must_not": [...]}}})`	Combine several conditions in one query.	`es.search(index="my_index", body={"query": {"bool": {"must": [{"match": {"content": "Python"}}, {"match": {"content": "tips"}}], "must_not": [{"match": {"author": "Xiao Li"}}]}}})`
Range query	`es.search(index=index_name, body={"query": {"range": {"publish_date": {"gte": "2023-10-26", "lte": "2023-10-27"}}}})`	Find documents within a given range.	`es.search(index="my_index", body={"query": {"range": {"publish_date": {"gte": "2023-10-26", "lte": "2023-10-27"}}}})`
Update document	`es.update(index=index_name, id=1, body={"doc": {"title": "Advanced Elasticsearch Tutorial"}})`	Update a document by changing the given fields.	`es.update(index="my_index", id=1, body={"doc": {"title": "Advanced Elasticsearch Tutorial"}})`
Delete document	`es.delete(index=index_name, id=1)`	Delete a document.	`es.delete(index="my_index", id=1)`
Delete index	`es.indices.delete(index=index_name)`	Delete an index.	`es.indices.delete(index="my_index")`
Aggregation	`es.search(index=index_name, body={"size": 0, "aggs": {...}})`	Group, count, and compute over the data.	`es.search(index="my_index", body={"size": 0, "aggs": {"authors": {"terms": {"field": "author"}}}})`
Scroll query	`es.search(index=index_name, scroll='1m', size=1000, query={"match_all": {}})` / `es.scroll(scroll_id=scroll_id, scroll='1m')` / `es.clear_scroll(scroll_id=scroll_id)`	Read large result sets in batches to avoid running out of memory.	(see the code example above)

I hope this summary table helps you get a better grip on the common features and methods of Elasticsearch-py. Happy searching! 🚀
