Use Kibana is very convinient to Discover
and Visualize
data, but sometimes we need to grab all data to process by machine. Here is a quick example for simple case:
import logging
from elasticsearch import Elasticsearch
def main():
logging.basicConfig(level=logging.INFO)
es = Elasticsearch(hosts="real-host",
http_auth=("es-user-name", "es-password"), # if have
timeout=120)
size = 10000
data = {
"size": size,
"query": {
"match": {
"key1": "value1"
}
}
}
res = es.search(index="index-name", body=data, scroll="5m")
log.info("total=%s", res["hits"]["total"])
# for i in range(total/size):
# res = es.scroll(scroll_id=res["_scroll_id"], scroll="5m")
if __name__ == "__main__":
main()
if you don’t want all fields, you can add filter_path
parameter for es.search
and es.scroll
, but make sure the value is specified like:
fields = [
"_scroll_id",
"hits.total",
"hits.hits._source.@timestamp",
"hits.hits._source.key1",
"hits.hits._source.key2",
"hits.hits._source.keyn",
]
if result total is larger than size, you should use es.scroll
to do pagination.
references:
- https://www.cnblogs.com/Serverlessops/articles/11747962.html
License: (CC 4.0) BY-NC-SA