Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases.


As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

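Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications with complex search features and requirements.
Here are a few sample use cases that Elasticsearch can be used for:
  • You run an online web store where you allow your customers to search for the products that you sell. In this case, you can use Elasticsearch to store your entire product catalog and inventory and provide search and autocomplete suggestions for them.
  • You want to collect log or transaction data, and you want to analyze and mine this data to look for trends, statistics, summaries, or anomalies. In this case, you can use Logstash (part of the Elasticsearch/Logstash/Kibana stack) to collect, aggregate, and parse your data, and then have Logstash feed this data into Elasticsearch. Once the data is in Elasticsearch, you can run searches and aggregations to mine any information that is of interest to you.
  • You run a price-alerting platform that allows price-savvy customers to specify a rule like "I am interested in buying a specific electronic gadget, and I want to be notified if the price of the gadget falls below $X from any vendor within the next month." In this case, you can scrape vendor prices, push them into Elasticsearch, and use its reverse-search (Percolator) capability to match price movements against customer queries. Once matches are found, the alerts can be pushed out to the customers.
  • You have analytics/business-intelligence needs and want to quickly investigate, analyze, visualize, and ask ad hoc questions on a lot of data (think millions or billions of records). In this case, you can use Elasticsearch to store your data and then use Kibana (part of the Elasticsearch/Logstash/Kibana stack) to build custom dashboards that visualize the aspects of your data that are important to you. Additionally, you can use the Elasticsearch aggregations functionality to perform complex business-intelligence queries against your data.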
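The reverse search in the price-alert case can be sketched with the `percolator` field type and the `percolate` query. Customer rules are stored as queries, and each incoming price is then matched against all of them at once. This is a minimal sketch, not a production design. It assumes a node at http://localhost:9200. The `ES_URL` variable, the `price-alerts` index, the `product`/`price` fields, and the values are all hypothetical.

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# Index for stored alert rules: "query" holds the rules (percolator type);
# "product" and "price" describe the documents the rules will be matched against.
MAPPING='{
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "product": { "type": "keyword" },
      "price": { "type": "float" }
    }
  }
}'

# One customer's rule (hypothetical): "gadget-123 from any vendor below 100".
ALERT='{
  "query": {
    "bool": {
      "filter": [
        { "term": { "product": "gadget-123" } },
        { "range": { "price": { "lt": 100 } } }
      ]
    }
  }
}'

# A scraped vendor price, matched against every stored rule.
PRICE_UPDATE='{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "product": "gadget-123", "price": 89.99 }
    }
  }
}'

# ?refresh makes the stored rule searchable immediately.
# curl exits non-zero only if the node cannot be reached.
curl -sS -XPUT "$ES_URL/price-alerts" -H 'Content-Type: application/json' -d "$MAPPING" &&
curl -sS -XPUT "$ES_URL/price-alerts/_doc/alert-1?refresh" -H 'Content-Type: application/json' -d "$ALERT" &&
curl -sS -XGET "$ES_URL/price-alerts/_search?pretty" -H 'Content-Type: application/json' -d "$PRICE_UPDATE" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```

The search returns the stored rules that match the price document. Because the rule uses a strict `lt` bound, a price of 100 or more should not match `alert-1`.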
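Elasticsearch features include:
  • Distributed and highly available search engine.
    • Each index is fully sharded with a configurable number of shards.
    • Each shard can have one or more replicas.
    • Read/search operations can be performed on any of the replica shards.
  • Multi-tenant, with multiple types
    • Support for more than one index.
    • Support for more than one type per index (older versions only; mapping types were removed in later releases).
    • Index-level configuration (number of shards, index storage, ...).
  • Various sets of APIs
    • HTTP RESTful API.
    • Native Java API.
    • All APIs perform automatic node operation rerouting.
  • Document oriented
    • No need for upfront schema definition.
    • A schema can be defined per type to customize the indexing process.
  • Reliable, asynchronous write-behind for long-term persistence.
  • (Near) real-time search.
  • Built on top of Lucene
    • Each shard is a fully functional Lucene index.
    • All the power of Lucene is easily exposed through simple configuration and plugins.
  • Per-operation consistency
    • Single-document operations are atomic, consistent, isolated, and durable.
  • Originally released as open source under version 2 of the Apache License ("ALv2").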

Overview

Name with owner: elastic/elasticsearch
Primary language: Java
Program language: Shell (language count: 19)
Platform: Linux, Mac, Windows
License: Other
Release count: 392
Last release name: v7.17.20 (posted on 2024-04-10 13:55:40)
First release name: v0.4.0 (posted on 2010-02-08 15:32:54)
Created at: 2010-02-08 13:20:56
Pushed at: 2024-04-21 14:31:12
Stargazers count: 67.5k
Watchers count: 2.7k
Fork count: 24.1k
Commits count: 76.6k
Issues count: 34804
Issue open count: 4063
Pull requests count: 62979
Pull requests open count: 678
Pull requests close count: 8913

= Elasticsearch

== A Distributed RESTful Search Engine

=== https://www.elastic.co/products/elasticsearch[https://www.elastic.co/products/elasticsearch]

Elasticsearch is a distributed RESTful search engine built for the cloud. Features include:

* Distributed and Highly Available Search Engine.
** Each index is fully sharded with a configurable number of shards.
** Each shard can have one or more replicas.
** Read / Search operations can be performed on any of the replica shards.
* Multi Tenant.
** Support for more than one index.
** Index level configuration (number of shards, index storage, ...).
* Various sets of APIs
** HTTP RESTful API
** All APIs perform automatic node operation rerouting.
* Document oriented
** No need for upfront schema definition.
** Schema can be defined for customization of the indexing process.
* Reliable, Asynchronous Write Behind for long-term persistence.
* (Near) Real Time Search.
* Built on top of Apache Lucene
** Each shard is a fully functional Lucene index
** All the power of Lucene easily exposed through simple configuration / plugins.
* Per-operation consistency
** Single document level operations are atomic, consistent, isolated and durable.
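When the defaults are not enough, a mapping can be supplied when an index is created, which customizes how each field is indexed. This is a minimal sketch. It assumes a node at http://localhost:9200. The `ES_URL` variable and the `tweets` index name are hypothetical, and the field types are one reasonable choice for the tweet documents used in Getting Started.

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# Explicit mapping: "user" as an exact-match keyword, "post_date" as a date,
# and "message" as analyzed full text.
MAPPING='{
  "mappings": {
    "properties": {
      "user": { "type": "keyword" },
      "post_date": { "type": "date" },
      "message": { "type": "text" }
    }
  }
}'

# curl exits non-zero only if the node cannot be reached.
curl -sS -XPUT "$ES_URL/tweets?pretty" -H 'Content-Type: application/json' -d "$MAPPING" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```

Fields that are not listed are still mapped dynamically when documents arrive, unless dynamic mapping is disabled for the index.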

== Getting Started

First of all, DON'T PANIC. It will take 5 minutes to get the gist of what Elasticsearch is all about.

=== Installation

=== Indexing

Let's try and index some Twitter-like information. First, let's index some tweets (the twitter index will be created automatically):


```shell
curl -XPUT 'http://localhost:9200/twitter/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T13:12:00",
  "message": "Trying out Elasticsearch, so far so good?"
}'

curl -XPUT 'http://localhost:9200/twitter/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "Another tweet, will it be indexed?"
}'

curl -XPUT 'http://localhost:9200/twitter/_doc/3?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "elastic",
  "post_date": "2010-01-15T01:46:38",
  "message": "Building the site, should be kewl"
}'
```
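The three requests above can also be sent in one round trip with the `_bulk` API. Its body is newline-delimited JSON: an action line followed by a source line for each document, with a newline after the last line. The body is written to a temporary file and sent with `--data-binary`, because curl's `-d @file` strips newlines. This is a sketch that assumes the same local node; `ES_URL` is a hypothetical convenience variable.

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# Temporary file for the NDJSON body, removed when the shell exits.
BULK_FILE=$(mktemp)
trap 'rm -f "$BULK_FILE"' EXIT

# One action line plus one source line per tweet. The '%s\n' format ends
# every line, including the last, with the newline that _bulk requires.
printf '%s\n' \
  '{ "index": { "_index": "twitter", "_id": "1" } }' \
  '{ "user": "kimchy", "post_date": "2009-11-15T13:12:00", "message": "Trying out Elasticsearch, so far so good?" }' \
  '{ "index": { "_index": "twitter", "_id": "2" } }' \
  '{ "user": "kimchy", "post_date": "2009-11-15T14:12:12", "message": "Another tweet, will it be indexed?" }' \
  '{ "index": { "_index": "twitter", "_id": "3" } }' \
  '{ "user": "elastic", "post_date": "2010-01-15T01:46:38", "message": "Building the site, should be kewl" }' \
  > "$BULK_FILE"

# curl exits non-zero only if the node cannot be reached.
curl -sS -XPOST "$ES_URL/_bulk?pretty" -H 'Content-Type: application/x-ndjson' \
  --data-binary "@$BULK_FILE" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```

The response contains one item per action. Check its top-level `errors` flag, because a bulk request can succeed as a whole while individual documents fail.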

Now, let's see if the information was added by GETting it:


```shell
curl -XGET 'http://localhost:9200/twitter/_doc/1?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/2?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/3?pretty=true'
```
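The three GETs can also be combined into a single multi-get request. This is a sketch that assumes the same local node; `ES_URL` is a hypothetical convenience variable.

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# IDs to fetch from the twitter index. A missing ID comes back as an entry
# with "found": false rather than as an error.
MGET='{ "ids": ["1", "2", "3"] }'

# curl exits non-zero only if the node cannot be reached.
curl -sS -XGET "$ES_URL/twitter/_mget?pretty" -H 'Content-Type: application/json' -d "$MGET" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```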

=== Searching

Mmm search..., shouldn't it be elastic?
Let's find all the tweets that kimchy posted:


```shell
curl -XGET 'http://localhost:9200/twitter/_search?q=user:kimchy&pretty=true'
```

We can also use the JSON query language Elasticsearch provides instead of a query string:


```shell
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match" : { "user": "kimchy" }
  }
}'
```

Just for kicks, let's get all the documents stored (we should see the tweet from elastic as well):


```shell
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match_all" : {}
  }
}'
```
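Note that a search returns only the top 10 hits unless told otherwise. The `from` and `size` parameters page through results, and `sort` orders hits by a field instead of by relevance score. This is a sketch against the same index; the `ES_URL` variable and the page size of 20 are illustrative assumptions.

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# First page of 20 tweets, newest first (post_date is mapped as a date).
PAGE='{
  "query": { "match_all": {} },
  "from": 0,
  "size": 20,
  "sort": [ { "post_date": { "order": "desc" } } ]
}'

# curl exits non-zero only if the node cannot be reached.
curl -sS -XGET "$ES_URL/twitter/_search?pretty" -H 'Content-Type: application/json' -d "$PAGE" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```

The sum of `from` and `size` is capped by the `index.max_result_window` setting (10,000 by default), so very deep paging calls for a different approach, such as `search_after`.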

We can also do a range search (the post_date field was automatically identified as a date):

```shell
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "range" : {
      "post_date" : { "gte" : "2009-11-15T13:00:00", "lte" : "2009-11-15T14:00:00" }
    }
  }
}'
```
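Because `post_date` was mapped as a `date`, the inferred mapping can be checked with the `_mapping` API, and the field can drive date aggregations such as a per-day histogram. This is a sketch. `ES_URL` and the aggregation name `tweets_per_day` are hypothetical, and `calendar_interval` assumes a version that supports it (7.2 and later).

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# Show the mapping Elasticsearch inferred; post_date should appear as "date".
# curl exits non-zero only if the node cannot be reached.
curl -sS -XGET "$ES_URL/twitter/_mapping?pretty" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2

# Count tweets per calendar day; "size": 0 skips returning the hits themselves.
AGG='{
  "size": 0,
  "aggs": {
    "tweets_per_day": {
      "date_histogram": { "field": "post_date", "calendar_interval": "day" }
    }
  }
}'

curl -sS -XGET "$ES_URL/twitter/_search?pretty" -H 'Content-Type: application/json' -d "$AGG" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```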

There are many more options for performing searches; after all, it's a search product, no? All the familiar Lucene queries are available through the JSON query language or through the query parser.
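As an illustration, a `bool` query can combine a full-text `match` clause with filters. The `query_string` query accepts Lucene query parser syntax (the same syntax as the `q=` parameter) inside a JSON body. This is a sketch against the twitter index; `ES_URL` and the search terms are hypothetical.

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON query DSL: kimchy's tweets mentioning "elasticsearch", posted in
# November 2009. Clauses under "filter" restrict hits without affecting the score.
BOOL_QUERY='{
  "query": {
    "bool": {
      "must": [ { "match": { "message": "elasticsearch" } } ],
      "filter": [
        { "match": { "user": "kimchy" } },
        { "range": { "post_date": { "gte": "2009-11-01", "lt": "2009-12-01" } } }
      ]
    }
  }
}'

# Query parser syntax embedded in a JSON request.
QS_QUERY='{
  "query": {
    "query_string": { "query": "user:kimchy AND message:indexed" }
  }
}'

# curl exits non-zero only if the node cannot be reached.
for BODY in "$BOOL_QUERY" "$QS_QUERY"; do
  curl -sS -XGET "$ES_URL/twitter/_search?pretty" -H 'Content-Type: application/json' -d "$BODY" ||
    echo "could not reach Elasticsearch at $ES_URL" >&2
done
```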

=== Multi Tenant and Indices

Man, that twitter index might get big (in this case, index size == valuation). Let's see if we can structure our twitter system a bit differently in order to support such large amounts of data.

Elasticsearch supports multiple indices. In the previous example we used an index called twitter that stored tweets for every user.

Another way to define our simple twitter system is to have a different index per user (note, though, that each index has an overhead). Here are the indexing curl commands in this case:


```shell
curl -XPUT 'http://localhost:9200/kimchy/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T13:12:00",
  "message": "Trying out Elasticsearch, so far so good?"
}'

curl -XPUT 'http://localhost:9200/kimchy/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "Another tweet, will it be indexed?"
}'
```

The above will index information into the kimchy index. Each user will get their own special index.

Complete control at the index level is allowed. For example, for another user who tweets a lot, we might want to change from the default of 1 shard with 1 replica per index to 2 shards with 1 replica per index. Here is how this can be done when creating that user's index (the configuration can be in YAML as well):


```shell
curl -XPUT 'http://localhost:9200/another_user?pretty' -H 'Content-Type: application/json' -d '
{
  "settings" : {
    "index.number_of_shards" : 2,
    "index.number_of_replicas" : 1
  }
}'
```
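The replica count of an existing index can be changed at any time through the `_settings` endpoint. The number of primary shards, by contrast, is fixed when the index is created; changing it means using the split or shrink APIs, or reindexing. This is a sketch that reuses the `another_user` index; `ES_URL` is a hypothetical convenience variable.

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# Raise replicas from 1 to 2; index.number_of_shards cannot be updated this way.
REPLICAS='{ "index": { "number_of_replicas": 2 } }'

# curl exits non-zero only if the node cannot be reached.
curl -sS -XPUT "$ES_URL/another_user/_settings?pretty" -H 'Content-Type: application/json' -d "$REPLICAS" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2

# Read back the settings now in effect.
curl -sS -XGET "$ES_URL/another_user/_settings?pretty" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```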

Search (and similar operations) is multi-index aware. This means that we can easily search on more than one index (twitter user), for example:


```shell
curl -XGET 'http://localhost:9200/kimchy,another_user/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match_all" : {}
  }
}'
```

Or on all the indices:


```shell
curl -XGET 'http://localhost:9200/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match_all" : {}
  }
}'
```

And the cool part about that? You can easily search on multiple twitter users (indices), with different boost levels per user (index), making social search so much simpler (results from my friends rank higher than results from friends of my friends).
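Per-index boosting is available through the `indices_boost` search parameter, which boosts the scores of hits coming from the listed indices. This is a sketch; `ES_URL`, the boost values, and the search term are hypothetical.

```shell
# Hypothetical convenience variable; defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hits from the "kimchy" index get a higher boost than hits from "another_user".
BOOSTED='{
  "indices_boost": [
    { "kimchy": 2.0 },
    { "another_user": 1.0 }
  ],
  "query": { "match": { "message": "elasticsearch" } }
}'

# curl exits non-zero only if the node cannot be reached.
curl -sS -XGET "$ES_URL/kimchy,another_user/_search?pretty" -H 'Content-Type: application/json' -d "$BOOSTED" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```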

=== Distributed, Highly Available

Let's face it, things will fail...

Elasticsearch is a highly available and distributed search engine. Each index is broken down into shards, and each shard can have one or more replicas. By default, an index is created with 1 shard and 1 replica per shard (1/1). There are many topologies that can be used, including 1/10 (improve search performance), or 20/1 (improve indexing performance, with search executed in a map reduce fashion across shards).
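The shard topology of an index, and whether every copy is allocated, can be inspected with the `_cat/shards` and `_cluster/health` APIs. On a single node, replica shards stay unassigned because a replica is never allocated to the same node as its primary. A 1/1 index therefore reports yellow health until a second node joins. This is a sketch; `ES_URL` and `INDEX` are hypothetical convenience variables.

```shell
# Hypothetical convenience variables; ES_URL defaults to the node used in this README.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX=twitter

# One row per shard copy: shard number, prirep (p = primary, r = replica),
# state (for example STARTED or UNASSIGNED) and the node holding it.
# curl exits non-zero only if the node cannot be reached.
curl -sS "$ES_URL/_cat/shards/$INDEX?v" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2

# Cluster-wide status: green = every shard copy assigned,
# yellow = all primaries assigned but some replicas are not.
curl -sS "$ES_URL/_cluster/health?pretty" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2
```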

In order to play with the distributed nature of Elasticsearch, simply bring more nodes up and shut down nodes. The system will continue to serve requests (make sure you use the correct http port) with the latest data indexed.
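Once a second node has joined, `_cat/nodes` lists every node in the cluster, and any node can serve requests. This sketch assumes that the second node runs on the same machine and bound HTTP port 9201. By default a node takes the first free port in the range 9200-9300, but check the node's startup log to be sure. `ES_URL` and `ES_URL_2` are hypothetical convenience variables.

```shell
# Hypothetical convenience variables for the first and second local nodes.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_URL_2="${ES_URL_2:-http://localhost:9201}"

# List the nodes that have joined the cluster; the elected master is marked with "*".
# curl exits non-zero only if the node cannot be reached.
curl -sS "$ES_URL/_cat/nodes?v" ||
  echo "could not reach Elasticsearch at $ES_URL" >&2

# The same search as before, served through the second node.
curl -sS -XGET "$ES_URL_2/twitter/_search?q=user:kimchy&pretty" ||
  echo "could not reach Elasticsearch at $ES_URL_2" >&2
```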

=== Where to go from here?

We have just covered a very small portion of what Elasticsearch is all about. For more information, please refer to the http://www.elastic.co/products/elasticsearch[elastic.co] website. General questions can be asked on the https://discuss.elastic.co[Elastic Forum] or https://ela.st/slack[on Slack]. The Elasticsearch GitHub repository is reserved for bug reports and feature requests only.

=== Building from Source

Elasticsearch uses https://gradle.org[Gradle] for its build system.

In order to create a distribution, simply run the ./gradlew assemble command in the cloned directory.

The distribution for each project will be created under the build/distributions directory in that project.

See xref:TESTING.asciidoc[TESTING] for more information about running the Elasticsearch test suite.

=== Upgrading from older Elasticsearch versions

In order to ensure a smooth upgrade process from earlier versions of Elasticsearch, please see our https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html[upgrade documentation] for more details on the upgrade process.
