Elasticsearch

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. According to the DB-Engines ranking, Elasticsearch is the second most popular enterprise search engine.

1. Basic Concepts

Relational Database    Elasticsearch
Database               Index
Table                  Type
Row                    Document
Column                 Field
Schema                 Mapping
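
The analogy above shows up directly in the REST paths: a document is addressed by index, type, and id, and its body is a set of fields. Below is a minimal sketch of that addressing, assuming the type-based path layout of older Elasticsearch releases. The helper name `document_request`, the type name `account`, and the field values are hypothetical and used only for illustration.

```python
import json

def document_request(index, doc_type, doc_id, fields):
    """Return the (path, body) pair for storing one document.

    index    ~ database
    doc_type ~ table
    doc_id   ~ primary key of the row
    fields   ~ columns and their values
    """
    path = "/{}/{}/{}".format(index, doc_type, doc_id)
    body = json.dumps(fields, sort_keys=True)
    return path, body

# Hypothetical document in the "bank" index.
path, body = document_request("bank", "account", 1,
                              {"account_number": 1, "balance": 100})
print("PUT", path)
print(body)
```

The pair can then be sent with any HTTP client, for example as the path and `-d` body of a curl call.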

2. Index & Query

Get all indices

/_stats

Search API 1

Search All

/bank/_search?q=*

hits.hits – actual array of search results (defaults to first 10 documents)
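
As a rough illustration of where the results live in a response, the sketch below pulls the stored documents out of `hits.hits`. The response is made-up sample data in the general shape a search returns, and `extract_sources` is a hypothetical helper name, not part of any client library.

```python
def extract_sources(response):
    """Return the _source document of every hit in a search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits]

# Made-up response in the general shape returned by /bank/_search.
sample_response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "bank", "_type": "account", "_id": "1",
             "_source": {"account_number": 1, "balance": 100}},
            {"_index": "bank", "_type": "account", "_id": "2",
             "_source": {"account_number": 2, "balance": 250}},
        ],
    },
}

for source in extract_sources(sample_response):
    print(source["account_number"], source["balance"])
```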

Query Language

Elasticsearch provides a full Query DSL based on JSON to define queries.

curl -XPOST /bank/_search

// match all, limit 10 offset 10
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}

// select fields
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"],
  "size": 10
}

// where account_number equals 20
{
  "query": { "match": { "account_number": 20 } }
}
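
The three request bodies above can also be generated programmatically. The following is a minimal sketch that maps a zero-based page number to `from`/`size` and optionally restricts the returned fields. The function names `match_all_page` and `match_field` are hypothetical helpers, not part of Elasticsearch.

```python
import json

def match_all_page(page, page_size=10, fields=None):
    """Build a match_all body for the given zero-based page."""
    body = {
        "query": {"match_all": {}},
        "from": page * page_size,   # offset of the first hit to return
        "size": page_size,          # number of hits to return
    }
    if fields is not None:
        body["_source"] = list(fields)  # return only these fields
    return body

def match_field(field, value):
    """Build a body that matches documents where field equals value."""
    return {"query": {"match": {field: value}}}

print(json.dumps(match_all_page(1), indent=2))
print(json.dumps(match_all_page(0, fields=["account_number", "balance"]), indent=2))
print(json.dumps(match_field("account_number", 20), indent=2))
```

Page 1 with a page size of 10 reproduces the "limit 10 offset 10" body shown earlier.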

3. Mapping

Timestamp 2

Enable and store the _timestamp field (supported in Elasticsearch 1.x; deprecated in 2.0 and removed in 5.0)

curl -XPOST localhost:9200/test -d '
{
  "mappings" : {
    "_default_" : {
      "_timestamp" : {
        "enabled" : true,
        "store" : true
      }
    }
  }
}'

Filter

curl -XPOST elastic:9200/index/type/_search -d '
{
  "query" : {
    "filtered" :
    {
      "query" : { "term" : { "feature" : 1 } } ,
      "filter" : {
        "and" : [
          {
            "range": {
              "_timestamp": {
                "from": 1441964671000,
                "to": 1441964672000
              }
            }
          }
        ]
      }
    }
  }
}'
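
The `from` and `to` values in the filter are milliseconds since the Unix epoch. Here is a minimal sketch for producing them from readable dates and assembling the same `filtered` query. It drops the single-element `and` wrapper for brevity, and the helper names `to_epoch_millis` and `timestamp_range_query` are hypothetical.

```python
import calendar
import json
from datetime import datetime, timezone

def to_epoch_millis(dt):
    """Convert a datetime to milliseconds since the Unix epoch (UTC)."""
    return calendar.timegm(dt.utctimetuple()) * 1000 + dt.microsecond // 1000

def timestamp_range_query(feature, start, end):
    """Build a filtered query: term on 'feature', range on '_timestamp'."""
    return {
        "query": {
            "filtered": {
                "query": {"term": {"feature": feature}},
                "filter": {
                    "range": {
                        "_timestamp": {
                            "from": to_epoch_millis(start),
                            "to": to_epoch_millis(end),
                        }
                    }
                },
            }
        }
    }

# The one-second window used in the curl example.
start = datetime(2015, 9, 11, 9, 44, 31, tzinfo=timezone.utc)
end = datetime(2015, 9, 11, 9, 44, 32, tzinfo=timezone.utc)
print(json.dumps(timestamp_range_query(1, start, end), indent=2))
```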

Relationships Management 3 4

Inner Object

  • 👍 Easy, fast, performant
  • 👍 No need for special queries
  • ☛ Only applicable when one-to-one relationships are maintained
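
Inner objects need no special queries because they are indexed as ordinary fields with dotted names. The sketch below imitates that flattening in plain Python to show which field names a query would target. The `flatten` helper and the sample document are hypothetical.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into a single dict with dotted field names."""
    flat = {}
    for key, value in doc.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into the inner object, extending the dotted path.
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical document with a one-to-one inner object.
doc = {"account_number": 1, "owner": {"first": "Ann", "last": "Lee"}}
print(flatten(doc))
```

A field such as `owner.first` can then be used in a normal `match` query.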

Nested

  • 👍 Nested docs are stored in the same Lucene block as each other, which helps read/query performance. Reading a nested doc is faster than the equivalent parent/child.
  • 👎 Updating a single field in a nested document (parent or nested children) forces ES to reindex the entire nested document. This can be very expensive for large nested docs
  • 👎 “Cross referencing” nested documents is impossible
  • ☛ Best suited for data that does not change frequently
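
For reference, a nested field is declared in the mapping with `"type": "nested"` and searched with a `nested` query that names the path. The following sketch builds both bodies. It assumes the type-based mappings and the `string` field type of pre-5.0 releases, and the names `post`, `comments`, and `author` are hypothetical.

```python
import json

def nested_mapping(doc_type, path, properties):
    """Build an index body that declares `path` as a nested field."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    path: {"type": "nested", "properties": properties}
                }
            }
        }
    }

def nested_match(path, field, value):
    """Build a nested query matching `field` inside the nested `path`."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {path + "." + field: value}},
            }
        }
    }

# Hypothetical blog-post type with nested comments.
print(json.dumps(nested_mapping("post", "comments",
                                {"author": {"type": "string"}}), indent=2))
print(json.dumps(nested_match("comments", "author", "alice"), indent=2))
```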

Parent/Child

  • 👍 Updating a child doc does not affect the parent or any other children, which can potentially save a lot of indexing on large docs
  • 👎 Children are stored separately from the parent, but are routed to the same shard. So parent/children are slightly less performant on read/query than nested
  • 👎 Parent/child mappings have a bit of extra memory overhead, since ES maintains a “join” list in memory
  • 👎 Sorting/scoring can be difficult with Parent/Child since the Has Child/Has Parent operations can be opaque at times
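
A parent/child setup has three parts: a `_parent` entry in the child type's mapping, a `parent` parameter when indexing a child (which routes it to the parent's shard), and a `has_child` query to find parents by their children. The sketch below assumes the pre-6.0 type-based API. The type names `post` and `comment` and all helper names are hypothetical.

```python
import json

def child_mapping(parent_type, child_type):
    """Build an index body that links child_type to parent_type."""
    return {
        "mappings": {
            parent_type: {},
            child_type: {"_parent": {"type": parent_type}},
        }
    }

def child_index_path(index, child_type, doc_id, parent_id):
    """Path for indexing a child; `parent` selects the parent's shard."""
    return "/{}/{}/{}?parent={}".format(index, child_type, doc_id, parent_id)

def has_child_query(child_type, inner_query):
    """Find parents that have at least one child matching inner_query."""
    return {"query": {"has_child": {"type": child_type, "query": inner_query}}}

print(json.dumps(child_mapping("post", "comment"), indent=2))
print(child_index_path("blog", "comment", 7, 1))
print(json.dumps(has_child_query("comment", {"match": {"author": "alice"}}), indent=2))
```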

Denormalization

  • 👍 You get to manage all the relations yourself!
  • 👎 Most flexible, most administrative overhead
  • ☛ May be more or less performant depending on your setup

4. Backup

Elastic Dump 5

Tools for moving and saving indices.

bin/elasticdump \
  --input=http://localhost:9200/index_1 \
  --output=http://localhost:9200/index_1_backup \
  --type=data \
  --scrollTime=100
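
The command above copies only the documents (`--type=data`). A backup usually wants the mapping copied first so the target index gets the same field definitions. Here is a minimal sketch that assembles both invocations as argument lists without running them. The helper name `elasticdump_commands` is hypothetical, and the binary path is taken from the example above.

```python
def elasticdump_commands(source_url, target_url, binary="bin/elasticdump"):
    """Return elasticdump argument lists: mapping first, then data."""
    commands = []
    for dump_type in ("mapping", "data"):
        commands.append([
            binary,
            "--input=" + source_url,
            "--output=" + target_url,
            "--type=" + dump_type,
        ])
    return commands

for cmd in elasticdump_commands("http://localhost:9200/index_1",
                                "http://localhost:9200/index_1_backup"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` once the printed commands look right.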

Alias 6

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } },
        { "add" : { "index" : "test1", "alias" : "alias2" } }
    ]
}'
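
Because all actions in one `_aliases` request are applied atomically, the same call can repoint an alias from one index to another, for example from a live index to a restored backup, without a window in which the alias is missing. The sketch below builds such a body. The helper name `swap_alias_actions` and the alias name `index_alias` are hypothetical.

```python
import json

def swap_alias_actions(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = swap_alias_actions("index_alias", "index_1", "index_1_backup")
print(json.dumps(body, indent=4))
```

The printed JSON is what would go in the `-d` argument of the curl call to `/_aliases`.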

5. Module Scripting 7

Ranking

Ranked #2 in the DB-Engines Ranking of Search Engines
