Detailed presentation of Elasticsearch

Features

An open-source, real-time search and analytics engine with a dedicated ecosystem of tools to feed it, manage it and use it.

Fully-featured search

  • Relevance-ranked text search
  • Scalable search
  • High-performance geo, temporal, range and key lookup
  • Highlighting
  • Support for complex / nested document types
  • Spelling suggestions
  • Powerful query DSL (a minimal search sketch follows this list)
  • “Standing” (percolator) queries
  • Real-time results
  • Extensible via plugins
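
To make the query DSL concrete, here is a minimal sketch using the official Python client with the classic body-style API; the “articles” index, its fields and the node URL are assumptions:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local node

    # Relevance-ranked full-text search with highlighting,
    # expressed in the query DSL.
    resp = es.search(index="articles", body={
        "query": {"match": {"body": "search engine"}},
        "highlight": {"fields": {"body": {}}},
        "size": 10,
    })

    for hit in resp["hits"]["hits"]:
        print(hit["_score"], hit["_source"].get("title"))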

Powerful faceting/analysis

  • Summarize large data sets by any combination of time, geo, category and more (see the aggregation sketch after this list)
  • “Kibana” visualization tool
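
A minimal aggregation sketch with the Python client; the “logs” index and its “timestamp”/“category” fields are hypothetical:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local node

    # Summarize a large set by time and category in one request.
    resp = es.search(index="logs", body={
        "size": 0,  # no hits needed, only the summaries
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "timestamp", "interval": "day"},
                "aggs": {"per_category": {"terms": {"field": "category"}}},
            }
        },
    })

    for day in resp["aggregations"]["per_day"]["buckets"]:
        print(day["key_as_string"], day["doc_count"])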

Management

  • Simple and robust deployments
  • REST APIs for handling all aspects of administration/monitoring (a short sketch follows this list)
  • Special features to manage the life cycle of content
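
The calls below map one-to-one onto plain REST endpoints (GET /_cluster/health, GET /_cat/indices), shown here through the Python client; the node URL is an assumption:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local node

    # Cluster health: green / yellow / red.
    print(es.cluster.health()["status"])

    # One line per index: health, size, document count, ...
    print(es.cat.indices(v=True))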

Integration

  • Hadoop (Map/Reduce, Hive, Pig, Cascading, …)
  • Client libraries (Python, Java, Ruby, JavaScript, …); an indexing sketch with the Python client follows this list
  • Data connectors (Twitter, JMS, …)
  • Logstash ETL framework: standalone software running on the JVM that lets you build data pipelines to stream your data to Elasticsearch and many other destinations.
  • The “Beats” framework: a set of lightweight data shippers that can tail files, or ship system metrics (CPU, memory, network, processes, …) and application metrics (Apache, MySQL, MongoDB, Kafka, Icinga monitoring server, …) to Elasticsearch or Logstash.
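
A single-document indexing sketch with the Python client, the same basic operation Logstash or a Beat performs in bulk when shipping events; index, type and field names are hypothetical, and the explicit doc_type matches the Elasticsearch 5.x era of this presentation:

    from datetime import datetime
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local node

    # Ship one event into the (hypothetical) "syslog" index.
    es.index(index="syslog", doc_type="event", body={
        "host": "server-x",
        "message": "sshd: access denied",
        "@timestamp": datetime.utcnow().isoformat(),
    })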
 

X-Pack: management features in a commercial-only package

  • A single bundle (before Elastic version 5, the same functions were offered as independent packages)
  • Paid add-on
  • Security: authentication, role-based authorization and auditing.
  • Alerting: get notified when a defined search query returns a meaningful result (e.g. five “access denied” entries on the SSH daemon of server X, possibly an access attempt by an unauthorized user); the kind of query such a watch runs is sketched after this list.
  • Monitoring: a GUI to manage your installation and see what is happening to your indices (cluster and index health, performance analysis, …).
  • Reporting: generate and mail static, print-optimized, PDF-formatted reports from real-time Kibana visualizations and dashboards.
  • Graph exploration: a graphical representation of your data, exploring links between documents you may not have suspected existed.
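
Watcher defines watches as JSON through its own API; the sketch below only illustrates, with a plain search, the kind of query such a watch would run periodically. The index, the field names and the pre-7.x integer hit count are assumptions:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local node

    # Count "access denied" messages on server X over the last hour.
    resp = es.search(index="syslog", body={
        "size": 0,
        "query": {"bool": {"must": [
            {"match_phrase": {"message": "access denied"}},
            {"term": {"host": "server-x"}},
            {"range": {"@timestamp": {"gte": "now-1h"}}},
        ]}},
    })

    if resp["hits"]["total"] >= 5:  # threshold from the example above
        print("ALERT: repeated SSH access denials on server-x")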


Support

  • Development and Production support with tiered levels
  • Support staff are the core developers of the products
 

The Hadoop connector (es-hadoop)

  • Integrate with MapReduce, Spark, Pig, Hive, Storm, …
  • Allow the classification of data on top of the existing data model
    • Used in production at Wikipedia, Salesforce and Facebook to power search
  • The connector is a library to be embedded into your application (support for Cloudera, Hortonworks (certified), MapR, …)
  • Hive integration is done in the form of an external Hive table
  • Spark integration is done in the code of your application, with primitives for Java or Scala (Python works through Spark SQL; a sketch follows at the end of this section)
  • A query sent to the connector is split and dispatched to each node of the ES cluster; the partial results are then re-assembled into the complete answer, so the connector can cope with a degraded ES cluster (status not green, some nodes gone)
[Figure: es-hadoop connector architecture (es-hadoop.png)]
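
Java and Scala get native primitives; from Python the connector is reachable through Spark SQL, as in this sketch. The es-hadoop jar on the Spark classpath, the node address and the “logs” index are assumptions:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("es-demo").getOrCreate()

    # Read an Elasticsearch index as a DataFrame; the connector reads
    # from each shard in parallel and re-assembles the results.
    df = (spark.read
          .format("org.elasticsearch.spark.sql")
          .option("es.nodes", "localhost:9200")  # assumed ES address
          .load("logs"))                         # hypothetical index

    df.groupBy("category").count().show()

    # Writing back goes through the same data source.
    df.write.format("org.elasticsearch.spark.sql").mode("append").save("logs-copy")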