Apache Solr is a fast open-source Java search server.
Solr enables you to easily create search engines that search websites, databases and files.
Solr (pronounced “solar”) is an open source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is the second-most popular enterprise search engine after Elasticsearch.
Solr runs as a standalone full-text search server. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr’s external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.
An Elasticsearch / Apache Solr index is the equivalent of a SQL table.
An Elasticsearch or Solr server (aka Solr instance, aka Solr engine) can maintain several indexes.
(Elasticsearch index configuration is done with HTTP / JSON commands. No files are required: you define types, mappings and analysis with simple commands.)
In Apache Solr, each index is defined by a schema.xml file (it’s not mandatory in Solr 5/6, but recommended in production), and a solrconfig.xml file. The index schema is equivalent to a SQL table schema definition. (See this post for Solr Schema related resources.)
An index contains several documents, equivalent to SQL table rows. Each document contains fields, equivalent to SQL table columns.
When an index document is inserted/updated/deleted, we say it is “indexed”.
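To make the table/row/column analogy concrete, here is a minimal Python sketch that prepares an indexing request for Solr's JSON update handler. The core name `articles`, the field names and the helper `build_update_request` are assumptions made for illustration. The request is only built, not sent, so the snippet runs without a Solr server.

```python
import json
import urllib.request


def build_update_request(base_url, core, docs, commit=True):
    """Build (but do not send) a POST request that indexes `docs` into `core`."""
    url = f"{base_url}/{core}/update"
    if commit:
        # Ask Solr to commit so the documents become searchable right away.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One document plays the role of a SQL row; each key is a field (a column).
# The core name and fields below are hypothetical.
docs = [
    {"id": "1", "title": "Solr in 5 minutes", "category": "tutorial"},
    {"id": "2", "title": "Faceted search basics", "category": "tutorial"},
]

request = build_update_request("http://localhost:8983/solr", "articles", docs)
print(request.get_method(), request.full_url)
```

Against a running Solr instance with a matching core, passing `request` to `urllib.request.urlopen` would send the documents. Re-sending a document with the same `id` normally replaces the earlier version, which is why inserts and updates are both called "indexing".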
To retrieve documents from an index, Elasticsearch (JSON) and Apache Solr (XML, JSON) each provide an HTTP API with its own product-specific query syntax.
Elasticsearch and Apache Solr are web applications. A client uses their HTTP API to query or store data.
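As a rough illustration of what "using the HTTP API" looks like from a client, the sketch below builds a query URL for Solr's `/select` handler using the common `q`, `rows`, `fl` and `wt` parameters. The core name `articles`, the field names and the helper `build_select_url` are hypothetical. Nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, core, query, rows=10, fields=None):
    """Return the URL of a Solr /select query that asks for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        # `fl` limits which fields are returned for each matching document.
        params["fl"] = ",".join(fields)
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",
    "articles",              # hypothetical core name
    "title:solr",            # field:value query syntax
    rows=5,
    fields=["id", "title"],
)
print(url)
```

Fetching this URL from a running Solr instance (for example with `urllib.request.urlopen`) should return a JSON body whose matching documents are listed under `response.docs`.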
A full-text search engine is built from the ground up to tackle problems that a SQL search finds difficult or impossible. The list of those features is huge: multi-language support, dedicated plugins to extend the engine, synonyms, stop words, facets, boosts, …
The core search engine of Elasticsearch and Apache Solr is Apache Lucene. Lucene is to Elasticsearch / Apache Solr what an engine is to a car.
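To give a feel for why stop words and synonyms are hard to reproduce with a plain SQL `LIKE`, here is a toy sketch of text analysis feeding an inverted index. It is not how Solr or Lucene are implemented. The stop-word list, the synonym map and all function names are made up for illustration.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "of", "for", "in"}
SYNONYMS = {"laptop": "notebook"}  # map variants to one canonical term


def analyze(text):
    """Lowercase, tokenize, drop stop words and normalize synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "The best laptop for travel",
    2: "A notebook review",
    3: "Guide to travel in the mountains",
}
index = build_index(docs)
print(sorted(search(index, "laptop")))               # [1, 2]
print(sorted(search(index, "the notebook travel")))  # [1]
```

The query "laptop" also finds document 2, which never contains that word, because the same analysis is applied to documents at index time and to queries at search time.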
You can access the Solr admin UI from your browser: http://localhost:8983/solr/
If you chose a different port during installation, use that port number instead of 8983.
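If the port differs between machines, it can help to build the URLs from a host and port instead of hard-coding them. The sketch below does that for the admin address above and for a CoreAdmin `STATUS` call, which lists the cores an instance maintains. The helper names and the alternative port 8984 are assumptions for illustration.

```python
from urllib.parse import urlencode


def solr_base_url(host="localhost", port=8983):
    """Base URL of a Solr instance; 8983 is the port used in the text above."""
    return f"http://{host}:{port}/solr"


def core_status_url(host="localhost", port=8983):
    """URL of the CoreAdmin STATUS action, returning JSON."""
    params = urlencode({"action": "STATUS", "wt": "json"})
    return f"{solr_base_url(host, port)}/admin/cores?{params}"


print(solr_base_url())             # admin UI address for the default setup
print(core_status_url(port=8984))  # hypothetical instance on another port
```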
See below for some useful Solr-related resources:
- Getting Started
- Using the Solr Administration User Interface
- Client APIs
- Understanding Analyzers, Tokenizers, and Filters
- Indexing and Basic Data Operations
- The Well-Configured Solr Instance
- Documents, Fields, and Schema Design
- Solr Glossary
- Managing Solr
- Install Apache Solr on Ubuntu 16.04
- Using Apache Solr with Python
- Configuring the Solr 6 using Admin Console
- Apache Solr Tutorial on Tutorialspoint
- Solr Concept and Architecture (pdf)
- Solr in 5 minutes (pdf) (more Solr resources are available on this website through the right navigation bars; some that I selected are listed below.)
- Top 10 Performance Tips for Apache SOLR (pdf)
- Apache Solr resources on Khai’s personal knowledge vault (pdf): contains many good resources and concise explanations of Solr features such as fuzzy search and faceting.
- What is an Elasticsearch / Apache Solr index ? (pdf)
- What are Elasticsearch and Apache Solr ? (pdf)
- How does SOLR work? What is an explanation for the principle in layman’s terms? (pdf): gives a pretty good explanation of how Solr works, including the schema.
- Basic Elasticsearch Concepts (pdf)
- Importing/Indexing database (MySQL or SQL Server) in Solr using Data Import Handler