IResearch (search library)
| Developer(s) | Andrey Abramov |
|---|---|
| Repository | github |
| Written in | C++ |
| Operating system | Cross-platform |
| License | Apache License 2.0 |
| Website | www |
IResearch (standing for Information REtrieval SEARCH) is a cross-platform, high-performance information retrieval library written in C++. The library provides full-text search, geospatial search, vector search over embeddings, inverted index management and ranking with pluggable similarity measures. It was designed as a native C++ alternative to Apache Lucene, intended for direct integration into database systems. Development of the current codebase began in 2014 at the EMC Corporation Skolkovo software development centre in Russia. The library was open-sourced in 2016 under the Apache License 2.0 and, from 2017, became the search engine powering the ArangoSearch feature of ArangoDB. As of 2025, development continues at SereneDB, a Berlin-based database startup, where IResearch serves as the foundation of a combined search and OLAP database system.
Background
Enterprise search infrastructure has historically been dominated by Apache Lucene, a JVM-based information retrieval library first released in 1999.[1] Lucene and its derivatives, including Elasticsearch, OpenSearch, Apache Solr, MongoDB Atlas, CrateDB, SingleStore and Neo4j, underpin the search capabilities of many database and analytical platforms.[2]
IResearch was conceived as a C++ alternative designed for direct integration into database kernels, where the overhead of a JVM is undesirable and where search and data storage can be architecturally unified rather than maintained as separate systems that exchange data without a shared query model.[3]
History
Origins at Quest Software and EMC Corporation
The conceptual and exploratory phase of IResearch began when its author, Andrey Abramov, was working at Quest Software's Russian development office in senior engineering positions.[4] The current C++ codebase began development in 2014, after Abramov joined the EMC Corporation Skolkovo software development centre, where the library was built as part of an internal "Data Discovery" initiative.[5] On 21 October 2016, the codebase was published on GitHub under the Apache License 2.0, with copyright attributed to EMC Corporation.[6]
Integration into ArangoDB as ArangoSearch (2017–2024)
In 2017, Abramov joined ArangoDB GmbH in Cologne and copyright of IResearch was transferred to ArangoDB GmbH.[6] IResearch was integrated into the database core under the user-facing name ArangoSearch and reached production readiness with ArangoDB 3.4 in December 2018.[7] The integration made it possible to combine full-text search, graph traversal and relational joins within a single query, a capability that had not been available in prior ArangoDB releases.[3] ArangoSearch appeared in the DB-Engines ranking of the top ten most widely used search engine technologies[8] and received attention from independent trade press as a solution offering comparable functionality to Elasticsearch within a multi-model database context.[9]
Continuation at SereneDB (2025–present)
Abramov, Malandin and Mironov, all former employees of ArangoDB, co-founded SereneDB GmbH in Berlin in March 2025.[10][3] In December 2025, SereneDB raised a pre-seed funding round of $2.1 million led by Entourage and High-Tech Gründerfonds.[10] At SereneDB, IResearch serves as the backbone of a system that combines search with OLAP processing under a PostgreSQL-compatible interface.[3] The iresearch-toolkit GitHub repository was archived in December 2025 and all further development continues at https://github.com/serenedb/serenedb.[6]
Performance
In March 2026, IResearch achieved first place overall in the Search Benchmark Game, an open community benchmark maintained by the Tantivy project that evaluates search library performance across multiple query types against an English Wikipedia corpus using the AOL query dataset.[11] The benchmark includes Tantivy, Apache Lucene, PISA and other search libraries and measures query latency across intersection, union, phrase and top-k retrieval patterns.
Architecture
- Index structure. An IResearch index is organised into segments, each containing an inverted index, columnar storage and optional skip structures. Segments are immutable once written. Updates and deletions are recorded as new revisions, allowing readers to observe a consistent snapshot while writers continue to ingest data. Concurrent reads and writes are supported without external locking.
- Write and read interfaces. Ingestion is performed through an IndexWriter, which accepts document batches and indexes field values per transaction. Retrieval is performed through an IndexReader, which evaluates queries against a consistent snapshot of the index.
- Query evaluation. Queries are constructed programmatically from primitive building blocks, including term, phrase, boolean, range and geospatial filters, rather than parsed from a query string. Evaluation is carried out by a pipeline of iterators that merge posting lists and pass candidates to the scoring stage, with non-leading iterators deferring work until required.
- Scoring. Scoring is performed as a block-at-a-time vectorised pipeline over posting-list data. The scoring function is configurable at query time. The library ships with Okapi BM25 and TF-IDF implementations and custom scorers may be registered through the public interface.
- Columnar storage. Field values are stored alongside the inverted index within each segment, allowing numeric attributes, timestamps and other per-document signals to be consumed directly by the scoring pipeline.
- Vector and geospatial indexing. The library supports approximate nearest-neighbour search over embedding vectors and S2-based geospatial indexing for intersection and containment queries, both integrated with the same transaction protocol as the inverted index.
- Text analysis. Field values pass through configurable analysers providing tokenisation, character and token filtering and language-specific processing, with support for custom analysis components.
- Transactional integration. IResearch is designed to be embedded within a host database rather than operated as a separate process. Search indexes participate in the host's write-ahead log and commit protocol, providing snapshot isolation between the search index and the primary data store.
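The interplay of segments, posting-list iterators and scoring described in this list can be illustrated with a small, self-contained C++ sketch. It does not use the IResearch API: the names ToySegment, Posting, Hit, build_segment, bm25 and search_and are hypothetical, tokenisation is plain whitespace splitting, and the BM25 parameters k1 = 1.2 and b = 0.75 are assumed textbook defaults rather than values taken from the library. The sketch builds one in-memory segment, lets the shortest posting list lead a conjunctive (AND) query so that the other cursors advance only when a candidate is proposed, and scores only the documents that matched every term, using a per-document length column.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy illustration only: these names are hypothetical, not the IResearch API.
struct Posting {
  std::uint32_t doc;   // document id within the segment
  std::uint32_t freq;  // occurrences of the term in that document
};
using PostingList = std::vector<Posting>;

struct ToySegment {
  std::map<std::string, PostingList> postings;  // inverted index, lists sorted by doc id
  std::vector<std::uint32_t> doc_len;           // column: token count per document
};

struct Hit {
  std::uint32_t doc;
  double score;
};

// Build a segment from whitespace-tokenised documents; it is not mutated afterwards.
ToySegment build_segment(const std::vector<std::string>& docs) {
  ToySegment seg;
  for (std::size_t id = 0; id < docs.size(); ++id) {
    std::istringstream in(docs[id]);
    std::map<std::string, std::uint32_t> tf;
    std::uint32_t len = 0;
    for (std::string tok; in >> tok; ++len) ++tf[tok];
    seg.doc_len.push_back(len);
    for (const auto& kv : tf)
      seg.postings[kv.first].push_back({static_cast<std::uint32_t>(id), kv.second});
  }
  return seg;
}

// Okapi BM25 contribution of one term; k1 and b are assumed textbook defaults.
double bm25(double freq, double docs_with_term, double n_docs, double len, double avg_len) {
  const double k1 = 1.2, b = 0.75;
  const double idf = std::log(1.0 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5));
  return idf * freq * (k1 + 1.0) / (freq + k1 * (1.0 - b + b * len / avg_len));
}

// Conjunctive query: return the top_k documents containing every term, best score first.
std::vector<Hit> search_and(const ToySegment& seg, const std::vector<std::string>& terms,
                            std::size_t top_k) {
  std::vector<const PostingList*> lists;
  for (const std::string& t : terms) {
    auto it = seg.postings.find(t);
    if (it == seg.postings.end()) return {};  // a missing term empties the conjunction
    lists.push_back(&it->second);
  }
  if (lists.empty()) return {};
  // The shortest list leads; the other cursors move only when it proposes a candidate.
  std::sort(lists.begin(), lists.end(),
            [](const PostingList* x, const PostingList* y) { return x->size() < y->size(); });

  double total_len = 0.0;
  for (std::uint32_t l : seg.doc_len) total_len += l;
  const double n_docs = static_cast<double>(seg.doc_len.size());
  const double avg_len = total_len / n_docs;

  std::vector<std::size_t> pos(lists.size(), 0);  // one cursor per posting list
  std::vector<Hit> hits;
  for (; pos[0] < lists[0]->size(); ++pos[0]) {
    const std::uint32_t doc = (*lists[0])[pos[0]].doc;
    bool match = true;
    for (std::size_t i = 1; i < lists.size() && match; ++i) {
      const PostingList& l = *lists[i];
      while (pos[i] < l.size() && l[pos[i]].doc < doc) ++pos[i];
      match = pos[i] < l.size() && l[pos[i]].doc == doc;
    }
    if (!match) continue;
    double score = 0.0;  // scoring is deferred until every term has matched
    for (std::size_t i = 0; i < lists.size(); ++i)
      score += bm25((*lists[i])[pos[i]].freq, lists[i]->size(), n_docs, seg.doc_len[doc], avg_len);
    hits.push_back({doc, score});
  }
  std::sort(hits.begin(), hits.end(), [](const Hit& x, const Hit& y) {
    return x.score != y.score ? x.score > y.score : x.doc < y.doc;
  });
  if (hits.size() > top_k) hits.resize(top_k);
  return hits;
}
```

For instance, indexing the three documents "fast search library", "search engine for search" and "graph database engine" and calling search_and with the terms "search" and "engine" returns only the second document, since it is the only one containing both terms. A production engine would additionally use skip structures, block-at-a-time scoring and multiple segments, none of which this sketch attempts.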
Adoption
Global Relay
Global Relay, a provider of compliant electronic communications archiving and messaging serving over 20,000 customers in 90 countries, deployed ArangoSearch as the directory search engine for its collaboration suite, replacing Elasticsearch, which had presented scalability limitations for its contextual relevance requirements. The deployment combined BM25-ranked full-text search with graph traversal in a single AQL query. Global Relay reported dramatically simplified development and improved contextual relevancy as outcomes of the deployment. The case study was presented at ArangoDB Summit 2022.[12]
Plural Technology
Plural Technology, a provider of product lifecycle management solutions for over 200 manufacturing clients, deployed ArangoSearch as part of a new part discovery platform replacing a decade-old SQL-based system. Queries against approximately nine million records demonstrated 38% faster performance and the platform supported accurate retrieval despite misspellings. The deployment was presented at ArangoDB Summit 2022.[13]
License
IResearch is distributed under the Apache License 2.0. The copyright history of the library reflects its institutional transitions:
- 2016–2017: EMC Corporation
- 2017–2023: ArangoDB GmbH
- 2024–present: SereneDB (serenedb/serenedb repository)
See also
- ArangoDB
- Apache Lucene
- Elasticsearch
- Full-text search
- Information retrieval
- Inverted index
- Okapi BM25
References
- ↑ "Apache Lucene". Apache Software Foundation. Retrieved 2026-04-17.
- ↑ "Apache Lucene". Apache Software Foundation. Retrieved 2026-04-17. "Why Atlas Search". MongoDB, Inc. Retrieved 2026-04-17. "Full-text search". CrateDB. Retrieved 2026-04-17. "Working with Full-Text Search". SingleStore. Retrieved 2026-04-17. "Full-text indexes". Neo4j, Inc. Retrieved 2026-04-17.
- ↑ 3.0 3.1 3.2 3.3 "SereneDB lands $2.1M to fuse search, analytics and Postgres into one engine". Tech.eu. 2025-12-03. Retrieved 2026-04-15.
- ↑ "Andrey Abramov – Speaker Profile". C++Online. Retrieved 2026-04-15.
- ↑ "Andrey Abramov – LinkedIn profile". LinkedIn. Retrieved 2026-04-15.
- ↑ 6.0 6.1 6.2 "iresearch-toolkit/iresearch". GitHub. Retrieved 2026-04-15.
- ↑ "ArangoDB 3.4 GA: Full-text Search, GeoJSON, Streaming & More". ArangoDB. 2018-12-13. Retrieved 2026-04-15.
- ↑ "The Best Search Engine Technology: Top 10 Engines Analysed". Searchanise. Retrieved 2026-04-15.
- ↑ "ArangoDB 3.5 update improves multi-model database platform". TechTarget. 2019-08-21. Retrieved 2026-04-15.
- ↑ 10.0 10.1 "SereneDB Secures $2.1M Pre-Seed Funding". High-Tech Gründerfonds. 2025-12-03. Retrieved 2026-04-15.
- ↑ "Search Benchmark Game". Tantivy / quickwit-oss. Retrieved 2026-04-15.
- ↑ "Global Relay: Modeling contextual relevance with search views and graph traversals". ArangoDB. Retrieved 2026-04-15.
- ↑ "Plural Technology: Graph thinking for a new part discovery platform". ArangoDB. Retrieved 2026-04-15.
This article "IResearch (search library)" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:IResearch (search library). Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.
