
Norconex Web Crawler

From EverybodyWiki Bios & Wiki



Other names: Norconex HTTP Collector
Developer(s): Norconex Inc.
Initial release: 2016
Stable release: 3.0.2 / 2022-01-05
Repository: GitHub Repository
Written in: Java
Operating system: Cross-platform
License: Apache License
Website: Norconex Web Crawler


    Norconex Web Crawler is free and open-source web crawling and web scraping software written in Java and released under an Apache License. It can export data to many repositories, including Apache Solr, Elasticsearch, Microsoft Azure Cognitive Search, and Amazon CloudSearch.[1][2][3]

    The crawler can be run as a standalone program or embedded in a Java application.[4][5]

    Some key features are:

    • Multi-threaded crawling
    • Text extraction from a variety of file formats (HTML, PDF, Word, etc.)
    • Extraction of metadata associated with documents
    • Support for pages rendered with JavaScript
    • Incremental crawls
    • Support for external commands that parse or manipulate documents
    • Export of extracted data to a variety of repositories
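
The idea behind an incremental crawl can be illustrated with a minimal sketch that assumes changes are detected by comparing content checksums between crawls. This is not Norconex's implementation or API: the class `IncrementalCrawlSketch`, its `Status` enum, and the in-memory checksum store are hypothetical names chosen for illustration, and the code uses only the Java standard library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of checksum-based incremental crawling.
// It does not use or reproduce any Norconex class.
public class IncrementalCrawlSketch {

    /** Outcome of comparing a fetched page with what earlier crawls saw. */
    public enum Status { NEW, MODIFIED, UNMODIFIED }

    // Assumed store: URL -> checksum recorded by earlier crawls.
    // A real crawler would persist this between runs.
    private final Map<String, String> checksums = new HashMap<>();

    /** Records the page's checksum and reports whether it changed. */
    public Status process(String url, String content) {
        String current = sha256(content);
        String previous = checksums.put(url, current);
        if (previous == null) {
            return Status.NEW;
        }
        return previous.equals(current) ? Status.UNMODIFIED : Status.MODIFIED;
    }

    /** Returns the SHA-256 digest of the text as lowercase hex. */
    static String sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IncrementalCrawlSketch crawl = new IncrementalCrawlSketch();
        System.out.println(crawl.process("https://example.com/", "version 1")); // NEW
        System.out.println(crawl.process("https://example.com/", "version 1")); // UNMODIFIED
        System.out.println(crawl.process("https://example.com/", "version 2")); // MODIFIED
    }
}
```

In a sketch like this, only pages reported as NEW or MODIFIED would need to be re-parsed and sent to the target repository, which is what makes a repeat crawl cheaper than a full one.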

    Organizations and projects listed as using Norconex Web Crawler include the Apache Solr ecosystem, the Department of National Defence, Universities Canada, and the U.S. Department of Education.[6][7]

    History

    Norconex Web Crawler was released as free and open-source software in 2013.[8]

    References

    1. "Committers". opensource.norconex.com.
    2. Hoppa, Jocelyn (10 February 2020). "Importing Data from the Web with Norconex & Neo4j". Graph Database & Analytics.
    3. "Deploy a Norconex HTTP Collector Indexer Plugin | Cloud Search". Google for Developers.
    4. Valcheva, Silvia (11 February 2018). "10 Best Open Source Web Crawlers: Web Data Extraction Software". Blog For Data-Driven Business.
    5. "Norconex HTTP Collector". Softpedia. Retrieved 25 September 2023.
    6. "SolrEcosystem - Solr - Apache Software Foundation". cwiki.apache.org.
    7. "Norconex Crawler Users". opensource.norconex.com.
    8. "Norconex Gives Back to Open-Source – Norconex Inc". Retrieved 25 September 2023.



    This article "Norconex Web Crawler" is from Wikipedia. The list of its authors can be seen in its page history and/or on the page Edithistory:Norconex Web Crawler. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace, not its main namespace.