
Crawlee



Developer(s): Apify
Initial release: 13 July 2022
Written in: TypeScript, Python
Operating system: Windows, macOS, Linux
Type: Web crawler
License: Apache License 2.0


    Crawlee is a free and open-source web-crawling and browser automation library developed by Apify. The TypeScript version was first released under the Crawlee name in 2022, with a Python version added in 2024.

    Crawlee's architecture is built around modular crawlers responsible for extracting data from websites.[1] The library follows a declarative programming approach, in which users define crawling logic through a structured set of rules. Crawlee manages requests in queues; for each request, a specific function is executed to extract data or perform further processing.[2]
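The queue-and-handler pattern described above can be illustrated with a minimal, standard-library-only sketch. This is not Crawlee's API: the names `FAKE_SITE`, `crawl`, `handler`, and `enqueue` are hypothetical, and an in-memory dictionary stands in for real HTTP responses so the example runs without network access.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "website": URL -> (page title, outgoing links).
# A real crawler would fetch each page over HTTP or through a browser.
FAKE_SITE: Dict[str, Tuple[str, List[str]]] = {
    "https://example.com/": ("Home", ["https://example.com/a", "https://example.com/b"]),
    "https://example.com/a": ("Page A", ["https://example.com/b"]),
    "https://example.com/b": ("Page B", ["https://example.com/"]),
}


def crawl(start_urls: List[str], handler: Callable) -> List[dict]:
    """Process a FIFO request queue, calling `handler` once per request."""
    queue = deque(start_urls)
    seen = set(start_urls)  # deduplicate so each URL is handled only once
    results = []

    def enqueue(url: str) -> None:
        # Only queue URLs that are new and that exist on the fake site.
        if url not in seen and url in FAKE_SITE:
            seen.add(url)
            queue.append(url)

    while queue:
        url = queue.popleft()
        title, links = FAKE_SITE[url]
        results.append(handler(url, title, links, enqueue))
    return results


def handler(url: str, title: str, links: List[str], enqueue: Callable) -> dict:
    """Extract data from one page and schedule further requests."""
    for link in links:
        enqueue(link)
    return {"url": url, "title": title}


items = crawl(["https://example.com/"], handler)
for item in items:
    print(item["url"], item["title"])
```

The crawl logic lives entirely in the handler, while the loop that owns the queue stays generic; this separation is the idea the paragraph above describes, in simplified form.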

    Crawlee supports both headless browser sessions (via Playwright and other browser automation software) and plain HTTP request-based scraping.

    It also provides various web-scraping utilities, such as a sitemap parser[3] and an automatic HTTP proxy manager.
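To give a rough idea of what such utilities do, the sketch below parses a sitemap and rotates through a proxy list using only the Python standard library. It does not use or reproduce Crawlee's own implementations: `parse_sitemap`, `ProxyRotator`, the sample sitemap, and the proxy URLs are all hypothetical, and simple round-robin rotation is an assumed strategy chosen for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import cycle
from typing import List

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; a real one would be downloaded from a site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def parse_sitemap(xml_text: str) -> List[str]:
    """Return every <loc> URL listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


class ProxyRotator:
    """Hand out proxy URLs in round-robin order (assumed strategy)."""

    def __init__(self, proxy_urls: List[str]) -> None:
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._pool = cycle(proxy_urls)

    def next_proxy(self) -> str:
        return next(self._pool)


urls = parse_sitemap(SAMPLE_SITEMAP)
print(urls)
```

A crawler could seed its request queue with the parsed URLs and ask the rotator for a proxy before each request.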

    Notable projects that use Crawlee include GPT Crawler by Builder.io[4] and various generative AI projects maintained by AWS Labs.[5]

    History

    The first stable TypeScript version was released in 2021 under the name Apify SDK.[6] This version offered both the open-source crawling framework and a proprietary storage implementation for use on the Apify platform.

    In 2022, version 3.0.0 was released,[7] renaming the library to Crawlee. This update made Crawlee independent of the Apify platform, moving most of the Apify-specific features into a separate package (also named Apify SDK).

    In 2024, a beta version of Crawlee for Python was released.[8]

    References

    1. Koekemoer, Jakkie. "Web Scraping with Crawlee: Step-By-Step Tutorial". Bright Data.
    2. Nechytailo, Yelyzaveta. "Crawlee Tutorial: Easy Web Scraping and Browser Automation". oxylabs.io.
    3. "Release v3.7.0 · apify/crawlee". GitHub. Retrieved 22 September 2024.
    4. "BuilderIO/gpt-crawler: Crawl a site to generate knowledge files to create your own custom GPT from a URL". GitHub. Retrieved 21 September 2024.
    5. "awslabs/generative-ai-cdk-constructs: AWS Generative AI CDK Constructs are sample implementations of AWS CDK for common generative AI patterns". GitHub. Amazon Web Services - Labs. 20 September 2024. Retrieved 21 September 2024.
    6. "Release v1.0.0 · apify/crawlee". GitHub.
    7. "Release v3.0.0 · apify/crawlee". GitHub.
    8. "Announcing Crawlee for Python: Now you can use Python to build reliable web crawlers | Crawlee · Build reliable crawlers. Fast". crawlee.dev. 5 July 2024.


    This article "Crawlee" is from Wikipedia. The list of its authors can be seen in its page history and/or on the page Edithistory:Crawlee. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace, not the main one.