
Comparison between the Wayback Machine and Archive.Today

From EverybodyWiki Bios & Wiki

This is a technical comparison between the Wayback Machine and Archive.Today, the two most popular webpage archival services on the Internet.

History

Archive.Today was founded in 2012 as a Wayback Machine alternative with support for fragments (#) in URLs and so-called hash-bang URLs (#!).

The Internet Archive, which is the organisation that powers the Wayback Machine, was founded in 1996.

Archival format

Archive.Today stores pages in a modified memento format.

All CSS is converted to inline CSS within a single HTML document, which strips selectors such as :hover.

On Archive.Today, JavaScript is removed entirely from the archived page. Content generated by JavaScript during the server-side crawl remains visible in a frozen state on the saved page.

The Wayback Machine retains a verbatim copy of a site's contents; the original source code of each file can be retrieved using URL parameters.[which?]
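The text above does not say which URL mechanism is meant. One widely used convention, assumed here and not confirmed by this article, is the `id_` modifier placed directly after the 14-digit time stamp in a playback URL, which requests the file as originally served. The helper name `wayback_url` is hypothetical; the sketch only builds the URL and sends nothing.

```python
def wayback_url(timestamp: str, original_url: str, raw: bool = False) -> str:
    """Build a Wayback Machine playback URL for one capture.

    timestamp: 14-digit capture time in the form YYYYMMDDhhmmss.
    raw: if True, add the `id_` modifier (assumed convention for
         retrieving the unmodified original file).
    """
    if not (timestamp.isdigit() and len(timestamp) == 14):
        raise ValueError("timestamp must be 14 digits (YYYYMMDDhhmmss)")
    modifier = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{modifier}/{original_url}"


# Rewritten playback page versus the original bytes of the same capture.
print(wayback_url("20200101000000", "http://example.com/"))
print(wayback_url("20200101000000", "http://example.com/", raw=True))
```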


Supported media types

The Wayback Machine supports the archival of any media type, while Archive.Today is limited to JPG, PNG, GIF, WEBP, text, SVG, CSV, JSON, JavaScript code as text, and XML.

For HTML5 videos, only the placeholder is archived.

Video archival is, however, supported for Imgur.[1]

Speed

Capturing

While Archive.Today often handles URL submissions in a queue, the Wayback Machine can archive immediately.

Indexing

Both services usually index captures immediately, making them retrievable and searchable. However, before 2013, the Wayback Machine indexed pages with a delay of 6 to 18 months.[2][3] In late 2019, the Wayback Machine suffered from several technical issues with its indexing, causing indexing delays and temporary unavailability of pages.

Rate limiting

In mid-2019, both the Wayback Machine and Archive.Today imposed strict rate limits, which have since changed over time.

The Wayback Machine limits submissions to a seemingly random number of pages per minute and sometimes shows HTTP 429 on the first submission, although a former HTTP 429 error page suggested a limit of 15 pages per minute.
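A client that submits many pages can pace itself instead of running into HTTP 429. The following is a minimal sketch, assuming the 15 pages per minute figure from the former error page still roughly applies (the text notes the real limit appears to vary). The class `SlidingWindowLimiter` and the function `backoff_delay` are illustrative names, not part of any Wayback Machine API, and the backoff values are arbitrary.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` submissions per `window` seconds."""

    def __init__(self, max_calls=15, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock          # injectable so the logic can be tested
        self.calls = deque()        # times of recent submissions

    def wait_time(self):
        """Return how many seconds to wait before the next submission."""
        now = self.clock()
        # Forget submissions that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Note that a submission was just made."""
        self.calls.append(self.clock())


def backoff_delay(attempt, base=5.0, cap=300.0):
    """Exponential delay (seconds) for the 0-based retry after an HTTP 429."""
    return min(cap, base * (2 ** attempt))
```

Before each submission, a caller would sleep for `wait_time()` seconds and then call `record()`; if the server still answers with HTTP 429, sleeping for `backoff_delay(attempt)` before retrying avoids hammering the service.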

Archive.Today's rate limits appear to be an unspecified number of pages over an unspecified time span. One hour may be used as a rule of thumb.

While the Wayback Machine shows an HTTP 429 error page whose appearance has changed over time, Archive.Today used to stop responding to browser requests. Later, Archive.Today started showing a Google CAPTCHA to users who hit the rate limit.

Using the respective submission forms, Archive.Today allows archiving the same page every 5 minutes, while the Wayback Machine allows archiving the same page only once per 20 minutes (formerly 10 and 5 minutes).
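The per-page intervals above can be expressed as a small check. This sketch hard-codes the 5 and 20 minute values given in the text, which have changed before and may change again; the service keys and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Intervals as stated in the text; treat them as a snapshot, not a guarantee.
MIN_INTERVAL = {
    "archive.today": timedelta(minutes=5),
    "wayback": timedelta(minutes=20),
}


def next_allowed(service: str, last_capture: datetime) -> datetime:
    """Earliest time the same URL can be submitted to `service` again."""
    return last_capture + MIN_INTERVAL[service]


def can_resubmit(service: str, last_capture: datetime, now: datetime) -> bool:
    """True if enough time has passed since the last capture of this URL."""
    return now >= next_allowed(service, last_capture)


last = datetime(2020, 1, 1, 12, 0, 0)
five_minutes_later = datetime(2020, 1, 1, 12, 5, 0)
print(can_resubmit("archive.today", last, five_minutes_later))
print(can_resubmit("wayback", last, five_minutes_later))
```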

Page exclusion

Some sites, including several major Internet portals such as Quora and 4chan, are excluded from being archived and viewed by the Wayback Machine.

Other features

Saving outlinks

The Wayback Machine allows saving the outlinks of a page along with the page itself. Since early 2020, this feature has been available only to users who are logged in.

Bottomless scrolling

On bottomless pages such as YouTube comments and Quora, Archive.Today can scroll down and crawl a few screen heights of additional content.

Archive.Today has more seamless support for crawling websites that rely on AJAX and XHR.

Progressive web applications

On November 29, 2019, Archive.Today upgraded its headless browsing engine for crawling sites from the abandonware PhantomJS to Chromium, enabling the archival of content served through progressive web applications such as Instagram and Twitter Lite.

However, pages captured via PhantomJS are the only ones that cannot be downloaded as a ZIP file.

User-agent passthrough

When a page is archived through the live web, that is, by appending its URL to web.archive.org/save/, the user agent of the browser in use is passed through to the website. This can influence the appearance of sites that use dynamic serving.
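To make the passthrough concrete, the sketch below prepares a live-web save request carrying a chosen User-Agent header. It only constructs the request and does not send it. The `https://` scheme, the helper name `build_save_request`, and the example user agent string are assumptions for illustration; whether a scripted request is treated the same as a browser request is not stated in the text.

```python
from urllib.request import Request

SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_request(target_url: str, user_agent: str) -> Request:
    """Prepare (but do not send) a live-web save request.

    The User-Agent set here is what, per the passthrough behaviour
    described above, the archived site would see.
    """
    return Request(SAVE_PREFIX + target_url, headers={"User-Agent": user_agent})


# A mobile user agent may yield the mobile variant of a dynamically served site.
req = build_save_request(
    "http://example.com/",
    "Mozilla/5.0 (Linux; Android 10) Mobile",
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would trigger a real capture, so it is subject to the rate limits described earlier.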

In 2020, this feature was occasionally replaced (for a few days starting March 13, and again on October 13) with a behaviour where the site is archived as if it had been submitted through the submission form.

As soon as the capture is finished, the request for the live web URL is answered with a redirect to the permanent URL (which contains a time stamp). This has, however, always been the default behaviour for non-HTML content such as images, videos, plain text files and other binary data.
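The permanent URL carries the capture time in its path, so a client that follows the redirect can recover both the time stamp and the original address. A minimal sketch, assuming the usual `web.archive.org/web/<14-digit time stamp>/<original URL>` layout with an optional two-letter modifier such as `id_`; the function name is hypothetical.

```python
import re
from datetime import datetime

# 14-digit time stamp, optional modifier (e.g. "id_"), then the original URL.
_WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})([a-z]{2}_)?/(.+)$"
)


def parse_permanent_url(url: str):
    """Split a permanent Wayback Machine URL into (capture time, original URL)."""
    match = _WAYBACK_RE.match(url)
    if match is None:
        raise ValueError(f"not a timestamped Wayback Machine URL: {url!r}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured, match.group(3)


when, original = parse_permanent_url(
    "https://web.archive.org/web/20201013120000/http://example.com/page"
)
print(when.isoformat(), original)
```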

Archival of URL fragments

Archive.Today supports the archival of URLs with fragments (hash sign, #) and hash bangs (#!).
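Fragments matter for archiving because a browser does not send the part after # to the server; on hash-bang sites, client-side script uses it to decide what to render. The sketch below uses Python's standard `urllib.parse.urldefrag` to show what an archiver would have to keep track of; `describe_fragment` is an illustrative name, not something either service exposes.

```python
from urllib.parse import urldefrag


def describe_fragment(url: str):
    """Return (url_without_fragment, fragment, is_hash_bang)."""
    base, fragment = urldefrag(url)
    return base, fragment, fragment.startswith("!")


# Both URLs request the same resource from the server, but a hash-bang
# application may render entirely different content for each fragment.
print(describe_fragment("https://example.com/app#!/user/42"))
print(describe_fragment("https://example.com/app#!/user/43"))
```

A service that discards the fragment would store both example URLs as a single capture of `https://example.com/app`.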

Cross-backup

Archive.Today supports backing up pages from the Wayback Machine, WebCitation.org, Google Cache and other sites, while adapting the date and time to that of the source, indicated as the original time stamp.

In comparison, the Wayback Machine is unable to back up pages from Archive.Today.

See also

  • WebCite
  • Perma.CC


References

  1. Imgur video archived by Archive.Today (short URL)
  2. Baron, Alexander (2013-10-23). "The new Internet Archive Wayback Machine now online". www.digitaljournal.com. Retrieved 2020-09-10.
  3. Poal.co post: Did you know?: Before 2013, the Wayback Machine indexed new crawls in 6 to 18-month intervals.