Diffstat (limited to 'old/published/How To Wiki/howto-deleted web pages.txt')
-rw-r--r-- | old/published/How To Wiki/howto-deleted web pages.txt | 35 |
1 files changed, 35 insertions, 0 deletions
diff --git a/old/published/How To Wiki/howto-deleted web pages.txt b/old/published/How To Wiki/howto-deleted web pages.txt
new file mode 100644
index 0000000..6823293
--- /dev/null
+++ b/old/published/How To Wiki/howto-deleted web pages.txt
@@ -0,0 +1,35 @@
+There's nothing more frustrating than searching for a page, finding what looks like a promising result and then clicking through only to discover that the page is gone. Unfortunately, it happens all the time: servers get jammed, pages are removed, some are moved and some are simply no longer maintained. But what happens when you want to find a page that's gone? Is there anything you can do? The answer depends somewhat on why the page was removed.
+
+#The Slashdot Effect
+
+Some sites, particularly smaller independent publishers and bloggers, can't handle the traffic influx from having a link show up on Slashdot or Digg. The sites simply stop responding as their servers become overwhelmed. However, you might still be able to see the content using "Coral Cache"<http://www.coralcdn.org/>.
+
+Coral Cache is a service that uses distributed computing to lessen the so-called Slashdot effect. It was developed to provide a distributed mirror of the original page that can handle the high traffic volume.
+
+But don't worry, you don't need any special software. Just append .nyud.net to the hostname of a regular URL and you'll reach the page through Coral Cache rather than connecting directly.
+
+It won't be quite as fast as you may be used to (compare "wired.com"<http://www.wired.com/> directly with the "Coral Cache"<http://www.wired.com.nyud.net/> version), but it could help you get to content that's currently choked with direct connections.
+
+#Content that's been removed
+
+Perhaps the easiest trick to see deleted web pages that were removed by their publisher is to use Google's cache feature.
Search for the original page and, if it's in Google's cache, you'll see a little link leading to the page as it looked the last time Google indexed it.
+
+In some cases this will lead you straight to the content you want. However, sometimes that won't work. The page owner may have replaced the original page with new content, and if Google's indexing spiders have been back to the page since the change, you won't see the old content.
+
+In those cases you may be out of luck, but there is one final thing you can try.
+
+#The Wayback Machine
+
+The "Internet Archive"<http://www.archive.org/index.php> is a non-profit organization founded with the goal of building an Internet library that could offer permanent access to webpages for researchers, historians, and scholars.
+
+The Internet Archive's ambitious goal of indexing every page of content that has ever been on the public web is not a reality, but it tries, and it just might have the page you seek.
+
+The "Wayback Machine"<http://www.archive.org/web/web.php> is a search engine that takes a URL and then looks for pages on that site over time. Using the Wayback Machine, you can often find pages that have since been removed or deleted from the live web.
+
+In some cases the pages may appear a bit mangled and won't necessarily have all the formatting of the original -- stylesheets may not work, JavaScript doesn't function -- but you can at least get at the actual text content.
+
+At the time of writing, the Internet Archive boasts 85 billion webpages, and it also recently started archiving other files like movies, audio files and live music, though its indexes for multimedia content are not as extensive as the web page offerings.
+
+#Preventing pages from disappearing
+
+Many of today's popular web-based bookmark services offer page caching as a feature. Ma.gnolia, for instance, takes a snapshot of a page when you bookmark it and caches the contents.
This is helpful for ensuring that your favorite bookmarked pages don't disappear on you. If they do, just head to Ma.gnolia and click through to the cached version. Del.icio.us and others offer similar features.
\ No newline at end of file
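
Editor's sketch (not part of the committed file): the two URL tricks the article describes can be expressed as a couple of small helpers. This is a minimal illustration, not a definitive implementation. The function names `coral_url` and `wayback_url` are hypothetical. The `.nyud.net` suffix comes from the article; the `web.archive.org/web/<timestamp>/<url>` address pattern is an assumption about how the Wayback Machine exposes captures and is not stated in the article. The code only builds URLs and makes no network requests, so it says nothing about whether either service currently responds.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix named in the article


def coral_url(url: str) -> str:
    """Return `url` with the Coral Cache suffix appended to its hostname."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        raise ValueError(f"no hostname in {url!r}")
    if host.endswith(CORAL_SUFFIX):
        return url  # already points at Coral Cache
    netloc = host + CORAL_SUFFIX
    # Keeping an explicit port is an assumption; the article does not cover it.
    # Any user:password@ credentials in the URL are dropped by this sketch.
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


def wayback_url(url: str, timestamp: str = "*") -> str:
    """Build a Wayback Machine address for `url`.

    Assumed pattern (not from the article): a YYYYMMDDhhmmss timestamp asks
    for the capture nearest that time, and "*" asks for the list of captures.
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    # Matches the article's own wired.com example.
    print(coral_url("http://www.wired.com/"))
    print(wayback_url("http://www.wired.com/"))
```

Parsing the URL first, rather than appending the suffix to the whole string, keeps the path and query intact, which is what the article's wired.com example implies.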