Archive for the 'Technology News' Category

The Internet Wayback Machine

feeds
alexa internet

Alexa Internet, in cooperation with the Internet Archive, has designed a three dimensional index that allows browsing of web documents over multiple time periods, and turned this unique feature into the Wayback Machine.

“The original idea for the Internet Archive Wayback Machine began in 1996, when the Internet Archive first began archiving the web. Now, five years later, with over 100 terabytes and a dozen web crawls completed, the Internet Archive has made the Internet Archive Wayback Machine available to the public. The Internet Archive has relied on donations of web crawls, technology, and expertise from Alexa Internet and others. The Internet Archive Wayback Machine is owned and operated by the Internet Archive.”

How large is the Wayback Machine? The Internet Archive Wayback Machine contains almost 2 petabytes of data and is currently growing at a rate of 20 terabytes per month. This eclipses the amount of text contained in the world’s largest libraries, including the Library of Congress.

What type of machinery is used in this Internet Archive? Much of the Internet Archive is stored on hundreds of slightly modified x86 servers. The computers run on the Linux operating system. Each computer has 512Mb of memory and can hold just over 1 Terabyte of data on ATA disks. However we are developing a new way of storing our data on a smaller machine. Each machine will store 1 terabyte. For more information go to www.petabox.org.

RSS - Really Simple Syndication

    RSS

feeds

RSS (Really Simple Syndication) is a family of Web feed formats used to publish frequently updated content such as blog entries, news headlines or podcasts. An RSS document, which is called a “feed,” “web feed,” or “channel,” contains either a summary of content from an associated web site or the full text. RSS makes it possible for people to keep up with their favorite web sites in an automated manner that’s easier than checking them manually.

RSS content can be read using software called an “RSS reader”, “feed reader” or an “aggregator”. The user subscribes to a feed by entering the feed’s link into the reader or by clicking an RSS icon in a browser that initiates the subscription process. The reader checks the user’s subscribed feeds regularly for new content, downloading any updates that it finds.

The initials “RSS” are used to refer to the following formats:

    * Really Simple Syndication (RSS 2.0)
    * RDF Site Summary (RSS 1.0 and RSS 0.90)
    * Rich Site Summary (RSS 0.91)

RSS formats are specified using XML, a generic specification for the creation of data formats.

Web Robots “bots”

feeds
Internet bots, also known as web robots, WWW robots or simply bots, are software applications that run automated tasks over the Internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human editor alone. The largest use of bots is in web spidering, in which an automated script fetches, analyses and files information from web servers at many times the speed of a human. Each server can have a file called robots.txt, containing rules for the spidering of that server that the bot is supposed to obey.

In addition to their uses outlined above, bots may also be implemented where a response speed faster than that of humans is required (e.g., gaming bots and auction-site robots) or less commonly in situations where the emulation of human activity is required, for example chat bots.

Bots are also being used as organization and content access applications for media delivery. Webot.com is one recent example of utilizing bots to deliver personal media across the web from multiple sources. In this case the bots track content updates on host computers and deliver live streaming access to a browser based logged in user.
Contents

    IM and IRC

Some bots communicate with other users of Internet-based services, via instant messaging (IM), Internet Relay Chat (IRC), or another web interface. These chatter bots may allow people to ask questions in plain English and then formulate a proper response. These bots can often handle many tasks, including reporting weather, zip-code information, sports scores, converting currency or other units, etc. Others are used for entertainment, such as SmarterChild on AOL Instant Messenger and MSN Messenger and Jabberwacky on Yahoo! Messenger. Another popular AIM bot is FriendBot

An additional role of IRC bots may be to lurk in the background of a conversation channel, commenting on certain phrases uttered by the participants (based on pattern matching). This is sometimes used as a help service for new users, or for censorship of profanity.

AOL Instant Messenger has now introduced a feature that allows you to make a screen name into a bot. This new feature removes the rate limit on the screen name, however it is now limited in the amount of instant messages that can be sent and received.

    Commercial purposes

There has been a great deal of controversy about the use of bots in an automated trading function. Auction website eBay has been to court in an attempt to suppress a third-party company from using bots to traverse their site looking for bargains; this approach backfired on eBay and attracted the attention of further bots. The United Kingdom-based bet exchange Betfair saw such a large amount of traffic coming from bots they launched a WebService API aimed at bot programmers through which Betfair can actively manage bot interactions.

Me.dium map widget

feeds
The Me.dium Map Widget gives you a real-time view into all the Me.dium users on and around your site. You can customize your widget to see who is reading your site, discover the activity around your favorite web site. Just Join Me.dium to get started.