Content theft for dummies

by Cem Basman

You want to suck content from your favorite site? The author doesn't provide an RSS feed? No problem, buddy. There is a remedy. Two alternative scrapers are now on the market: Feedyes and Feed43. They scrape any HTML site and build a neat feed for you in seconds. You'll never have to write your own content again. It's real easy. Content theft for dummies. Creative Commons rights? Who cares. Real bad.

Think about it.

Comments

Does this mean there finally is a way to get full excerpts from vowe.net? ;-)

... either that, or you wait a little and give the guy at http://www.rssextender.com a little more time; it's currently in alpha.

It does require .NET, though. :(

This is really just the tip of the iceberg. There's another scraper with the initials WS that takes it much further. It doesn't simply create a feed from a site; it scrapes an entire site, whether static or with an RSS feed, runs the text through a thesaurus program to substitute synonyms for key words, and then reposts the whole shebang on a site of your choosing. It's possible, for example, to scrape the entire IMDB site.

This is the direction scraping technology is heading. Very scary stuff.

If you want more info about this scraper, just email me through my site. I'll gladly share my info with you, but I don't want to post the name freely on the Web. The less advertising I give them, the better.

A few days ago, lots of people were complaining about a new Swiss web service that took (and still takes) the newsfeeds of various blogs and republished them; see e.g. BloggingTom's article (in German). There were, and still are, various discussions in the blogosphere about whether to publish only excerpts or the entire content as a feed. And now these services simply scrape any HTML content and provide it as a feed.
However, there are already various tools for grabbing website content and storing it locally on the user's hard disk. FeedYes and Feed43 work in a similar manner, but provide RSS instead of HTML files. And these tools are web services, which is typical of Web 2.0: an online service takes over the job of the offline tools.
Thus, I see no problem with the services themselves, since they just replace offline tools and provide additional features.
But I agree: these services make it even easier for thieves to copy content. But that's life: when you publish something on the web, it is easy to copy it.

Ceci n'est pas un blog

vowe.net is a personal website published by Volker Weber a.k.a. vowe. I am an author, consultant and systems architect based in Darmstadt, Germany.

Systems Architecture

This site runs on an Apache web server on top of the Linux operating system. The content is managed with MovableType which is implemented in Perl. Last but not least the HTML code your browser sees is put together with PHP.

© 1992-2008 Volker Weber.
All Rights Reserved.