We bloggers generally pride ourselves on uniqueness and creativity. We get a rush when we see others linking to our posts and reading our feeds, since it usually means that they find value in what we have to say. Unfortunately, not everyone who reads your blog does so for legitimate reasons. Some unscrupulous individuals in the blogosphere are only out to scrape your content for their own websites, ripping off your material and claiming it as their own.
Should I feel flattered?
After all, imitation is the sincerest form of flattery, isn’t it? Generally speaking, no, you shouldn’t be happy about it. Scrapers have little respect for your content except insofar as it can make money for them. To use an analogy, they’re not celebrity impersonators; they’re guys in trench coats selling fake Rolex watches.
How do I stop it from happening?
Short of putting your blog behind a password barrier (which I don’t recommend), there’s really no defense against someone scraping your website. Basically, if they can read it, they can steal it, and there’s no sense in keeping everyone from reading your blog when 99% of your visitors don’t have any malicious intent.
There are some ways of hindering (but not stopping) scrapers. For example, if you’re technically savvy enough to understand server logs, you can deny access to your site based on IP, domain, or irregular user agent as recommended in the most recent Whiteboard Friday at SEOmoz. Note, however, that techniques like this can still be bypassed by clever scrapers.
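As a rough illustration of the log-reading step, here is a minimal Python sketch. It assumes your server writes the common "combined" access log format; the list of suspicious user agents and the request threshold are made-up starting points, not recommendations from the Whiteboard Friday video. It only produces a list of IPs to review; the actual blocking would still happen in your server configuration.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical markers of automated clients; tune these for your own traffic.
SUSPICIOUS_AGENTS = ("curl", "wget", "python-requests", "libwww-perl")


def find_suspects(lines, max_requests=100):
    """Return {ip: reason} for clients that look like scrapers."""
    hits = Counter()
    suspects = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip = match.group("ip")
        agent = match.group("agent").strip().lower()
        hits[ip] += 1
        if agent in ("", "-"):
            suspects[ip] = "missing user agent"
        elif any(marker in agent for marker in SUSPICIOUS_AGENTS):
            suspects[ip] = "irregular user agent"
    # The threshold is an assumption; a busy feed reader can exceed it legitimately.
    for ip, count in hits.items():
        if count > max_requests:
            suspects.setdefault(ip, "too many requests")
    return suspects


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2007:13:55:36 -0700] "GET /feed/ HTTP/1.1" 200 2326 "-" "Wget/1.10"',
        '198.51.100.7 - - [10/Oct/2007:13:56:01 -0700] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    ]
    print(find_suspects(sample))
```

Treat the output as a list of candidates rather than a verdict: search engine crawlers and feed readers can look a lot like scrapers in the logs, so check each IP before denying it.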
How do I find scraper sites?
To catch scrapers in the act, just take a reasonably popular post on your site that’s a few weeks old. Find a short sentence in the post that seems unique, put quotes around it, and plug it into Google. Whatever comes up that isn’t your site is potentially a scraper. If you don’t find anything at first, try it with different posts and sentences until you’re satisfied that nobody’s scraping you.
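If you want to repeat that search for several posts, the query-building part is easy to script. The sketch below is only one way to do it: the function names and the 6-to-12-word length limits are my own assumptions, and it just builds the search URLs for you to open in a browser rather than querying Google automatically.

```python
import re
from urllib.parse import quote_plus


def candidate_sentences(post_text, min_words=6, max_words=12):
    """Pick short sentences from a post to use as search probes."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", post_text.strip())
    picked = []
    for sentence in sentences:
        words = sentence.split()
        if min_words <= len(words) <= max_words:
            picked.append(sentence.rstrip(".!?"))
    return picked


def exact_match_url(sentence):
    """Build a Google search URL that looks for the sentence as an exact phrase."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{sentence}"')


if __name__ == "__main__":
    post = (
        "Scrapers are everywhere. "
        "They are guys in trench coats selling fake Rolex watches. "
        "Sad."
    )
    for sentence in candidate_sentences(post):
        print(exact_match_url(sentence))
```

The script cannot tell which sentence is actually distinctive, so you still need to pick the probes that sound most like you.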
Edit: Since writing this, several people have brought CopyScape to my attention as a useful tool for finding scrapers. It’s a search-like service that crawls a page on your site and tells you where others might have copied it. Although it requires a paid membership to see full results, I’d say it’s pretty handy.
What should you do when you find them?
The first thing you should do is attempt to contact the scraper, requesting that they cease and desist their activities. This seems to have worked well for Maki over at Dosh Dosh, and it’s definitely the polite way of handling things (whether or not scrapers deserve the courtesy). If you don’t receive a positive response, it’s advisable to continue with a more aggressive approach.
There are several ways to make sure that scrapers get their comeuppance. For starters, if their site uses AdSense, you can report them to Google for a violation of the AdSense terms and conditions. The same may apply for other affiliate ad programs. If it results in their account being suspended, they won’t be making money off of your content or anyone else’s.
It’s also possible to tell the scraper’s web host about their illegitimate dealings. Just plug their domain name into any Whois lookup to get the contact info. Forget shutting down their income; if the host takes action, you may be able to shut down their entire site.
Blogs hosted on WordPress.com can also be reported for scraping.
Should I take legal action?
Copyright is tricky business on the World Wide Web, mainly because one country’s laws may not be respected across borders. If the scraper’s actions have been very damaging, you may want to consider legal action. Then again, there’s rarely a lot of harm done, so it’s probably best to forgo a lawsuit. (Note: Since I’m not a legal professional, this is just my opinion. Go consult a lawyer if you want real legal advice.)
If you are considering mounting a lawsuit against a scraper, it may be worthwhile to review the Digital Millennium Copyright Act (in the US) or the European Union Copyright Directive (in Europe).
But aren’t they hurting my search engine rankings?
A few years ago, the answer might have been yes. Times have changed, though, and search engines have wised up to the practice of scraping. Chances are good that the scraper’s version of your content will be viewed as what it is, a copy. You’ll get the credit for your originality; they’ll get flagged as potential spam, eventually being penalized or delisted entirely.
Note that there’s a fine line between syndication and scraping. Some sites may echo your posts as a means of highlighting content around the web that they view as important. In fact, that’s one of the main reasons that RSS exists. Generally speaking, if the website in question gives you proper credit and features your content alongside others in a similar theme, its intent may not be malicious. If, however, their website is basically a copy of yours with no credit or links back to yours, it’s probably a scraper.
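The "links back to yours" part of that test can be checked mechanically. Here is a small Python sketch using only the standard library; the class and function names are hypothetical, and a link back is only one signal, so a page that passes could still be a scraper and a page that fails might just be sloppy about credit.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the host names of all absolute links on a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                host = urlparse(value).netloc.lower()
                if host:  # relative links have no host and are ignored
                    self.hosts.add(host)


def credits_source(page_html, my_domain):
    """Return True if the page links to my_domain or one of its subdomains."""
    collector = LinkCollector()
    collector.feed(page_html)
    my_domain = my_domain.lower()
    return any(
        host == my_domain or host.endswith("." + my_domain)
        for host in collector.hosts
    )


if __name__ == "__main__":
    page = '<p>Via <a href="http://www.example.com/post">the original</a></p>'
    print(credits_source(page, "example.com"))
```

You would paste in the suspect page’s HTML (saved from your browser, for instance) and your own domain in place of the example values.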
Let’s say someone did want to take legal action. How can you prove which site posted the original content, especially if they scrape you and post your content the same day?
Thanks
OK, I’m actually trying to make sure I’m not plagiarising and asking for help. I found a website that is run by a company, but it opens up at the end of the article to “comments”. One individual’s comments were very well put, along with some very good statistics on the particular subject. I am in the process of writing letters to representatives and want to include some of the statistics he posted. Is that considered plagiarism? Does anybody know where I could go to find out?
I hate plagiarism.
I saw this scraping thing a while ago. Well, what can you say about blogs set up with bots that harvest blog posts? The worst thing is that no link is retained, or perhaps they set it to strip all the URLs from the fetched entry.
I think it really is so damn annoying that it’s quite difficult not to waste too much energy on the pursuit of ‘taking that person down’. In reality, these people who steal others’ content will not last long (I hope), and certainly, if they are stealing other people’s content, they are doing other things such as keyword stuffing or using methods which Google doesn’t approve of. It would be rare to find one of these people who is actually a long-term success. (Wishful thinking from somebody who believes in karma.) So although contacting them is good, wasting too much time and energy is not.
Thanks for all the great feedback, everyone. I’ve revised the original article above with the most noteworthy points. Feel free to keep the discussion going and I’ll be glad to update as more tips come into play.
Great post. I agree with most of your points. I followed up at About Weblogs with my own comments, if you’re interested.
Thanks for the article. I’ve been facing this problem with some of my blogs; now at least I have a little more knowledge on how to deal with such sites.
Unfortunately, it is not always the case that scrapers get a low search engine ranking.
I submitted an article to Digg that soon got buried. A scraper copied and dugg the same article, and it got to the front page. The scraper got all the links and credit from Google. I didn’t bother chasing it further because I thought it would be fruitless, but reading this, maybe I should have.
I agree, Daniel. I’ll read the article you have mentioned there now. As you said, I don’t think we can stop plagiarism no matter what we do.
Stephen, we can use copyright icons to prevent other people from copying. FeedBurner has some copyright icons. Also, if you go to copyscape.com, you’ll find a good way to stop plagiarism.
Nice post.
I have to admit, I’ve been pretty ignorant of content scrapers. Up to now I thought the worst thing I had to deal with was the steady flow of comment spam. This post has certainly been an eye-opener for me. Looks like I’ll have to do as suggested to see if someone is scraping me or not. Thanks, Daniel.
Thanks for putting all of these tips into one place. This is a great resource to refer to again and again.
In addition to reporting scrapers to Google and hosts, if they happen to be on WordPress.com (I have seen a few), WordPress accepts feedback about abuse and will contact the blog owner, sometimes even deleting the blog.
I get scraped pretty much daily, and I don’t have time to report it usually. I wish there was a plugin like Akismet that would handle this like it handles my spam comments! THAT would be a grand invention!
As an artist I worry about this every once in a while, but never too much. Someone could easily poach an image of mine and display it on their website, but the images would never be of high enough quality to print. I make my money off of prints, so there is not much to worry about.
Ripping off text is a different issue. The text remains perfectly intact, as does the context.
No matter what the medium is, we as content creators must be proactive in protecting the rights we have associated with our content.
Good thing search engines are starting to treat sploggers and scrapers harshly.
This alone should have a tangible impact on their incentive to rip stuff, since most of those sites are using AdSense to generate some money from organic traffic.