Sentor

Frequently asked questions about scraping

If you have questions about scraping or how to handle it, feel free to email faq@sentor.se and we will answer. All information will be kept confidential; nothing that can be used to identify you or your company will be published.

Q: What is the scraped data used for?

A: This depends very much on the data, so no general answer is possible, but some examples are:

  • Launching competing services
  • Building telemarketing databases (specific to yellow/white pages)
  • Building link farms
  • Reselling services
  • Content for AdSense pages
And probably a million more uses.

Q: Why don't you just use a CAPTCHA test? That will block all scripts!

A: Yes and no. In some environments a CAPTCHA test can be very useful, for example to protect a one-off action such as a registration, but in other places it is more or less useless. Take, for instance, a large database in which users are expected to run several searches: giving each user a CAPTCHA test is not an option in most cases. Even if you use it in conjunction with rate limiting to detect site scrapers, you will still have problems with large gateways, where many users share one address, and with spiders.
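
The rate limiting mentioned above can be sketched roughly as follows. This is a minimal illustration, not a production design: the class name `SlidingWindowLimiter`, its parameters, and the `allowlist` of known gateways or spiders are all hypothetical, and the thresholds would need tuning for a real site.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Flag clients that make more than max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds, allowlist=()):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Hypothetical allowlist for known shared gateways or search spiders,
        # which would otherwise trip the limit without being scrapers.
        self.allowlist = set(allowlist)
        self._hits = defaultdict(deque)

    def is_suspicious(self, client_ip, now=None):
        """Record one request and report whether the client exceeds the limit."""
        if client_ip in self.allowlist:
            return False
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

A client flagged by `is_suspicious` could then be shown a CAPTCHA test instead of every user seeing one. The allowlist only partly addresses the gateway and spider problem, since it has to be maintained by hand and a scraper may sit behind the same gateway as legitimate users.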

Q: How can I block an IP from accessing my site?

A: There are three main ways of doing this:

  1. In a firewall or other packet filtering device
  2. In the webserver by using .htaccess or similar
  3. In the application itself
Of these three, the second is probably the simplest. Information about how to write a correct .htaccess file can be found in your web server's documentation.
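
As an illustration of the third option, blocking in the application itself, here is a minimal sketch using Python's standard `ipaddress` module and a small WSGI middleware. The names `parse_blocklist`, `is_blocked`, and `BlockListMiddleware` are hypothetical, and the addresses used with them are documentation-only example ranges.

```python
import ipaddress


def parse_blocklist(entries):
    """Turn strings such as '203.0.113.7' or '198.51.100.0/24' into networks."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(client_ip, blocked_networks):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocked_networks)


class BlockListMiddleware:
    """WSGI middleware that answers 403 Forbidden for blocked addresses."""

    def __init__(self, app, blocked_networks):
        self.app = app
        self.blocked_networks = blocked_networks

    def __call__(self, environ, start_response):
        # Assumption: REMOTE_ADDR holds the real client address. Behind a
        # reverse proxy it holds the proxy's address instead.
        client_ip = environ.get("REMOTE_ADDR", "")
        if client_ip and is_blocked(client_ip, self.blocked_networks):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application would look like `app = BlockListMiddleware(app, parse_blocklist(["198.51.100.0/24"]))`. Blocking at this level is the most flexible of the three options, but every blocked request still reaches the application, so a firewall or web server rule is cheaper under heavy load.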

Facts about web scraping

Like the evil one, data scraping has many names. Below is a list of expressions that are all similar to "data scraping".

  • Web scraping
  • Screen scraping
  • Page scraping
  • HTML scraping
  • Scrapping

Wikipedia on Web Scraping

"In some instances, plagiarized content may be used as an illicit means to increase traffic and advertising revenue. The typical scraper website generates revenue using Google AdSense, hence the term 'Made for AdSense' or MFA website."

© Sentor 2008.