Frequently asked questions about scraping
If you have questions about scraping or how to handle it, feel free to drop an email to firstname.lastname@example.org and we will answer. All information will be kept confidential; nothing that could be used to identify you or your company will be published.
Q: What is the scraped data used for?
A: This depends very much on the data, so it is impossible to give a general answer, but some examples are:
- Launching competing services
- Building telemarketing databases (specific to yellow/white pages)
- Building link farms
- Reselling services
- Content for AdSense pages
Q: Why don't you just use a captcha test? That will block all scripts!
A: Yes and no. In some situations a captcha test can be very useful, for example on a one-off registration form, but elsewhere it may be more or less useless. Take a large database where users are expected to run several searches: giving every user a captcha test is not an option in most cases. Even if you combine captchas with rate limiting to detect scrapers, you will still have problems with large gateways (where many users share one IP address) and with search engine spiders.
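A minimal sketch of the "captcha plus rate limiting" idea, assuming each client is identified by a string key such as its IP address and that a client making more than a fixed number of searches within a sliding time window is shown a captcha instead of results. The class name `ChallengeLimiter` and the default thresholds are hypothetical, chosen only for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class ChallengeLimiter:
    """Hypothetical sliding-window counter that decides when to show a captcha.

    A client making more than `max_requests` requests within
    `window_seconds` should be challenged instead of served directly.
    """

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        # One deque of request timestamps per client key (e.g. an IP address).
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def needs_challenge(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record one request from client_id; return True if it should get a captcha."""
        t = self.clock() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the sliding window.
        while hits and t - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(t)
        return len(hits) > self.max_requests

    def reset(self, client_id: str) -> None:
        """Forget a client's history, e.g. after it has solved a captcha."""
        self._hits.pop(client_id, None)


if __name__ == "__main__":
    limiter = ChallengeLimiter(max_requests=3, window_seconds=10)
    # Five searches one second apart from the same address.
    print([limiter.needs_challenge("203.0.113.7", now=float(i)) for i in range(5)])
```

Keying the counter on the IP address is exactly what makes large gateways a problem: every legitimate user behind the same proxy shares one counter, so the gateway trips the limit long before any single user would.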
Q: How can I block an IP from accessing my site?
A: There are three main ways of doing this:
- In a firewall or other packet filtering device
- In the webserver by using .htaccess or similar
- In the application itself
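For the third option, blocking in the application itself, here is a minimal sketch written as Python WSGI middleware. The class name `IPBlocklistMiddleware`, the example addresses and the use of `REMOTE_ADDR` are assumptions for illustration; behind a reverse proxy or load balancer the real client address usually arrives in a header instead. Blocking in a firewall or in the webserver stops the request earlier and more cheaply, while blocking in the application lets you use information only the application has.

```python
import ipaddress
from typing import Callable, Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class IPBlocklistMiddleware:
    """Hypothetical WSGI middleware that refuses requests from blocked IPs or networks."""

    def __init__(self, app: Callable, blocked: Iterable[str]) -> None:
        self.app = app
        # Accept single addresses ("198.51.100.23") or CIDR ranges ("203.0.113.0/24").
        self.networks: List[Network] = [
            ipaddress.ip_network(entry, strict=False) for entry in blocked
        ]

    def is_blocked(self, addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # Unparseable address: fail open and let the app handle it.
        # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
        return any(ip in net for net in self.networks)

    def __call__(self, environ, start_response):
        if self.is_blocked(environ.get("REMOTE_ADDR", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application is then a single line, for example `application = IPBlocklistMiddleware(application, ["203.0.113.0/24"])`.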