Data scraping 'can be simple or very hard'
2009-07-31
Screen scraping can be either a simple or a particularly difficult
task, depending on how complex the source is, according to Martin
Streicher, writing for Linux Magazine.
The tools used for scraping are
largely the same whatever the task, Mr Streicher explained. He
admitted to scraping himself, noting that he had probably scraped tens
of sites in the past for purposes such as aggregating and analysing
sales data.
Would-be scrapers face a number of
tasks, he explained. The first is to identify the content they are
interested in. The next is to find the sites that hold the desired
information, Mr Streicher asserted. Scrapers will then need to
determine whether the data on those sites is accessible, and then find
or create tools to collect pages and extract data, he added.
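Mr Streicher's description is not tied to any particular tool. As a rough illustration only, the last two steps (collecting pages and extracting data) might look like the minimal Python sketch below, which uses only the standard library. The function names fetch_page and extract_rows, the assumption that pages are UTF-8, and the assumption that the wanted data (say, sales figures) sits in the cells of an HTML table are all hypothetical choices made for this example, not Mr Streicher's own method.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Collect a page: download it and decode it.

    Assumes UTF-8 (hypothetical); real pages may declare another charset.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class TableCellExtractor(HTMLParser):
    """Extract data: gather the text of each <td> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False
        self._cell_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td":
            self._in_cell = True
            self._cell_text = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            if self.rows:  # ignore stray cells outside any <tr>
                self.rows[-1].append("".join(self._cell_text).strip())

    def handle_data(self, data):
        if self._in_cell:
            self._cell_text.append(data)


def extract_rows(html: str) -> list[list[str]]:
    """Return the non-empty rows of <td> text found in the HTML."""
    parser = TableCellExtractor()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]
```

In use, the output of fetch_page(url) would be passed to extract_rows; header rows built from <th> cells are skipped, so only data rows come back. Sites whose content is generated by JavaScript, or which change their layout often, are where the "particularly difficult" end of the scale begins, since a parser like this only sees the HTML as served.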
Those who scrape may run into
trouble, however, as highlighted recently by Ryanair's announcement
that it has lodged proceedings in the High Court in Dublin against
Travelviva AG, a German screen-scraping ticket tout. The airline claims
that Travelviva has been carrying out unauthorised screen scraping and
reselling Ryanair's flights with unjustified mark-ups. Ryanair said
that it plans to take further action against other unauthorised
European screen scrapers in the coming weeks.
"Ryanair is determined to continue its
crusade against screen scraping ticket-tout websites until the last
screen scraper stops overcharging unsuspecting consumers and breaching
Ryanair's copyright and terms of use of www.ryanair.com," said Ryanair's
Daniel de Carvalho.
"We are confident that unauthorised screen
scraping and overcharging of consumers will eventually be outlawed
throughout Europe, to the benefit of consumers and legitimate
businesses," he added.