Scrapy
Скрапинг и переход на следующую страницу:
def parse(self, response):
products = response.css('product-item')
for product in products:
#here we put the data returned into the format we want to output for our csv or json file
yield{
'name' : product.css('a.product-item-meta__title::text').get(),
'price' : product.css('span.price').get().replace('<span class="price">\n <span class="visually-hidden">Sale price</span>','').replace('</span>',''),
'url' : product.css('div.product-item-meta a').attrib['href'],
}
next_page = response.css('[rel="next"] ::attr(href)').get()
if next_page is not None:
next_page_url = 'https://www.chocolate.co.uk' + next_page
yield response.follow(next_page_url, callback=self.parse)
Прокси¶
- Scrapoxy - Scrapoxy is a super proxy aggregator, allowing you to manage all proxies in one place 🎯, rather than spreading it across multiple scrapers 🕸️.
- proxy pool
- Scrapy Beginners Series Part 4: User Agents and Proxies - базовая работа с проксями
- Tor proxy - выбор прокси через Tor