As a human being who has spent way too much time on web scraping, I suggest that you use Golang for it.

Tell me more. I've used Python + JS, but never Golang for scraping...

Btw: I've used Dusk for scraping before. I have a small project that requires scraping, and the appeal of both packages above is that perhaps I can deploy on Vapor for some serverless action... so really, it's me being ~~lazy~~ efficient with DevOps.

Hmm, where to start... First, the concurrency! Go's concurrency is amazing and incredibly performant for web scraping. If you plan to work with large datasets, it's convenient to run the scraping concurrently. TIME IS MONEEEY!

Plus, you can combine it with AWS products: Lambda or EC2 for the scraping tasks, S3 to store the raw data, CloudWatch for monitoring and logs, and Step Functions to orchestrate destination batches. This infrastructure, together with Go and colly to scrape the data, steps up your game. If you need to interact with the website you want to scrape, you can use the chromedp module for that. That's how I do it for serious stuff.
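
To give a rough idea of the concurrency part, here is a minimal sketch of a worker pool that fetches several pages in parallel. It deliberately uses only the Go standard library rather than colly, so it runs without extra dependencies. The names `scrapeAll`, `fetchSize`, and `result`, the worker count, and the local test server standing in for a real site are all assumptions made up for illustration, not anything from a real project.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// result holds the outcome of fetching one URL (hypothetical type).
type result struct {
	URL   string
	Bytes int
	Err   error
}

// fetchSize downloads a page and returns its body size in bytes.
func fetchSize(url string) (int, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}
	return len(body), nil
}

// scrapeAll fetches every URL with a fixed number of worker goroutines.
// Results are returned in the same order as the input slice.
func scrapeAll(urls []string, workers int, fetch func(string) (int, error)) []result {
	if workers < 1 {
		workers = 1 // avoid blocking forever with no workers
	}
	results := make([]result, len(urls))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each job is an index, so every goroutine writes to its own slot.
			for i := range jobs {
				n, err := fetch(urls[i])
				results[i] = result{URL: urls[i], Bytes: n, Err: err}
			}
		}()
	}
	for i := range urls {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Local stand-in for a real site so the sketch runs offline.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "<html><body>page %s</body></html>", r.URL.Path)
	}))
	defer srv.Close()

	urls := []string{srv.URL + "/a", srv.URL + "/b", srv.URL + "/c"}
	for _, r := range scrapeAll(urls, 2, fetchSize) {
		fmt.Println(r.URL, r.Bytes, r.Err)
	}
}
```

Passing the fetch function in as a parameter keeps the pool itself independent of HTTP, so in a real setup you could swap in whatever does the actual scraping. With colly you would probably lean on its own async mode and limit rules instead of a hand-rolled pool like this.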

I forgot to mention CloudFormation. Once you decide which AWS services you want, you can set up a stack that creates all of them with one click. You can also automate stopping the services, or running something else, once the scraping work is done.

I know how ~~lazy~~ you are and this is how ~~lazy~~ I am 😂 I think that's something you would like
