Web Scraping Services

January 2020 - September 2022

I built a system that continuously scraped data from multiple licensing sources, using Puppeteer for browser automation and Bright Data and Webshare for proxy management and for getting past scraping blockers. To keep it reliable, I debugged breakages caused by frequent client-side changes on the source sites and developed ways to handle CAPTCHAs and other anti-bot measures. This kept data extraction consistent as website defenses and requirements evolved.

Key Features

Core Web Scraping Service

The core web scraping service consumed data from 42 licensing sources, extracted the relevant fields, and mapped them to a standardized format. The normalized data was then securely stored in an internal database and exposed through a REST API used by multiple business units.
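
A minimal sketch of the mapping step, assuming each source exposes the same information under different column labels. The `LicenseRecord` shape, the field names, and the status vocabulary are hypothetical and chosen only for illustration; they are not the production schema.

```typescript
// Hypothetical standardized record; the field names are assumptions.
interface LicenseRecord {
  sourceId: string;
  licenseNumber: string;
  holderName: string;
  status: "active" | "expired" | "unknown";
}

// Raw key/value pairs as scraped from one source page.
type RawRecord = Record<string, string | undefined>;

// Per-source configuration: which raw label feeds each standardized field.
interface FieldMap {
  licenseNumber: string;
  holderName: string;
  status: string;
}

// Collapse source-specific status wording into a small fixed vocabulary.
function normalizeStatus(raw: string | undefined): LicenseRecord["status"] {
  const value = (raw ?? "").trim().toLowerCase();
  if (["active", "current", "valid"].includes(value)) return "active";
  if (["expired", "lapsed", "inactive"].includes(value)) return "expired";
  return "unknown";
}

// Map one raw record to the standardized format, trimming stray whitespace.
function toLicenseRecord(
  sourceId: string,
  raw: RawRecord,
  map: FieldMap,
): LicenseRecord {
  return {
    sourceId,
    licenseNumber: (raw[map.licenseNumber] ?? "").trim(),
    holderName: (raw[map.holderName] ?? "").replace(/\s+/g, " ").trim(),
    status: normalizeStatus(raw[map.status]),
  };
}
```

Keeping the per-source differences in a small `FieldMap` rather than in code means that a renamed column on a source site becomes a configuration change instead of a new parser.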

Integrations with Bright Data and Webshare

To overcome scraping restrictions, we integrated with Bright Data and Webshare and routed requests through their proxy networks to reach content that was otherwise blocked. This kept data extraction running without interruption and improved the reliability of the scraping service.
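
One way to picture the proxy routing is a round-robin pool with retry on failure. The sketch below is an assumption about the general approach, not the actual integration: `ProxyEndpoint`, `ProxyPool`, and `fetchWithRotation` are hypothetical names, and the `attempt` callback stands in for whatever performs the page load. With Puppeteer, that is typically a browser launched with Chromium's `--proxy-server` argument.

```typescript
// Hypothetical proxy descriptor; real endpoints and credentials come from the providers.
interface ProxyEndpoint {
  provider: "brightdata" | "webshare";
  url: string;
}

// Hands out proxies in round-robin order.
class ProxyPool {
  private readonly proxies: ProxyEndpoint[];
  private index = 0;

  constructor(proxies: ProxyEndpoint[]) {
    if (proxies.length === 0) {
      throw new Error("ProxyPool needs at least one proxy");
    }
    this.proxies = proxies;
  }

  next(): ProxyEndpoint {
    const proxy = this.proxies[this.index % this.proxies.length];
    this.index += 1;
    return proxy;
  }
}

// Runs `attempt` through successive proxies until one succeeds
// or `maxAttempts` is reached, then rethrows the last error.
async function fetchWithRotation<T>(
  pool: ProxyPool,
  attempt: (proxy: ProxyEndpoint) => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let i = 0; i < maxAttempts; i++) {
    const proxy = pool.next();
    try {
      return await attempt(proxy);
    } catch (err) {
      // Blocked or timed out: remember the error and rotate to the next proxy.
      lastError = err;
    }
  }
  throw lastError;
}
```

Passing the page load in as a callback keeps the rotation logic independent of the browser library, so it can be exercised without opening a real browser.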

Mentorship of Junior Engineers

I mentored junior engineers in debugging issues with external data sources, guiding them through the problem-solving process. This often meant investigating unfamiliar technology stacks and architectures to identify the root cause of an issue and implement an effective fix.