Defining sources you want your scraper to collect articles from

Updated by Joakim

Collecting articles from a specific domain

Add a source for the domain you wish to collect articles from, e.g. example.com

  • Articles from example.com will be collected
  • Articles from subdomains, e.g. news.example.com, will be ignored

Collecting articles from multiple domains

If you have multiple websites that follow the same structure, you can create a single scraper to collect articles from all of them. Add one source for each domain the scraper should accept articles from, e.g. one source for example-one.com and one source for example-two.com

Collecting articles from subdomains

To collect articles from any subdomain of example.com, use a wildcard character followed by the domain, e.g. *.example.com

  • Articles from subdomains, e.g. news.example.com or sport.example.com, will be collected
  • Articles from URLs without any subdomain, e.g. example.com, will be ignored
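
The domain and wildcard rules described so far can be sketched in Python as follows. This only illustrates the behaviour described in this article and is not the scraper's actual matching code. The function name host_matches is hypothetical, and the sketch assumes that a wildcard also matches nested subdomains such as a.b.example.com, which this article does not specify.

```python
from urllib.parse import urlsplit


def host_matches(source_host: str, url: str) -> bool:
    """Return True if the URL's host matches a source such as
    "example.com" or "*.example.com" (hypothetical helper)."""
    host = (urlsplit(url).hostname or "").lower()
    pattern = source_host.lower()
    if pattern.startswith("*."):
        # Wildcard source: the host must have at least one subdomain
        # label in front of the base domain, so the bare domain fails.
        return host.endswith(pattern[1:])  # pattern[1:] is ".example.com"
    # Plain source: the host must be exactly the given domain,
    # so subdomains are ignored.
    return host == pattern


print(host_matches("example.com", "https://example.com/article"))         # True
print(host_matches("example.com", "https://news.example.com/article"))    # False
print(host_matches("*.example.com", "https://news.example.com/article"))  # True
print(host_matches("*.example.com", "https://example.com/article"))       # False
```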

Collecting articles from a specific subdomain

To collect articles from a specific subdomain of example.com, add a source that includes the subdomain you want to match, e.g. news.example.com

  • Articles from news.example.com will be collected
  • Articles from sport.example.com will be ignored

Collecting articles from a specific path

You can collect articles from a specific path by adding a source that includes the path you wish to match, e.g. example.com/news/

  • Articles from example.com/news/ will be collected
  • Articles from example.com/sport/ will be ignored
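
As a rough illustration of the path rule, the sketch below treats the path in a source as a prefix of the article URL's path. Prefix matching is an assumption; this article only states which of the two example paths is collected. The function name path_source_matches is hypothetical.

```python
from urllib.parse import urlsplit


def path_source_matches(source: str, url: str) -> bool:
    """Return True if the URL is on the source's host and its path
    starts with the source's path (assumed prefix matching)."""
    # Split a source like "example.com/news/" into host and path.
    src_host, _, src_path = source.partition("/")
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return host == src_host.lower() and path.startswith("/" + src_path)


print(path_source_matches("example.com/news/", "https://example.com/news/story-1"))   # True
print(path_source_matches("example.com/news/", "https://example.com/sport/story-2"))  # False
```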

Combining sources to collect specific articles

You can define multiple sources for your scraper. As long as the incoming article URL matches at least one of the sources, the article will be collected. For example, to collect articles from either sport.example.com or example.com/sport/ but ignore all other articles, you would define two sources, one for each.
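
The "matches at least one source" rule can be sketched as below. This is a minimal illustration under stated assumptions, not the scraper's actual implementation: the names source_matches and should_collect are hypothetical, paths are assumed to match as prefixes, and a wildcard is assumed to match any depth of subdomain.

```python
from urllib.parse import urlsplit


def source_matches(source: str, url: str) -> bool:
    """Check one source (host, optional "*." wildcard, optional path)
    against an article URL (hypothetical helper)."""
    src_host, _, src_path = source.partition("/")
    src_host = src_host.lower()
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    if src_host.startswith("*."):
        # Wildcard: only hosts with a subdomain in front of the base domain.
        host_ok = host.endswith(src_host[1:])
    else:
        # No wildcard: the host must be exactly the one in the source.
        host_ok = host == src_host
    # A source without a path has src_path == "", so any path passes.
    return host_ok and path.startswith("/" + src_path)


def should_collect(sources: list[str], url: str) -> bool:
    """An article is collected if its URL matches any one of the sources."""
    return any(source_matches(source, url) for source in sources)


sources = ["sport.example.com", "example.com/sport/"]
print(should_collect(sources, "https://sport.example.com/match-report"))  # True
print(should_collect(sources, "https://example.com/sport/match-report"))  # True
print(should_collect(sources, "https://example.com/news/story"))          # False
```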
