This article helps you:
Set up the Website scraper
Understand what you can do with the Website scraper
The Website source repository lets you scrape a public-facing website to pull in articles that may be useful for your users. Because websites vary widely and any given page can include rich content, read the following information carefully before using this content source.
If your documentation repositories are managed through a platform for which there is an integration, use the integration. The dedicated Resource Center integrations offer a better end-to-end experience. The website scraper is a powerful tool, but treat it as a fallback option.
The website scraper provides unique options for setting up the source details. Use this information in conjunction with the procedure on the main Source Repository page.
There are two options when specifying the exact URLs you want to include:
If you click the Advanced link beneath the URL section, you can access the following options:
By default, the website scraper pulls in all information on a page, not just the main content. This can include the page header, metadata, and other elements, so extraneous content can end up in your Resource Center articles alongside the main content. Amplitude uses heuristics to identify and remove as much of this extraneous content as possible, but some may still appear in the article. Use the content selectors to target exactly which content to add to the Resource Center article.
Selectors override what the scraper considers to be the main content on your page. The website scraper uses each selector to pull only the content associated with it. You can create multiple selectors to target different pieces of content.
When you have multiple selectors, the website scraper checks each one in the order you created them. If none of the selectors match on a specific URL, the website scraper saves information based on Amplitude's heuristics.
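Conceptually, selector matching works like the following sketch. It uses Python and BeautifulSoup purely for illustration; the HTML, selector strings, and fallback shown here are assumptions for the example, not Amplitude's actual implementation.

```python
# Conceptual sketch of content selectors; not Amplitude's implementation.
# The HTML and selector strings below are hypothetical examples.
from bs4 import BeautifulSoup

html = """
<header>Site navigation | Sign in</header>
<article class="main-content"><p>How to create a chart.</p></article>
<div id="release-notes"><p>Version 2.1 adds dashboards.</p></div>
<footer>Copyright</footer>
"""

selectors = ["article.main-content", "div#release-notes"]  # checked in creation order

soup = BeautifulSoup(html, "html.parser")
parts = []
for selector in selectors:
    for element in soup.select(selector):
        parts.append(element.get_text(" ", strip=True))

if parts:
    content = "\n\n".join(parts)              # only the selected content is kept
else:
    content = soup.get_text(" ", strip=True)  # no match: fall back to page-wide heuristics

print(content)
# How to create a chart.
#
# Version 2.1 adds dashboards.
```

In this sketch, the header and footer never reach the article because neither matches a selector; only the elements you explicitly target are kept.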
The ignore elements selectors let you remove content from the final output of a Resource Center article. For example, you may not want to include tooltips, images, or sidebars that, while useful on your website, aren't relevant in a Resource Center article.
As with the main content selectors, you can add multiple ignore selectors, and the website scraper checks each one in the order it appears. If no ignore selectors match on a URL, the website scraper includes all content.
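The effect of an ignore selector is to strip matching elements before the article is produced. The sketch below illustrates the idea, again using Python and BeautifulSoup with hypothetical selector strings; it is not Amplitude's implementation.

```python
# Conceptual sketch of ignore-element selectors; not Amplitude's implementation.
# The selector strings are hypothetical examples of content you might exclude.
from bs4 import BeautifulSoup

html = """
<article class="main-content">
  <p>How to create a chart.</p>
  <aside class="sidebar">In this section...</aside>
  <span class="tooltip">Hover for details</span>
</article>
"""

ignore_selectors = [".tooltip", "aside.sidebar"]  # checked in the order they appear

soup = BeautifulSoup(html, "html.parser")
for selector in ignore_selectors:
    for element in soup.select(selector):
        element.decompose()  # drop the element from the final output

print(soup.get_text(" ", strip=True))
# How to create a chart.
```

Here the tooltip and sidebar are removed from the page before the remaining text is saved, which is the behavior ignore selectors are meant to produce.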
There are some pages that the website scraper automatically ignores. These typically include landing pages or pages that only contain links and no other text. Usually, these types of pages don't contain information suitable for a Resource Center article and can be safely removed from the source repository.
However, you may occasionally want to re-include these pages and make them active in your source repository.
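Amplitude doesn't document the exact filter it applies, but one way to picture a "link-only page" check is a simple link-density heuristic like the hypothetical sketch below. The function name and the 80% threshold are illustrative assumptions, not Amplitude's documented behavior.

```python
# Hypothetical sketch of a "link-only page" check; the threshold is an
# illustrative assumption, not Amplitude's documented heuristic.
from bs4 import BeautifulSoup

def looks_like_link_only_page(html: str, max_link_ratio: float = 0.8) -> bool:
    soup = BeautifulSoup(html, "html.parser")
    all_text = soup.get_text(" ", strip=True)
    if not all_text:
        return True  # nothing but markup
    link_text = " ".join(a.get_text(" ", strip=True) for a in soup.find_all("a"))
    return len(link_text) / len(all_text) >= max_link_ratio

landing = '<ul><li><a href="/a">Guides</a></li><li><a href="/b">API</a></li></ul>'
article = "<article><p>Charts let you visualize user behavior over time.</p></article>"
print(looks_like_link_only_page(landing))  # True
print(looks_like_link_only_page(article))  # False
```

A page flagged by a check like this would be skipped by default, which is why re-including it has to be an explicit choice.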