This article explains what Advanced Feed Crawling is, how it works, and when you should use it to enrich your Curations Library.
What is Advanced Feed Crawling?
Advanced Feed Crawling is an optional feature in Letterhead that enhances the way metadata is collected from RSS feeds. In most cases, this feature is not needed and can be left off. Letterhead will still import content just fine using standard RSS metadata.
ℹ️ Note
Only enable Advanced Feed Crawling if your RSS feeds are missing key metadata (like images, excerpts, or author names) and you need to supplement that data from the original webpage.
When you add an RSS feed to your Curations Library, Letterhead can either:
- Use metadata directly from the RSS feed (title, description, image), or
- Crawl the linked webpage to extract additional or updated metadata using Open Graph tags and other structured data.
Enabling advanced crawling lets you pull richer information from the webpage itself, but it may increase crawl times and is not always necessary.
How Does It Work?
- When disabled (default): Letterhead uses only the data provided by the RSS feed.
- When enabled: Letterhead’s crawler bots visit the URL linked in the RSS feed’s <link> node and scan the page’s <head> section for Open Graph tags.
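To illustrate the enabled path, here is a minimal Python sketch of the two steps described above: reading each item's <link> from an RSS 2.0 feed, then collecting og:* meta tags from a page's <head>. It uses only the standard library and works on strings you have already fetched. The function and class names are hypothetical, and this is not Letterhead's actual crawler code.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def item_links(rss_xml: str) -> list[str]:
    """Return the <link> URL of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags found inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            a = dict(attrs)
            prop = a.get("property") or ""
            if prop.startswith("og:") and a.get("content") is not None:
                # Keep the first value if a property appears more than once.
                self.tags.setdefault(prop, a["content"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def open_graph(html: str) -> dict[str, str]:
    """Return the Open Graph properties declared in a page's <head>."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Running `open_graph` on the HTML of one of your own article pages is a quick way to see which Open Graph tags a crawler scanning the <head> would find.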
When to Use Advanced Feed Crawling
Only enable this setting if you encounter any of the following issues:
- Missing metadata: The feed lacks images, excerpts, or author information.
- Outdated or incomplete feed data: The webpage has better or more current content than the feed.
- Custom publisher names: You manage multiple brands or domains under one RSS feed and need a distinct site_name value (the Open Graph og:site_name tag) for each.
- Author info required: The feed doesn’t include author names, but the webpage does.
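Before enabling the setting, you can check whether a feed actually has these gaps. The Python sketch below flags RSS 2.0 items with no excerpt (`<description>`), no author (`<author>` or `dc:creator`), or no image (an `image/*` enclosure, or any `media:content` / `media:thumbnail` element). These element choices are assumptions based on common RSS conventions, not a description of how Letterhead reads your feed, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
MEDIA = "{http://search.yahoo.com/mrss/}"


def missing_fields(item: ET.Element) -> list:
    """List which of excerpt/author/image an RSS <item> lacks."""
    missing = []
    if not (item.findtext("description") or "").strip():
        missing.append("excerpt")
    author = item.findtext("author") or item.findtext(f"{DC}creator") or ""
    if not author.strip():
        missing.append("author")
    # Assumption: any image/* enclosure or media:content/media:thumbnail counts as an image.
    has_image = (
        any((enc.get("type") or "").startswith("image/") for enc in item.findall("enclosure"))
        or item.find(f"{MEDIA}content") is not None
        or item.find(f"{MEDIA}thumbnail") is not None
    )
    if not has_image:
        missing.append("image")
    return missing


def audit_feed(rss_xml: str) -> dict:
    """Map each incomplete item (by link, else title) to its missing fields."""
    root = ET.fromstring(rss_xml)
    report = {}
    for item in root.iter("item"):
        key = (item.findtext("link") or item.findtext("title") or "(untitled)").strip()
        gaps = missing_fields(item)
        if gaps:
            report[key] = gaps
    return report
```

If the report comes back empty for a feed, that feed is a good candidate for leaving Advanced Feed Crawling off.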
When to Keep Advanced Feed Crawling Disabled
In most cases, we recommend leaving this feature turned off. You should not enable it if:
- The RSS feed includes all the required metadata (e.g., images, excerpts, authors, canonical URLs).
- You’re managing a high volume of content and want to minimize API calls or improve ingestion speed.
- Your site has bot restrictions (e.g., requires authentication or blocks crawlers).
- You need to stay within a limited crawl budget, for example to avoid triggering 429 (Too Many Requests) or 503 (Service Unavailable) errors on your server.
How to Enable Advanced Feed Crawling
To turn this feature on for a specific feed:
- Go to Newsletter > Curations > RSS Sources.
- Either click on an existing feed to edit, or click Add Content to add a new feed.
- In the form that appears, check the box labeled “Advanced feed crawling.”
- Click Update Feed or Add Feed to save your changes.
Technical Requirements & Limitations
To ensure successful crawling, your website must allow Letterhead’s crawler access. This means:
- Whitelist the following domains to permit bot access: *.tryletterhead.com, letterhead.email, and letterhead.ai
- Allow the crawler user-agent: Mozilla/5.0 (compatible; LetterheadCurationBot/1.0; +http://tryletterhead.com/bots)
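If your site restricts bots through robots.txt, the Python sketch below checks whether your rules would block the LetterheadCurationBot token. This article does not say whether Letterhead's crawler reads robots.txt, so treat this as a check on your own configuration only; firewalls, CDNs, and bot-protection services can still block the crawler regardless. `urllib.robotparser` matches on the part of the user-agent before the first "/", so the sketch passes the product token rather than the full user-agent string. The helper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Product token taken from the documented user-agent string.
BOT_TOKEN = "LetterheadCurationBot"


def bot_allowed(robots_txt: str, url: str, token: str = BOT_TOKEN) -> bool:
    """Return True if the given robots.txt rules let `token` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


# Example robots.txt that blocks all bots except the Letterhead crawler.
EXAMPLE_ROBOTS = """User-agent: LetterheadCurationBot
Allow: /

User-agent: *
Disallow: /
"""
```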
⚠️ Important Considerations
- Bot traffic may increase significantly after enabling this feature, especially if you manage multiple feeds.
- Letterhead cannot crawl content behind paywalls or login requirements, including private websites and social media platforms.
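To see how much extra bot traffic a feed generates, you can count crawler requests in your server's access logs. The Python sketch below assumes the common "combined" log format and tallies LetterheadCurationBot requests per day, plus how many of them received 429 or 503 responses. Adjust the regular expression if your server logs in a different format; the function name is hypothetical.

```python
import re
from collections import Counter

# Assumes the "combined" access-log format, for example:
# 203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 ..."
LINE_RE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def bot_traffic(lines, token="LetterheadCurationBot"):
    """Return (requests per day, count of 429/503 responses) for the given bot token."""
    per_day = Counter()
    throttled = 0
    for line in lines:
        m = LINE_RE.search(line)
        if not m or token not in m.group("ua"):
            continue  # unparseable line or a different client
        per_day[m.group("day")] += 1
        if m.group("status") in ("429", "503"):
            throttled += 1
    return per_day, throttled
```

A rising daily count or a nonzero throttled count after enabling the feature is a sign to scale back to fewer feeds.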
Best Practices
- Test before scaling: Try enabling advanced crawling on a few feeds first to evaluate whether it improves metadata quality.
- Audit your feeds regularly: Use tools like the W3C Feed Validator to ensure your feeds are properly formatted and up to date.
Need Help?
If you have questions or need assistance, please contact our support team at support@tryletterhead.com or log a support ticket through the Help Center!