Managing the Crawler
Learn how to control and optimize Swiftype's crawler for maintaining up-to-date search indices across New Relic sites.
Automatic Crawling
Swiftype automatically crawls all configured sites every 12 hours using:
- Sitemap.xml files for structured discovery
- Link following for comprehensive coverage
- Smart scheduling to avoid overwhelming servers
Crawls can be delayed, and reindexing can take up to four days. Plan content updates accordingly and use the manual methods below for urgent changes.
Manual Crawling
When immediate updates are required, you can trigger a manual crawl (limited to once every 12 hours):
Steps to Initiate Manual Crawl
- Access the Swiftype Admin UI
- Navigate to the Content section
- Enter the page URL in the search filter
- If the page isn't indexed, click Add this URL
- Important: Include a trailing slash to avoid indexing errors
```
# Correct format with trailing slash
https://docs.newrelic.com/docs/apm/new-relic-apm/getting-started/

# Incorrect format - may cause indexing errors
https://docs.newrelic.com/docs/apm/new-relic-apm/getting-started
```
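Because the trailing slash is easy to forget, a small helper can normalize URLs before they are pasted into the Admin UI. This is a generic sketch (not part of Swiftype) that preserves any query string or fragment:

```python
from urllib.parse import urlsplit, urlunsplit

def ensure_trailing_slash(url: str) -> str:
    """Return the URL with a trailing slash on its path, leaving any
    query string or fragment untouched."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((scheme, netloc, path, query, fragment))
```

Running a list of candidate URLs through this helper before submitting them keeps the index consistent with the correct format shown above.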
Adding and Updating Pages
Add a New Page
Before adding a new page, ensure:
- Page has no noindex tag
- Content is publicly accessible
- URL follows proper formatting
Process:
- In Swiftype Admin UI, go to Content > Filter pages
- Enter the full URL (with trailing slash)
- If not indexed, click Add this URL
- Page should be indexed within minutes
- Refresh to verify indexing status
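Part of the pre-flight checklist above (no noindex tag) can be automated. The sketch below uses Python's standard-library HTML parser to detect a robots noindex meta tag in a page's HTML; fetching the page and checking its status code are left out so the example stays self-contained:

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Flags a <meta name="robots" content="...noindex..."> tag."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("name", "").lower() == "robots" and "noindex" in a.get("content", "").lower():
            self.noindex = True

def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots noindex directive."""
    parser = NoindexDetector()
    parser.feed(html)
    return parser.noindex
```

Run this against the rendered HTML of a page before adding its URL; if it returns True, the crawler will skip the page no matter how often it is submitted.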
Update an Existing Page
When URLs change, old URLs may remain in the index with outdated content. To reindex:
- Navigate to Content in Swiftype Admin UI
- Paste the old URL to locate the record
- Click Reindex on the record details page
Custom Rankings Warning
Reindexing a page whose URL has changed discards any custom result rankings tied to the old URL. These rankings must be manually reassigned to the new URL.
Best Practices
URL Management
- Use trailing slashes consistently
- Implement proper 301 redirects for changed URLs
- Update sitemap.xml files promptly
- Monitor crawler logs for errors
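Two of these practices (trailing slashes and prompt sitemap updates) can be checked together by auditing the sitemap itself. A minimal sketch using the standard-library XML parser, assuming a standard sitemaps.org-format sitemap.xml:

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_missing_trailing_slash(sitemap_xml: str) -> list[str]:
    """Return sitemap <loc> entries whose URL lacks a trailing slash."""
    root = ET.fromstring(sitemap_xml)
    offenders = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        if url and not url.endswith("/"):
            offenders.append(url)
    return offenders
```

Any URLs this flags should be fixed in the sitemap (and 301-redirected if the live URL changed) before the next automatic crawl picks them up.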
Content Preparation
- Ensure pages have proper meta descriptions
- Use semantic HTML structure
- Include relevant keywords naturally
- Avoid duplicate content across pages
Monitoring Crawler Health
Regular checks to perform:
- Weekly: Review crawler logs for errors
- Bi-weekly: Verify new content is indexed
- Monthly: Audit index for outdated pages
- Quarterly: Full index health assessment
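The weekly log review can be partially scripted. Swiftype does not publish a crawler log format, so the sketch below assumes a hypothetical line-based log where failed fetches contain the word ERROR and a `url=` field; adapt the regex to whatever logs you actually have:

```python
import re
from collections import Counter

# Assumed log line shape, e.g.:
#   "2024-01-08 ERROR fetch failed url=https://example.com/page"
ERROR_RE = re.compile(r"\bERROR\b.*?url=(\S+)", re.IGNORECASE)

def summarize_crawl_errors(log_lines):
    """Count crawl errors per URL from an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

URLs that fail repeatedly across weekly runs are the ones worth escalating during the monthly and quarterly audits.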
Common Issues and Solutions
| Issue | Cause | Solution |
|---|---|---|
| Page not indexing | Missing trailing slash | Add URL with trailing slash |
| Outdated content showing | Crawler delay | Trigger manual reindex |
| Duplicate pages indexed | Canonical issues | Fix canonical tags |
| Missing pages | Noindex tag present | Remove noindex tag |
Future Improvements
Consider implementing:
- Automated crawler monitoring alerts
- Bulk URL submission tools
- Crawler performance dashboards
- API-based index management scripts
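As a starting point for API-based index management, the sketch below constructs (but does not send) a request asking the crawler to re-fetch a single URL. The endpoint path, parameter names, and auth scheme here are assumptions, not confirmed Swiftype API details; verify them against the Site Search API documentation before wiring this into anything:

```python
import json
import urllib.request

# Hypothetical API base; the real path and auth mechanism may differ.
API_BASE = "https://api.swiftype.com/api/v1"

def build_recrawl_request(engine_id: str, domain_id: str, url: str, auth_token: str):
    """Construct (but do not send) a PUT request asking the crawler
    to re-fetch one URL. All endpoint details are assumptions."""
    endpoint = f"{API_BASE}/engines/{engine_id}/domains/{domain_id}/crawl_url.json"
    body = json.dumps({"auth_token": auth_token, "url": url}).encode()
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending the request (e.g. via `urllib.request.urlopen`) and handling rate limits would be the next step once the real endpoint is confirmed; a bulk-submission tool is then just a loop over this builder.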