Managing the Crawler
Learn how to control and optimize Swiftype's crawler for maintaining up-to-date search indices across New Relic sites.
Automatic Crawling
Swiftype automatically crawls all configured sites every 12 hours using:
- Sitemap.xml files for structured discovery
- Link following for comprehensive coverage
- Smart scheduling to avoid overwhelming servers
Crawls can be delayed, and reindexing can take up to four days. Plan content updates accordingly and use the manual methods below for urgent changes.
Manual Crawling
When immediate updates are required, you can trigger a manual crawl (limited to once every 12 hours):
Steps to Initiate Manual Crawl
- Access the Swiftype Admin UI
- Navigate to the Content section
- Enter the page URL in the search filter
- If the page isn't indexed, click Add this URL
- Important: Include a trailing slash to avoid indexing errors
```
# Correct format with trailing slash
https://docs.newrelic.com/docs/apm/new-relic-apm/getting-started/

# Incorrect format - may cause indexing errors
https://docs.newrelic.com/docs/apm/new-relic-apm/getting-started
```
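Because the trailing slash is easy to forget, a small helper can normalize URLs before they are pasted into the Admin UI. This is a generic sketch (not part of Swiftype) that preserves any query string or fragment:

```python
from urllib.parse import urlsplit, urlunsplit

def ensure_trailing_slash(url: str) -> str:
    """Return the URL with a trailing slash on its path, leaving any
    query string or fragment untouched."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((scheme, netloc, path, query, fragment))
```

Running a list of candidate URLs through this helper before submitting them keeps the index consistent with the correct format shown above.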
Adding and Updating Pages
Add a New Page
Before adding a new page, ensure:
- Page has no noindex tag
- Content is publicly accessible
- URL follows proper formatting
Process:
- In Swiftype Admin UI, go to Content > Filter pages
- Enter the full URL (with trailing slash)
- If not indexed, click Add this URL
- Page should be indexed within minutes
- Refresh to verify indexing status
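Part of the pre-flight checklist above (no noindex tag) can be automated. The sketch below uses Python's standard-library HTML parser to detect a robots noindex meta tag in a page's HTML; fetching the page and checking its status code are left out so the example stays self-contained:

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Flags a <meta name="robots" content="...noindex..."> tag."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("name", "").lower() == "robots" and "noindex" in a.get("content", "").lower():
            self.noindex = True

def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots noindex directive."""
    parser = NoindexDetector()
    parser.feed(html)
    return parser.noindex
```

Run this against the rendered HTML of a page before adding its URL; if it returns True, the crawler will skip the page no matter how often it is submitted.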
Update an Existing Page
When URLs change, old URLs may remain in the index with outdated content. To reindex:
- Navigate to Content in Swiftype Admin UI
- Paste the old URL to locate the record
- Click Reindex on the record details page
Custom Rankings Warning
Reindexing a page whose URL has changed discards any custom result rankings tied to the old URL. These rankings must be manually reassigned to the new URL.
Best Practices
URL Management
- Use trailing slashes consistently
- Implement proper 301 redirects for changed URLs
- Update sitemap.xml files promptly
- Monitor crawler logs for errors
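Two of these practices (trailing slashes and prompt sitemap updates) can be checked together by auditing the sitemap itself. A minimal sketch using the standard-library XML parser, assuming a standard sitemaps.org-format sitemap.xml:

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_missing_trailing_slash(sitemap_xml: str) -> list[str]:
    """Return sitemap <loc> entries whose URL lacks a trailing slash."""
    root = ET.fromstring(sitemap_xml)
    offenders = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        if url and not url.endswith("/"):
            offenders.append(url)
    return offenders
```

Any URLs this flags should be fixed in the sitemap (and 301-redirected if the live URL changed) before the next automatic crawl picks them up.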
Content Preparation
- Ensure pages have proper meta descriptions
- Use semantic HTML structure
- Include relevant keywords naturally
- Avoid duplicate content across pages
Monitoring Crawler Health
Regular checks to perform:
- Weekly: Review crawler logs for errors
- Bi-weekly: Verify new content is indexed
- Monthly: Audit index for outdated pages
- Quarterly: Full index health assessment
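The weekly log review can be partially scripted. Swiftype does not publish a crawler log format, so the sketch below assumes a hypothetical line-based log where failed fetches contain the word ERROR and a `url=` field; adapt the regex to whatever logs you actually have:

```python
import re
from collections import Counter

# Assumed log line shape, e.g.:
#   "2024-01-08 ERROR fetch failed url=https://example.com/page"
ERROR_RE = re.compile(r"\bERROR\b.*?url=(\S+)", re.IGNORECASE)

def summarize_crawl_errors(log_lines):
    """Count crawl errors per URL from an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

URLs that fail repeatedly across weekly runs are the ones worth escalating during the monthly and quarterly audits.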
Common Issues and Solutions
| Issue | Cause | Solution |
|---|---|---|
| Page not indexing | Missing trailing slash | Add URL with trailing slash |
| Outdated content showing | Crawler delay | Trigger manual reindex |
| Duplicate pages indexed | Canonical issues | Fix canonical tags |
| Missing pages | Noindex tag present | Remove noindex tag |
Future Improvements
Consider implementing:
- Automated crawler monitoring alerts
- Bulk URL submission tools
- Crawler performance dashboards
- API-based index management scripts
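As a starting point for API-based index management, the sketch below constructs (but does not send) a request asking the crawler to re-fetch a single URL. The endpoint path, parameter names, and auth scheme here are assumptions, not confirmed Swiftype API details; verify them against the Site Search API documentation before wiring this into anything:

```python
import json
import urllib.request

# Hypothetical API base; the real path and auth mechanism may differ.
API_BASE = "https://api.swiftype.com/api/v1"

def build_recrawl_request(engine_id: str, domain_id: str, url: str, auth_token: str):
    """Construct (but do not send) a PUT request asking the crawler
    to re-fetch one URL. All endpoint details are assumptions."""
    endpoint = f"{API_BASE}/engines/{engine_id}/domains/{domain_id}/crawl_url.json"
    body = json.dumps({"auth_token": auth_token, "url": url}).encode()
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending the request (e.g. via `urllib.request.urlopen`) and handling rate limits would be the next step once the real endpoint is confirmed; a bulk-submission tool is then just a loop over this builder.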