Managing the Crawler

Learn how to control and optimize Swiftype's crawler for maintaining up-to-date search indices across New Relic sites.

Automatic Crawling

Swiftype automatically crawls all configured sites every 12 hours using:

  • Sitemap.xml files for structured discovery
  • Link following for comprehensive coverage
  • Smart scheduling to avoid overwhelming servers
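The sitemap-based discovery above expects a standard sitemaps.org file. A minimal illustrative fragment; the URL and date are examples, not real entries:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.newrelic.com/docs/apm/new-relic-apm/getting-started/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```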

Crawling can be delayed; reindexing can take up to four days. Plan content updates accordingly, and use a manual crawl for urgent changes.

Manual Crawling

When immediate updates are required, you can trigger a manual crawl (limited to once every 12 hours):

Steps to Initiate Manual Crawl

  1. Access the Swiftype Admin UI
  2. Navigate to the Content section
  3. Enter the page URL in the search filter
  4. If the page isn't indexed, click Add this URL
  5. Important: Include a trailing slash to avoid indexing errors
```bash
# Correct format with trailing slash
https://docs.newrelic.com/docs/apm/new-relic-apm/getting-started/

# Incorrect format - may cause indexing errors
https://docs.newrelic.com/docs/apm/new-relic-apm/getting-started
```
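The trailing-slash rule is easy to enforce in tooling before submitting URLs. A minimal sketch in Python; the helper name is ours, not part of Swiftype:

```python
from urllib.parse import urlsplit, urlunsplit

def ensure_trailing_slash(url: str) -> str:
    """Return the URL with a trailing slash on its path.

    Only the path is touched; query strings and fragments are preserved.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((scheme, netloc, path, query, fragment))
```

Running every URL through a normalizer like this before pasting it into the Admin UI avoids the indexing errors described above.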

Adding and Updating Pages

Add a New Page

Before adding a new page, ensure:

  • Page has no noindex tag
  • Content is publicly accessible
  • URL follows proper formatting
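The noindex check in this list can be automated once you have the page's HTML in hand. A sketch; the function name is ours:

```python
import re

def has_noindex(html: str) -> bool:
    """Detect a robots noindex directive in a page's <meta> tags."""
    # Matches <meta name="robots" content="...noindex..."> in either attribute order.
    pattern = re.compile(
        r'<meta\s+[^>]*name=["\']robots["\'][^>]*content=["\'][^"\']*noindex'
        r'|<meta\s+[^>]*content=["\'][^"\']*noindex[^"\']*["\'][^>]*name=["\']robots["\']',
        re.IGNORECASE,
    )
    return bool(pattern.search(html))
```

A page where this returns True will be skipped by the crawler until the tag is removed.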

Process:

  1. In Swiftype Admin UI, go to Content > Filter pages
  2. Enter the full URL (with trailing slash)
  3. If not indexed, click Add this URL
  4. Page should be indexed within minutes
  5. Refresh to verify indexing status

Update an Existing Page

When URLs change, old URLs may remain in the index with outdated content. To reindex:

  1. Navigate to Content in Swiftype Admin UI
  2. Paste the old URL to locate the record
  3. Click Reindex on the record details page

Custom Rankings Warning

Reindexing a page with a changed URL will lose custom result rankings tied to the old URL. These rankings must be manually reassigned to the new URL.

Best Practices

URL Management

  • Always use trailing slashes consistently
  • Implement proper 301 redirects for changed URLs
  • Update sitemap.xml files promptly
  • Monitor crawler logs for errors
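For changed URLs, a redirect map keeps both readers and the crawler pointed at the new location. A minimal sketch in Python; the map entries are hypothetical examples, not real moves:

```python
# Illustrative old-URL -> new-URL map; entries are hypothetical.
REDIRECTS = {
    "https://docs.newrelic.com/docs/old-path/": "https://docs.newrelic.com/docs/new-path/",
}

def resolve(url: str) -> tuple:
    """Return (status, target): 301 plus the new URL for moved pages,
    otherwise 200 plus the URL itself."""
    if url in REDIRECTS:
        return 301, REDIRECTS[url]
    return 200, url
```

However the redirects are ultimately served, keeping them in one reviewable map makes it easy to audit that every changed URL has a 301.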

Content Preparation

  • Ensure pages have proper meta descriptions
  • Use semantic HTML structure
  • Include relevant keywords naturally
  • Avoid duplicate content across pages

Monitoring Crawler Health

Regular checks to perform:

  • Weekly: Review crawler logs for errors
  • Bi-weekly: Verify new content is indexed
  • Monthly: Audit index for outdated pages
  • Quarterly: Full index health assessment
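The bi-weekly and monthly checks above reduce to a set difference between the sitemap and the index. A sketch, assuming you can export both URL lists; the function name is ours:

```python
def audit_index(sitemap_urls: set, indexed_urls: set) -> dict:
    """Compare sitemap URLs against indexed URLs.

    Returns URLs missing from the index (candidates for a manual crawl)
    and indexed URLs absent from the sitemap (candidates for removal).
    """
    return {
        "missing_from_index": sitemap_urls - indexed_urls,
        "stale_in_index": indexed_urls - sitemap_urls,
    }
```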

Common Issues and Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Page not indexing | Missing trailing slash | Add the URL with a trailing slash |
| Outdated content showing | Crawler delay | Trigger a manual reindex |
| Duplicate pages indexed | Canonical issues | Fix canonical tags |
| Missing pages | noindex tag present | Remove the noindex tag |

Future Improvements (TBD)

Consider implementing:

  • Automated crawler monitoring alerts
  • Bulk URL submission tools
  • Crawler performance dashboards
  • API-based index management scripts