The Basics
FeedBlitz fetches RSS feeds when a FeedBlitz client does one of the following:
- Sets up an RSS to Email newsletter
- Sets up FeedBlitz as an RSS feed proxy (similar to Google FeedBurner, for example)
When FeedBlitz's servers fetch an RSS feed, they will do so as follows:
- The user agent will typically be "FeedBlitz/1.0" (without the quotes)
- Where a client has FeedBurner compatibility enabled for our RSS service, the user agent will start with "FeedBurner/1.0 compatible: FeedBlitz"
- In production use, the reverse DNS (PTR record) for the originating IP will always resolve to a host under feedblitz.com (e.g. mail99.feedblitz.com)
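If you want to confirm that a request really comes from FeedBlitz before exempting it from caching or rate limits, the PTR rule above can be checked with a forward-confirmed reverse DNS lookup. The Python sketch below is one way to do that under stated assumptions: the function names are hypothetical, the user agent check is only a hint (FeedBlitz may occasionally use other agents, as noted below), and the forward lookup uses `socket.gethostbyname_ex`, which only handles IPv4 addresses.

```python
import socket

FEEDBLITZ_DOMAIN = "feedblitz.com"
# User agent prefixes taken from the list above; treat a match as a hint only.
FEEDBLITZ_UA_PREFIXES = ("FeedBlitz/1.0", "FeedBurner/1.0 compatible: FeedBlitz")


def is_feedblitz_user_agent(user_agent: str) -> bool:
    """True if the User-Agent starts with one of the documented FeedBlitz prefixes."""
    return user_agent.startswith(FEEDBLITZ_UA_PREFIXES)


def is_feedblitz_hostname(hostname: str) -> bool:
    """True for feedblitz.com itself or any host under it (e.g. mail99.feedblitz.com)."""
    host = hostname.rstrip(".").lower()
    return host == FEEDBLITZ_DOMAIN or host.endswith("." + FEEDBLITZ_DOMAIN)


def verify_feedblitz_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS: the PTR must be under feedblitz.com and resolve back to ip."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)  # PTR lookup
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not is_feedblitz_hostname(hostname):
        return False
    try:
        # Forward lookup (IPv4 only) guards against a spoofed PTR record.
        _name, _aliases, forward_addrs = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in forward_addrs
```

In a request handler you would call `verify_feedblitz_ip(remote_addr)` and remember the answer per IP, since doing two DNS lookups on every request is slow.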
FeedBlitz may occasionally fetch URLs from feedblitz.com IPs using other user agents (see below). However, this will not happen for regular RSS feed fetching on a client's behalf in production; it is very much the exception, not the norm.
The FeedBlitz RSS feed fetcher is not a web site crawler. We only fetch the specific feed URL(s) that a client has asked us to fetch on their behalf to power an RSS feed or an RSS feed-powered email. As such, the FeedBlitz agent does not check robots.txt, because robots.txt does not apply to this kind of fetching.
FeedBlitz's servers will only use GET or HEAD on underlying source resources. We do not use any other HTTP methods.
RSS feed fetching, caches, CDNs, and server load management
In general, FeedBlitz clients want timely updates to the versions of their RSS feeds we cache on their behalf. Moreover, when using FeedBlitz's RSS to email service, clients want consistent content generated for their email subscribers. The behavior of the FeedBlitz RSS agent therefore varies, depending on whether it is being used for our RSS feed service (a Feedburner alternative), or to power a client's RSS to email newsletter.
RSS feed proxy service
- If a web site cache or CDN is detected, FeedBlitz will employ cache busting to ensure that we always have the chance to get the latest content to avoid update propagation delays.
- On the source server and any CDN, caching should be disabled for requests from the FeedBlitz agent originating from a FeedBlitz IP. This minimizes frustration over delayed, missed, or inconsistent RSS feed updates.
- Source web sites should set an ETag HTTP header in their responses to FeedBlitz. In typical usage, when just looking for incremental changes, FeedBlitz will send an If-None-Match header with every request to minimize back end load, expecting that this will typically produce a 304 (Not Modified) response.
- New, newly updated, or rapidly changing RSS feeds may be checked every few minutes for new content.
- If your site has PubSubHubbub (PuSH) capabilities, FeedBlitz will attempt to subscribe to the hub, and reduce its fallback polling frequency to ~30 minutes.
- If you set the <ttl> element, or use <sy:updatePeriod> and <sy:updateFrequency> in your source feed to indicate desired cache lifetimes, FeedBlitz will check the source slightly more frequently than the specified interval, and at least daily, to respect the feed owner's TTL indication while simultaneously trying to pick up changes relatively promptly.
- If no changes are detected in the source RSS feed over several days, the FeedBlitz agent will poll it less frequently.
- A forced "resync" from the FeedBlitz console by the client, or the detection of new content, will make FeedBlitz's RSS feed service treat the source feed as both new and active.
- FeedBlitz will never fetch more than 5MB of RSS content, and typically stops at 1MB (most RSS feeds are much smaller than this).
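The ETag and If-None-Match behavior described above is standard HTTP conditional GET. As a minimal sketch, assuming your feed body is available as bytes and that you wire the returned status, headers, and body into your own web framework, a source server could answer FeedBlitz like this. The function names, the SHA-256-based ETag, and the `Cache-Control: no-cache` choice are illustrative assumptions, not FeedBlitz requirements.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the feed bytes, so identical content yields an identical tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def _opaque(tag: str) -> str:
    """Drop a weak-validator prefix so W/"x" and "x" compare equal (If-None-Match uses weak comparison)."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: Optional[str], etag: str) -> bool:
    """True if the If-None-Match header value lists etag, or is '*'."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    candidates = [_opaque(t) for t in if_none_match.split(",")]
    return _opaque(etag) in candidates


def feed_response(
    method: str, body: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a request to the feed URL."""
    if method not in ("GET", "HEAD"):  # FeedBlitz only ever sends GET or HEAD
        return 405, {"Allow": "GET, HEAD"}, b""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Ask shared caches (e.g. a CDN) to revalidate with the origin instead of
        # serving a stored copy; many CDNs also need a bypass rule configured.
        "Cache-Control": "no-cache",
        "Content-Type": "application/rss+xml; charset=utf-8",
    }
    if etag_matches(if_none_match, etag):
        return 304, headers, b""  # unchanged: no body, minimal back end work
    headers["Content-Length"] = str(len(body))
    return 200, headers, (b"" if method == "HEAD" else body)
```

In practice you would compute the ETag once, when the feed changes, and store it alongside the generated feed, rather than hashing the body on every request.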
RSS to email newsletters
- The FeedBlitz agent will check the source newsletter feed on the newsletter's schedule.
- For daily feeds, this will be daily. For Express and ASAP schedules, this will be at least once every 30 minutes.
- Unlike the RSS feed proxy service, when used for email, FeedBlitz's agent does not make conditional requests using ETags (If-None-Match). Instead, it requests the whole source feed.
- Unlike the RSS feed proxy service, the newsletter's schedule determines when the feed will be checked. <ttl> and <sy:...> RSS feed elements are therefore ignored.
- If your site has PubSubHubbub (PuSH) capabilities, FeedBlitz will attempt to subscribe to the hub, and reduce its fallback polling frequency to ~30 minutes. You should use the ASAP schedule for email, and link your newsletter to a FeedBlitz version of your RSS feed to minimize load on your source servers.
- Like the RSS proxy service, FeedBlitz will fetch at most 5MB of RSS content, typically stopping at 1MB (most RSS feeds are much smaller than this).
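If you adopt PuSH as suggested above, your feed advertises its hub (typically with an <atom:link rel="hub" href="..."/> element in the channel), and your publishing system notifies the hub whenever the feed changes. A minimal Python sketch of that notification, assuming a hub that accepts the PubSubHubbub 0.3 publish ping (a form-encoded POST with hub.mode=publish and hub.url set to your feed URL); the function names are hypothetical, and any hub or feed URLs you pass in are your own:

```python
import urllib.parse
import urllib.request


def build_publish_request(hub_url: str, feed_url: str) -> urllib.request.Request:
    """Build a PubSubHubbub 0.3 'new content' ping for one updated feed (topic) URL."""
    body = urllib.parse.urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return urllib.request.Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def notify_hub(hub_url: str, feed_url: str, timeout: float = 10.0) -> int:
    """Send the ping and return the HTTP status (typically 204 when accepted).

    urllib raises urllib.error.HTTPError if the hub answers with a 4xx/5xx status.
    """
    request = build_publish_request(hub_url, feed_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, after publishing a post you might call `notify_hub("https://hub.example.com/", "https://www.example.com/feed/")`, where both URLs are hypothetical placeholders for your hub and source feed.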
Source Server Load Management
It is highly unlikely that FeedBlitz's RSS feed proxy service will overburden a source site.
However, FeedBlitz's RSS to email service uses a highly distributed architecture. FeedBlitz caches the results of an RSS to email check wherever possible to minimize source site load, but at the start of an RSS to email schedule your source URL may still receive many requests for the same content from different FeedBlitz servers. This is expected behavior.
In the unlikely event that this causes problems for your source server, the following steps may help mitigate the effects:
- Create a FeedBlitz version of your source RSS feed, and use the FeedBlitz version of your feed to power your mailings. This will dramatically reduce the load on your back end servers, as the bulk of the traffic necessary for fetching the latest RSS feed content will remain within FeedBlitz's data centers.
- Ensure your source server sets an ETag header, responds with a 304 (Not Modified) when the feed is unchanged and a matching If-None-Match header is present, and does not otherwise cache FeedBlitz requests.
- Optionally set the <ttl> and <sy:updatePeriod> / <sy:updateFrequency> channel elements to further reduce polling.
- Use PuSH to reduce polling to an absolute minimum.
- Switch your newsletter's schedule from Express / ASAP to a slower intraday check.
- Remember, unless your feed is for a podcast, RSS feeds are not archives. Reduce the size of your RSS feed, both in terms of the number of articles in the feed, and the RSS feed's overall size (in bytes), to ensure that you have enough in the feed to cover all posts you want to send in the newsletter, but not too much more.
- Minimize the number of plugins that alter the content of your RSS feed when it is fetched, so that an RSS fetch, even for all content, is ultimately a request for a simple resource that your source server can generate and maintain easily.
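To see where your feed stands on the size and polling suggestions above, a quick audit can report its size in bytes, its item count, and any <ttl> or <sy:...> hints it declares. A minimal sketch, assuming an RSS 2.0 feed; the function name is hypothetical, MB is read conservatively as 1,000,000 bytes, and "monthly" and "yearly" syndication periods are approximated as 30 and 365 days:

```python
import xml.etree.ElementTree as ET
from typing import Optional

SY_NS = "http://purl.org/rss/1.0/modules/syndication/"
# Minutes per sy:updatePeriod value; monthly/yearly are approximations.
PERIOD_MINUTES = {"hourly": 60, "daily": 1440, "weekly": 10080, "monthly": 43200, "yearly": 525600}
TYPICAL_LIMIT_BYTES = 1_000_000  # FeedBlitz typically stops around 1MB
MAX_LIMIT_BYTES = 5_000_000      # and never fetches more than 5MB


def _positive_int(text: Optional[str]) -> Optional[int]:
    """Parse a positive integer, or return None if the text is missing or invalid."""
    text = (text or "").strip()
    return int(text) if text.isdecimal() and int(text) > 0 else None


def audit_feed(feed_bytes: bytes) -> dict:
    """Summarize an RSS 2.0 feed's size, item count, and declared update interval hints."""
    channel = ET.fromstring(feed_bytes).find("channel")
    if channel is None:
        raise ValueError("no <channel> element: not an RSS 2.0 feed")

    # <ttl> is expressed in minutes.
    ttl_minutes = _positive_int(channel.findtext("ttl"))

    # The syndication module assumes daily / 1 when only one of the two elements is given.
    period = channel.findtext(f"{{{SY_NS}}}updatePeriod")
    frequency = channel.findtext(f"{{{SY_NS}}}updateFrequency")
    sy_minutes = None
    if period is not None or frequency is not None:
        minutes = PERIOD_MINUTES.get((period or "daily").strip().lower())
        times = _positive_int(frequency) if frequency is not None else 1
        if minutes is not None and times is not None:
            sy_minutes = minutes / times

    size = len(feed_bytes)
    return {
        "size_bytes": size,
        "item_count": len(channel.findall("item")),
        "ttl_minutes": ttl_minutes,
        "sy_interval_minutes": sy_minutes,
        "over_typical_limit": size > TYPICAL_LIMIT_BYTES,
        "over_max_limit": size > MAX_LIMIT_BYTES,
    }
```

You could pass it the bytes returned by `urllib.request.urlopen(feed_url).read()` for your own feed URL. Remember that the <ttl> and <sy:...> hints only affect the RSS feed proxy service; newsletter checks follow the newsletter's schedule.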
Non-RSS URL Fetching
FeedBlitz may check web sites for potential abuse and user-hostile characteristics when generating emails to send. It may use the FeedBlitz/1.0 or FeedBurner-compatible user agents for this task, or it may mimic a phone's user agent to look for hostile sites and / or redirects. This behavior is infrequent, does not happen in bulk, and is highly specific to the FeedBlitz client concerned.
In general, FeedBlitz strongly urges CDN providers and cache vendors not to block or cache any HTTP requests from an IP whose hostname (PTR) is on feedblitz.com (e.g. mail99.feedblitz.com).
Technical Support
If you would like to discuss any of this with us, or, after taking our recommended load mitigation steps above you still feel that FeedBlitz is overburdening your site, please do contact the technical support team. You should have web server logs for any such problematic requests ready for FeedBlitz to analyze in this case.