Scrape Stage
Shadowfacts edited this page 2019-11-02 03:19:13 +00:00

The Scrape pipeline stage replaces the content that an RSS item carries in the feed with content scraped from the item's web page.

extractor

A string: either "builtin" or the module name of a specific extractor (see Extractors below).

convert_to_data_uris

A boolean that controls whether images in posts should be fetched from the web, converted to data URIs and injected into RSS items.

Enabling this option will significantly increase the database size, because the image data is stored directly in the DB.

Note: This option may be disabled by server administrators and is restricted to certain MIME types (PNG, JPG, TIFF, HEIF, and HEIC).
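
As a rough illustration of what converting an image to a data URI involves, here is a minimal Python sketch. It is not Frenzy's implementation: the function name to_data_uri and the ALLOWED_MIME_TYPES set are hypothetical, and the MIME strings are assumed to be the standard ones for the formats listed in the note above.

```python
import base64
from typing import Optional

# Assumed MIME types for the formats named above (PNG, JPG, TIFF, HEIF, HEIC).
ALLOWED_MIME_TYPES = {
    "image/png",
    "image/jpeg",
    "image/tiff",
    "image/heif",
    "image/heic",
}


def to_data_uri(mime_type: str, data: bytes) -> Optional[str]:
    """Return a base64 data URI for the image, or None if the type is not allowed.

    Hypothetical helper for illustration only; not part of Frenzy.
    """
    # Drop any parameters (e.g. "; charset=binary") and normalize case.
    mime_type = mime_type.split(";")[0].strip().lower()
    if mime_type not in ALLOWED_MIME_TYPES:
        return None
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Base64 encodes every 3 bytes as 4 characters, so an embedded image takes up roughly a third more space than the original file. This is part of why the option increases the database size.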

Extractors

Extractors define how the contents of a web page are isolated from the rest of the page. The builtin extractor uses a general-purpose algorithm for isolating and extracting the contents of a page, but it may be unreliable for some websites. For this reason, site-specific extractors are also included for the following websites:

  • beckyhansmeyer.com: Frenzy.Pipeline.Extractor.BeckyHansmeyer
  • daringfireball.net: Frenzy.Pipeline.Extractor.DaringFireball
  • ericasadun.com: Frenzy.Pipeline.Extractor.EricaSadun
  • finertech.com: Frenzy.Pipeline.Extractor.FinerTech
  • 512pixels.net: Frenzy.Pipeline.Extractor.FiveTwelvePixels
  • macstories.net: Frenzy.Pipeline.Extractor.MacStories
  • om.co: Frenzy.Pipeline.Extractor.OmMalik
  • whatever.scalzi.com: Frenzy.Pipeline.Extractor.WhateverScalzi
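
To show how the two options fit together, the following Python sketch picks an extractor value for a given item URL. It falls back to "builtin" when no site-specific extractor is listed above. The helper scrape_options and the SITE_EXTRACTORS table are hypothetical and exist only for illustration. The sketch assumes the options form a simple key/value map using the option names documented on this page; how Frenzy actually stores them is not shown here.

```python
from urllib.parse import urlparse

# Site-specific extractors, copied from the list above.
SITE_EXTRACTORS = {
    "beckyhansmeyer.com": "Frenzy.Pipeline.Extractor.BeckyHansmeyer",
    "daringfireball.net": "Frenzy.Pipeline.Extractor.DaringFireball",
    "ericasadun.com": "Frenzy.Pipeline.Extractor.EricaSadun",
    "finertech.com": "Frenzy.Pipeline.Extractor.FinerTech",
    "512pixels.net": "Frenzy.Pipeline.Extractor.FiveTwelvePixels",
    "macstories.net": "Frenzy.Pipeline.Extractor.MacStories",
    "om.co": "Frenzy.Pipeline.Extractor.OmMalik",
    "whatever.scalzi.com": "Frenzy.Pipeline.Extractor.WhateverScalzi",
}


def scrape_options(url: str, convert_to_data_uris: bool = False) -> dict:
    """Build a hypothetical options map for the Scrape stage.

    Uses the site-specific extractor when the URL's host is listed,
    and the general-purpose "builtin" extractor otherwise.
    """
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.example.com" the same as "example.com" (an assumption).
    if host.startswith("www."):
        host = host[4:]
    return {
        "extractor": SITE_EXTRACTORS.get(host, "builtin"),
        "convert_to_data_uris": convert_to_data_uris,
    }
```

For example, scrape_options("https://www.macstories.net/news/") selects Frenzy.Pipeline.Extractor.MacStories, while a URL on an unlisted site selects "builtin".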