The Scrape pipeline stage replaces the content an RSS item carries in the feed itself with content scraped from the item's web page.
A string: either "builtin" or the module name of a specific extractor (see below).
convert_to_data_uris
A boolean that controls whether images in posts are fetched from the web, converted to data URIs, and embedded in the RSS items.
Enabling this option significantly increases the size of the database, because the images are stored directly in it.
Note: server administrators may disable this option, and only images of certain MIME types (PNG, JPEG, TIFF, HEIF, and HEIC) are converted.
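Frenzy's own implementation is not shown here. As a rough illustration of the conversion step only, the following Python sketch assumes the image has already been downloaded together with its Content-Type header. The function name to_data_uri and the exact MIME strings in the allow-list are hypothetical, chosen to match the formats listed above.

```python
import base64
from typing import Optional

# Hypothetical allow-list: assumed MIME strings for the formats named in the
# documentation (PNG, JPEG, TIFF, HEIF, HEIC). The real list may differ.
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/tiff",
                      "image/heif", "image/heic"}


def to_data_uri(image_bytes: bytes, content_type: str) -> Optional[str]:
    """Encode already-fetched image bytes as a base64 data URI.

    Returns None when the type is not allowed, so a caller could keep
    the original remote image URL instead.
    """
    # Drop parameters such as "; charset=..." and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in ALLOWED_MIME_TYPES:
        return None
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Base64 emits 4 characters for every 3 input bytes, so each embedded image is stored as text roughly a third larger than the original file. This is why the option grows the database noticeably.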
Extractors define how the content of a web page is isolated from the rest of the page. The builtin extractor uses a general-purpose algorithm to isolate and extract the content of a web page, but it may be unreliable for some websites. For this reason, there are also builtin extractors for a number of specific websites:
- beckyhansmeyer.com: Frenzy.Pipeline.Extractor.BeckyHansmeyer
- daringfireball.net: Frenzy.Pipeline.Extractor.DaringFireball
- ericasadun.com: Frenzy.Pipeline.Extractor.EricaSadun
- finertech.com: Frenzy.Pipeline.Extractor.FinerTech
- 512pixels.net: Frenzy.Pipeline.Extractor.FiveTwelvePixels
- macstories.net: Frenzy.Pipeline.Extractor.MacStories
- om.co: Frenzy.Pipeline.Extractor.OmMalik
- whatever.scalzi.com: Frenzy.Pipeline.Extractor.WhateverScalzi
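The algorithm used by the builtin extractor is not described here. To give a sense of what a general-purpose approach looks like, and why it can misfire on some layouts, here is a deliberately simple Python sketch of one common heuristic. It is not Frenzy's algorithm: it groups the text of each <p> element under the element that directly contains it and keeps the container with the most paragraph text. The class and function names are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ParagraphDensityParser(HTMLParser):
    """Collects <p> text grouped by the element that directly contains it."""

    def __init__(self):
        super().__init__()
        self.stack = []                       # open elements as (tag, node_id)
        self.next_id = 0
        self.paragraphs = defaultdict(list)   # container node_id -> paragraph texts
        self.current = None                   # (container_id, chunks) while inside <p>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if tag == "p":
            if any(t == "p" for t, _ in self.stack):
                self.handle_endtag("p")       # a new <p> implicitly closes an open one
            container = self.stack[-1][1] if self.stack else -1
            self.current = (container, [])
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        if tag not in (t for t, _ in self.stack):
            return                            # ignore stray closing tags
        while self.stack:                     # pop up to and including the match
            popped, _ = self.stack.pop()
            if popped == "p":
                self.finish_paragraph()
            if popped == tag:
                break

    def handle_data(self, data):
        if self.current is not None:
            self.current[1].append(data)

    def finish_paragraph(self):
        if self.current is None:
            return
        container, chunks = self.current
        text = " ".join("".join(chunks).split())  # collapse runs of whitespace
        if text:
            self.paragraphs[container].append(text)
        self.current = None


def extract_main_text(html: str) -> str:
    """Return the paragraphs of the container holding the most paragraph text."""
    parser = ParagraphDensityParser()
    parser.feed(html)
    parser.close()
    parser.finish_paragraph()                 # flush a <p> left open at end of input
    if not parser.paragraphs:
        return ""
    best = max(parser.paragraphs.values(), key=lambda ps: sum(map(len, ps)))
    return "\n\n".join(best)
```

Because the sketch scores only the direct parent of each paragraph, an article that splits its body across several wrapper elements, or a page with a long comment section, can lead it to pick the wrong block. That kind of failure is what site-specific extractors like the ones listed above work around.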