# Scrape Stage
The Scrape pipeline stage replaces an RSS item's content, as provided by the feed itself, with content scraped from the item's webpage.

### `extractor`
A string: either `builtin`, to use the general-purpose extractor, or the module name of a site-specific extractor (see below).
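The `extractor` value is either the literal `builtin` or a module name written in the dotted, capitalized form used in the extractor list on this page. As a minimal sketch, a check on the option value might look like this in Python. `validate_extractor` and `MODULE_NAME` are hypothetical names for illustration, not part of the stage's code, and the real stage may validate the option differently.

```python
import re

# Assumption: a module name is a dot-separated series of segments that each
# start with an uppercase letter, like "Frenzy.Pipeline.Extractor.MacStories".
MODULE_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*(?:\.[A-Z][A-Za-z0-9_]*)*")


def validate_extractor(value: object) -> str:
    """Return `value` unchanged if it looks like an acceptable `extractor` option.

    Hypothetical check for illustration; raises ValueError for anything else.
    """
    if not isinstance(value, str):
        raise ValueError(f"extractor must be a string, got {type(value).__name__}")
    if value == "builtin" or MODULE_NAME.fullmatch(value):
        return value
    raise ValueError(f"extractor must be 'builtin' or a module name, got {value!r}")
```

This only checks the shape of the name; it does not confirm that an extractor module with that name actually exists.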
### `convert_to_data_uris`
A boolean that controls whether images in the scraped content are fetched from the web, converted to data URIs, and embedded directly in the RSS items.

Enabling this option will significantly increase the database size, because the images are stored directly in the database.

**Note:** Server administrators may disable this option, and it is restricted to images of certain MIME types (PNG, JPEG, TIFF, HEIF, and HEIC).
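The conversion step can be sketched in Python, assuming each image has already been downloaded along with its `Content-Type`. This is a minimal illustration, not the stage's implementation: `to_data_uri` and `ALLOWED_MIME_TYPES` are hypothetical names, the exact MIME strings for the allowed formats are an assumption, and fetching the image and rewriting the `<img>` tags are left out.

```python
import base64
from typing import Optional

# Assumed MIME strings for the allowed formats (PNG, JPEG, TIFF, HEIF, HEIC).
ALLOWED_MIME_TYPES = frozenset(
    {"image/png", "image/jpeg", "image/tiff", "image/heif", "image/heic"}
)


def to_data_uri(body: bytes, content_type: str) -> Optional[str]:
    """Encode downloaded image bytes as a base64 data URI.

    Returns None when the type is not in the allow-list, in which case the
    image would keep its original URL. Hypothetical helper for illustration.
    """
    # Strip parameters such as "; charset=binary" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in ALLOWED_MIME_TYPES:
        return None
    encoded = base64.b64encode(body).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

For example, `to_data_uri(b"\x89PNG", "image/png")` returns `"data:image/png;base64,iVBORw=="`. Base64 represents every 3 bytes as 4 characters, so an embedded image is stored at roughly four-thirds of its original size.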
## Extractors
Extractors determine how the main content of a web page is isolated from the rest of the page. The `builtin` extractor uses a general-purpose algorithm to isolate and extract the content, but it can be unreliable on some websites. For that reason, several site-specific extractors are also included:

- beckyhansmeyer.com: `Frenzy.Pipeline.Extractor.BeckyHansmeyer`
- daringfireball.net: `Frenzy.Pipeline.Extractor.DaringFireball`
- ericasadun.com: `Frenzy.Pipeline.Extractor.EricaSadun`
- finertech.com: `Frenzy.Pipeline.Extractor.FinerTech`
- 512pixels.net: `Frenzy.Pipeline.Extractor.FiveTwelvePixels`
- macstories.net: `Frenzy.Pipeline.Extractor.MacStories`
- om.co: `Frenzy.Pipeline.Extractor.OmMalik`
- whatever.scalzi.com: `Frenzy.Pipeline.Extractor.WhateverScalzi`
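Because the `extractor` option takes a module name rather than a website, the list above works as a lookup table. The following Python sketch suggests an `extractor` value for a given item URL. `SITE_EXTRACTORS` and `extractor_for` are hypothetical helpers for illustration; the fallback to `builtin` for unlisted sites and the stripping of a leading `www.` are assumptions, not documented behavior of the stage, which uses whatever value is configured.

```python
from urllib.parse import urlparse

# Hostnames and module names copied from the extractor list.
SITE_EXTRACTORS = {
    "beckyhansmeyer.com": "Frenzy.Pipeline.Extractor.BeckyHansmeyer",
    "daringfireball.net": "Frenzy.Pipeline.Extractor.DaringFireball",
    "ericasadun.com": "Frenzy.Pipeline.Extractor.EricaSadun",
    "finertech.com": "Frenzy.Pipeline.Extractor.FinerTech",
    "512pixels.net": "Frenzy.Pipeline.Extractor.FiveTwelvePixels",
    "macstories.net": "Frenzy.Pipeline.Extractor.MacStories",
    "om.co": "Frenzy.Pipeline.Extractor.OmMalik",
    "whatever.scalzi.com": "Frenzy.Pipeline.Extractor.WhateverScalzi",
}


def extractor_for(url: str) -> str:
    """Suggest an `extractor` value for an item URL (hypothetical helper).

    Falls back to "builtin" for sites without a dedicated extractor.
    """
    # urlparse lowercases the hostname; it is None for URLs without one.
    host = urlparse(url).hostname or ""
    # Assumption: "www.example.com" should match an entry for "example.com".
    if host.startswith("www."):
        host = host[len("www."):]
    return SITE_EXTRACTORS.get(host, "builtin")
```

Because the match is on the exact hostname, other subdomains fall back to `builtin`: `scalzi.com` does not match the `whatever.scalzi.com` entry.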