From 7023569ef8a2feaa4fd36a6fe24faabd6ce16a0b Mon Sep 17 00:00:00 2001
From: Shadowfacts <me@shadowfacts.net>
Date: Sat, 2 Nov 2019 03:19:13 +0000
Subject: [PATCH] Update page 'Scrape Stage'

---
 Scrape-Stage.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 Scrape-Stage.md

diff --git a/Scrape-Stage.md b/Scrape-Stage.md
new file mode 100644
index 0000000..7286fd1
--- /dev/null
+++ b/Scrape-Stage.md
@@ -0,0 +1,23 @@
+The Scrape pipeline stage replaces the content that an RSS item carries in the feed itself with content scraped from the item's web page.
+
+### `extractor`
+A string: either `builtin` or the module name of a site-specific extractor (see Extractors below).
+
+### `convert_to_data_uris`
+A boolean that controls whether images in posts are fetched from the web, converted to data URIs, and embedded directly in the RSS item's content.
+
+Enabling this option significantly increases the size of the database, because the images are stored directly in the DB.
+
+**Note:** Server administrators may disable this option, and only images with certain MIME types (PNG, JPEG, TIFF, HEIF, and HEIC) are converted.
+
+## Extractors
+Extractors define how a web page's main content is isolated from the rest of the page. The `builtin` extractor uses a general-purpose algorithm to isolate and extract that content, but it may be unreliable on some websites. For this reason, a number of site-specific extractors are also built in:
+
+- beckyhansmeyer.com: `Frenzy.Pipeline.Extractor.BeckyHansmeyer`
+- daringfireball.net: `Frenzy.Pipeline.Extractor.DaringFireball`
+- ericasadun.com: `Frenzy.Pipeline.Extractor.EricaSadun`
+- finertech.com: `Frenzy.Pipeline.Extractor.FinerTech`
+- 512pixels.net: `Frenzy.Pipeline.Extractor.FiveTwelvePixels`
+- macstories.net: `Frenzy.Pipeline.Extractor.MacStories`
+- om.co: `Frenzy.Pipeline.Extractor.OmMalik`
+- whatever.scalzi.com: `Frenzy.Pipeline.Extractor.WhateverScalzi`
\ No newline at end of file