60 Commits

Author SHA1 Message Date
Shadowfacts 60a2dcd73f Fix Verge extractor failing for features 2023-12-03 21:30:52 -05:00
Shadowfacts 3e6211c9ba Fix unused binding warnings 2023-12-03 21:28:24 -05:00
Shadowfacts fec640a37a Improve The Verge extractor 2023-07-12 22:11:31 -07:00
Shadowfacts b0089083db Fix inline script tags not being removed 2023-07-12 20:50:17 -07:00
Shadowfacts 1f94e9080d Filter more things out of Slate and The Verge 2023-06-25 14:12:15 -07:00
Shadowfacts 6dd4f3ca82 Add ELB extractor 2023-06-25 14:06:18 -07:00
Shadowfacts 53cbe0a7e9 Update things, fix warnings 2023-06-25 14:03:16 -07:00
Shadowfacts 86d7ffc7d9 Make regex filters case insensitive 2022-12-05 10:58:49 -05:00
Shadowfacts e7184a2535 Add extractor for The Verge 2022-09-14 17:47:22 -04:00
Shadowfacts b9be2879ed Fix srcsets overriding rewritten image srcs 2022-07-17 15:13:13 -04:00
Shadowfacts 852db1520f Add birchtree.me extractor 2022-07-17 15:13:08 -04:00
Shadowfacts f0299639e2 Daring Fireball: strip dd tag 2022-01-15 14:53:03 -05:00
Shadowfacts 37a802b7a8 Don't put content from builtin extractor through readable_html twice 2021-11-06 12:01:23 -04:00
Shadowfacts d2d4651f1d Add Ars Technica extractor for multi-page articles 2021-11-06 12:00:35 -04:00
Shadowfacts e84ebc473a Add support for external readability implementation 2021-11-06 12:00:35 -04:00
Shadowfacts e3ec1d6040 Fix missing clause in scrape stage 2021-10-22 16:20:50 -04:00
Shadowfacts 6916647737 Don't try to convert data URIs to data URIs 2021-09-22 19:46:07 -04:00
Shadowfacts fce1bf6c2f Add Sentry 2021-09-22 13:59:44 -04:00
Shadowfacts 6e0271bf4b Slate extractor: strip newsletter signup form 2021-09-19 22:32:10 -04:00
Shadowfacts 5990d0e4c2 Add Slate extractor 2021-09-03 17:09:10 -04:00
Shadowfacts a85dca5b3d Add filtering by item content 2021-08-28 12:17:16 -04:00
Shadowfacts 5c8baa2057 Generalize WP lazy-loading stripper 2021-03-31 20:19:01 -04:00
Shadowfacts 0593fcdb9a Switch to hackney via Tesla 2021-03-31 19:33:19 -04:00
Shadowfacts 33d1cac5e1 Recover from errors in custom extractors 2021-03-31 15:30:17 -04:00
Shadowfacts 26b832b622 Fix whatever.scalzi.com extractor 2021-03-31 15:30:05 -04:00
Shadowfacts e10a614f3e Switch back to HTTPoison 2021-03-31 14:43:59 -04:00
Shadowfacts 8e18a415eb Fix error when attempting to convert image w/o Content-Type header to data URI 2020-10-24 13:37:06 -04:00
Shadowfacts a13b1d8181 Fix error rendering gemtext with preformatted blocks 2020-09-30 22:50:07 -04:00
Shadowfacts 1beff21fc5 Switch to Mojito for HTTP requests 2020-09-11 19:15:19 -04:00
Shadowfacts bd42073e24 Fix whatever.scalzi.com extractor 2020-08-14 21:55:38 -04:00
Shadowfacts ab105d71ae Add Gemini document -> HTML converter stage 2020-07-18 23:13:42 -04:00
Shadowfacts 26bfb2e58f Store item content MIME type 2020-07-18 23:13:24 -04:00
Shadowfacts 12bb742be9 Add Gemini protocol scrape stage 2020-07-18 19:50:41 -04:00
Shadowfacts 4f16933198 Add gemini protocol feed fetching 2020-07-18 19:27:53 -04:00
Shadowfacts fc2b8f6036 Add basic LiveView pipeline editor, scrape stage config editing 2020-06-08 22:49:45 -04:00
Shadowfacts 55c6d6fd88 Remove newsletter info from om.co extractor 2020-06-01 23:05:18 -04:00
Shadowfacts 4cccab8df0 Remove old code 2020-06-01 18:30:59 -04:00
Shadowfacts c37bed932f Fix pipeline validation not working 2020-05-31 15:56:27 -04:00
Shadowfacts 4a09ce1cb0 Fix scraping images w/ URLs w/o schemes 2020-02-17 12:09:03 -05:00
Shadowfacts e684737fcd Implement basic favicon scraping 2019-11-10 14:23:07 -05:00
Shadowfacts f84d849432 Add conditional stage 2019-11-01 22:50:25 -04:00
    Allows applying another pipeline stage based on a condition, which can
    either be a whole filter or a single filter rule.
Shadowfacts 13c44d5e10 Refactor filtering logic into separate module 2019-11-01 22:49:52 -04:00
Shadowfacts c9cc9f2428 Fix crash while scraping images 2019-11-01 18:29:41 -04:00
Shadowfacts 9264c9a97d Add extractor for om.co 2019-11-01 18:27:15 -04:00
Shadowfacts 5d38d9567e Fix error while validating scrape stage options 2019-11-01 18:27:08 -04:00
Shadowfacts 3bc37952d1 Add option to convert images in article content to data URIs 2019-10-31 21:59:55 -04:00
Shadowfacts 98a182986c Fix module name of whatever.scalzi.com extractor 2019-10-31 19:15:35 -04:00
Shadowfacts 8c96a94cd3 Add extractor for finertech.com 2019-10-31 19:04:15 -04:00
Shadowfacts 118de4ae53 Add extractor for macstories.net 2019-10-31 18:48:36 -04:00
Shadowfacts 957f271425 Add extractor for 512pixels.net 2019-10-31 18:38:01 -04:00