84 Commits

Author SHA1 Message Date
Shadowfacts 71fa17caaf Remove HTTPoison dependency 2018-11-22 13:00:53 -05:00
Adam Pogwizd 96b2621ff5 Updated README: minor grammar improvements as well as section for contributing 2018-10-20 09:45:21 +09:00
Ben Olive 5dd52d5698 Ensure `remove_tag` returns a valid html_tree
If the entire input is stripped out, this used to return `nil` which
caused downstream parsing to fail. Instead, return `[]` which is the
Floki representation of an empty tree.

Fixes #36
2018-10-11 10:35:16 +09:00
Ben Olive b35746bfed Strip out atom tags
Standard tags are returned by Mochiweb as binaries. The atom tags are
for special case parsing (such as php includes). Since that's not going
to be part of the article, simply exclude those while normalizing.

Fixes #30

See also:

Mochiweb parser: 9608d786ef/src/mochiweb_html.erl (L345)
2018-10-11 10:34:29 +09:00
Jaehyun Shin c2dbdf14e8 Delete CHANGELOG.md 2018-07-24 18:55:24 +09:00
keepcosmos 2ed20b6fe1 update deps and deprecated 2018-07-24 18:50:08 +09:00
Jaehyun Shin 133044f50c Merge pull request #37 from fribmendes/frm/img-tags
Convert relative img paths into absolute
2018-07-24 18:00:43 +09:00
Jaehyun Shin e076f77274 Merge pull request #38 from simonbowen/master
Made tests pass, Floki updated to allow encoding of special character…
2018-07-24 17:58:09 +09:00
Simon Bowen bbe8f6ad1a Made tests pass, Floki updated to allow encoding of special characters of entities, update readability to disable this. 2018-07-18 16:00:47 +01:00
Fernando Mendes ebc8c90e71 Convert relative img paths into absolute
Fixes #27
2018-06-30 11:14:17 +01:00
Jaehyun Shin 4f2449558d Merge pull request #35 from chingan90/feature/formatter
Add Elixir 1.6 formatter config file and formatted the codebase
2018-02-12 10:28:31 +09:00
Jaehyun Shin 45fe9b1950 Merge pull request #34 from chingan90/feature/mime-regex-change
When we regex-check the MIME header we should also support zero space…
2018-02-12 10:28:11 +09:00
Chi Ngan Lee b2f8a3b4da Add Elixir 1.6 formatter config file and formatted the codebase 2018-02-09 11:42:08 +08:00
Chi Ngan Lee 87958400a1 When we regex-check the MIME header we should also support zero space between the type and the charset, say "text/html;charset=utf-8". 2018-02-09 11:22:17 +08:00
keepcosmos 307152202b update to 0.9.1 2017-11-09 19:40:34 +09:00
keepcosmos 2d4827a4f5 update mix.lock 2017-11-09 19:37:33 +09:00
Jaehyun Shin 221deea4f0 Merge pull request #32 from adlan/floki-latest
Upgrade to floki 0.18.0
2017-11-09 19:36:46 +09:00
Adlan Razalan 0b8a238250 Update minimum Elixir version requirement to 1.3.0 2017-11-03 23:40:18 +08:00
Adlan Razalan 9e43c454e7 Manually compare tag type for candidate
The match? method is no longer available starting Floki 0.15.0.
2017-11-03 23:40:18 +08:00
Adlan Razalan d409c3f74d Upgrade floki to 0.18.0 2017-11-03 23:40:18 +08:00
Jaehyun Shin 1a3928a6e4 Merge pull request #33 from adlan/case-insensitive-header-check
Do a case-insensitive content-type check
2017-11-01 11:38:39 +09:00
Adlan Razalan 49d21b71dc Do a case-insensitive content-type check 2017-10-29 15:09:00 +08:00
Jaehyun Shin 389483bae6 Update README.md 2017-09-27 15:32:28 +09:00
keepcosmos e2c5a4beed support elixir 1.5.1 2017-08-23 14:38:08 +09:00
keepcosmos aca14e3aef Merge branch 'master' of https://github.com/keepcosmos/readability 2017-08-23 14:10:11 +09:00
keepcosmos 3286a11211 upgrade dependencies 2017-08-23 14:09:37 +09:00
Jaehyun Shin 89d3958fd7 Merge pull request #25 from OldhamMade/master
Handle text-based responses
2017-08-23 14:08:33 +09:00
Phillip Oldham 2b53a90f3d added ability to handle text-based responses
added fix for content-type with charset

updated function names to match elixir naming conventions (is_ vs ?)

minor version bump

added default content-type of text/plain when header is missing
2017-08-21 21:54:30 +01:00
Jaehyun Shin 3b4ca84961 Merge pull request #26 from janza/master
Make sure title is set if h_tag_title() is empty
2017-06-13 15:47:52 +09:00
Josip Janzic 59b539ef43 Make sure title is set if h_tag_title is empty 2017-06-11 22:13:22 +02:00
keepcosmos 93955d36d2 Merge branch 'master' of https://github.com/keepcosmos/readability 2017-03-04 11:56:43 +09:00
keepcosmos 00734849da update version 2017-03-04 11:56:35 +09:00
Jaehyun Shin 47a00b01c7 Merge pull request #24 from simonbowen/master
Bumped Floki dependency to 0.14.0
2017-03-04 11:55:07 +09:00
Simon Bowen 809f64927a Bumped Floki dependency to 0.14.0 2017-02-21 23:04:35 +00:00
keepcosmos 2b7c1f7429 update readme for v0.7 2017-02-05 19:13:12 +09:00
keepcosmos f57b85d2fa add deps status badge 2017-02-05 19:08:06 +09:00
keepcosmos 1aa682a31a fix some bugs and update deps 2017-02-05 18:48:26 +09:00
keepcosmos 47af5e48de Merge branch 'master' of https://github.com/keepcosmos/readability 2017-01-27 13:13:19 +09:00
Jaehyun Shin 1ea6f138ba Merge pull request #21 from pineconellc/fix_multiple_title_tags
Fix multi-match and no-match Title extractor issues
2016-11-24 19:04:29 +09:00
Jeff Browning ee09163204 Update CHANGELOG 2016-11-22 11:17:41 -05:00
Jeff Browning 760e1f03bc Fix merging of title tag matches and raising on no matches
Fixes #19, fixes #20
2016-11-22 11:14:23 -05:00
keepcosmos 91151c0556 update to 0.5.2 2016-11-22 13:50:55 +09:00
Jaehyun Shin 203efa28d4 Merge pull request #18 from pineconellc/fix_title_tag_finder
Scope the title tag selector to the head element
2016-11-22 13:49:18 +09:00
Jeff Browning 91dcb1e285 Scope the title tag selector to the head element 2016-11-14 17:54:12 -05:00
keepcosmos 4936a3f625 Merge branch 'master' of https://github.com/keepcosmos/readability 2016-11-07 14:05:02 +09:00
keepcosmos 821ab0e095 httpoison default option error 2016-11-07 14:04:31 +09:00
Jaehyun Shin 15394355fd Merge pull request #16 from namjae/master
reasonable default for httpoison_options
2016-11-07 14:02:22 +09:00
DalHo Park e799dc4a18 reasonable default for httpoison_options 2016-11-07 13:20:39 +09:00
keepcosmos 6cd3196aff release 0.6.0 2016-11-06 15:55:29 +09:00
Jaehyun Shin 688933ebd2 Merge pull request #15 from pineconellc/update_travis_ex_versions
Travis: update Elixir 1.2.x, add 1.3.x
2016-11-05 15:57:20 +09:00