33 Commits

Author SHA1 Message Date
Ben Olive
5dd52d5698 Ensure remove_tag returns a valid html_tree
If the entire input is stripped out, this used to return `nil` which
caused downstream parsing to fail. Instead, return `[]` which is the
Floki representation of an empty tree.

Fixes #36
2018-10-11 10:35:16 +09:00
Ben Olive
b35746bfed Strip out atom tags
Standard tags are returned by Mochiweb as binaries. The atom tags are
for special-case parsing (such as PHP includes). Since those are not going
to be part of the article, simply exclude them while normalizing.

Fixes #30

See also:

Mochiweb parser: 9608d786ef/src/mochiweb_html.erl (L345)
2018-10-11 10:34:29 +09:00
keepcosmos
2ed20b6fe1 update deps and deprecated 2018-07-24 18:50:08 +09:00
Jaehyun Shin
133044f50c Merge pull request #37 from fribmendes/frm/img-tags
Convert relative img paths into absolute
2018-07-24 18:00:43 +09:00
Simon Bowen
bbe8f6ad1a Make tests pass: Floki was updated to allow encoding special characters as entities, so update readability to disable this. 2018-07-18 16:00:47 +01:00
Fernando Mendes
ebc8c90e71 Convert relative img paths into absolute
Fixes #27
2018-06-30 11:14:17 +01:00
Jaehyun Shin
4f2449558d Merge pull request #35 from chingan90/feature/formatter
Add Elixir 1.6 formatter config file and formatted the codebase
2018-02-12 10:28:31 +09:00
Chi Ngan Lee
b2f8a3b4da Add Elixir 1.6 formatter config file and formatted the codebase 2018-02-09 11:42:08 +08:00
Chi Ngan Lee
87958400a1 When regex-checking the MIME header, also support no space between the type and the charset, e.g. "text/html;charset=utf-8". 2018-02-09 11:22:17 +08:00
Adlan Razalan
9e43c454e7 Manually compare tag type for candidate
The `match?` function is no longer available as of Floki 0.15.0.
2017-11-03 23:40:18 +08:00
Adlan Razalan
49d21b71dc Do a case-insensitive content-type check 2017-10-29 15:09:00 +08:00
keepcosmos
e2c5a4beed support elixir 1.5.1 2017-08-23 14:38:08 +09:00
Jaehyun Shin
89d3958fd7 Merge pull request #25 from OldhamMade/master
Handle text-based responses
2017-08-23 14:08:33 +09:00
Phillip Oldham
2b53a90f3d added ability to handle text-based responses
added fix for content-type with charset

updated function names to match Elixir naming conventions (is_ vs ?)

minor version bump

added default content-type of text/plain when header is missing
2017-08-21 21:54:30 +01:00
Josip Janzic
59b539ef43 Make sure title is set if h_tag_title is empty 2017-06-11 22:13:22 +02:00
keepcosmos
1aa682a31a fix some bugs and update deps 2017-02-05 18:48:26 +09:00
Jeff Browning
760e1f03bc Fix merging of title tag matches and raising on no matches
Fixes #19, fixes #20
2016-11-22 11:14:23 -05:00
Jeff Browning
91dcb1e285 Scope the title tag selector to the head element 2016-11-14 17:54:12 -05:00
DalHo Park
e799dc4a18 reasonable default for httpoison_options 2016-11-07 13:20:39 +09:00
Jeff Browning
d3be3bdd82 Only split title suffix for tag titles 2016-11-04 14:51:24 -04:00
Jeff Browning
2f8e84eb8a Clean up and fix warnings 2016-11-04 14:49:42 -04:00
Jeff Browning
747e0495ed Fix detection of title suffix 2016-11-04 14:49:25 -04:00
Jeff Browning
a6acaa9c81 Allow HTTPoison options to be specified in config 2016-10-18 09:54:00 -04:00
Eason Goodale
6840a9d0d7 Fixes crash when html has an xml version tag by stripping it out 2016-08-13 22:11:01 -07:00
keepcosmos
93bdf48b8c add summarize function
this closes #4, closes #3
2016-05-07 18:28:39 +09:00
keepcosmos
2274b3d063 add authors extractor doc 2016-04-28 15:19:11 +09:00
keepcosmos
4aa8f6ecea add authors finder 2016-04-28 15:15:24 +09:00
keepcosmos
23db20bbf0 add document 2016-04-24 18:40:35 +09:00
keepcosmos
46ac9dddde add doc 2016-04-24 16:14:31 +09:00
keepcosmos
d8677a599c add test 2016-04-24 14:32:43 +09:00
keepcosmos
b131d7effa add candidate builder
add test
2016-04-23 12:31:03 +09:00
keepcosmos
4e4a712718 add filter algorithms 2016-04-17 15:28:33 +09:00
keepcosmos
d91604a519 initial commit 2016-04-15 20:51:29 +09:00