Ben Olive
5dd52d5698
Ensure `remove_tag` returns a valid html_tree
...
If the entire input is stripped out, this used to return `nil` which
caused downstream parsing to fail. Instead, return `[]` which is the
Floki representation of an empty tree.
Fixes #36
2018-10-11 10:35:16 +09:00
Ben Olive
b35746bfed
Strip out atom tags
...
Standard tags are returned by Mochiweb as binaries. The atom tags are
for special-case parsing (such as PHP includes). Since those are not going
to be part of the article, simply exclude them while normalizing.
Fixes #30
See also:
Mochiweb parser: 9608d786ef/src/mochiweb_html.erl (L345)
2018-10-11 10:34:29 +09:00
keepcosmos
2ed20b6fe1
update deps and deprecated code
2018-07-24 18:50:08 +09:00
Jaehyun Shin
133044f50c
Merge pull request #37 from fribmendes/frm/img-tags
...
Convert relative img paths into absolute
2018-07-24 18:00:43 +09:00
Simon Bowen
bbe8f6ad1a
Made tests pass. Floki was updated to allow encoding of special characters (entities); updated readability to disable this.
2018-07-18 16:00:47 +01:00
Fernando Mendes
ebc8c90e71
Convert relative img paths into absolute
...
Fixes #27
2018-06-30 11:14:17 +01:00
Jaehyun Shin
4f2449558d
Merge pull request #35 from chingan90/feature/formatter
...
Add Elixir 1.6 formatter config file and formatted the codebase
2018-02-12 10:28:31 +09:00
Chi Ngan Lee
b2f8a3b4da
Add Elixir 1.6 formatter config file and formatted the codebase
2018-02-09 11:42:08 +08:00
Chi Ngan Lee
87958400a1
When regex-checking the MIME header, also support zero spaces between the type and the charset, e.g. "text/html;charset=utf-8".
2018-02-09 11:22:17 +08:00
Adlan Razalan
9e43c454e7
Manually compare tag type for candidate
...
The `match?` function is no longer available starting with Floki 0.15.0.
2017-11-03 23:40:18 +08:00
Adlan Razalan
49d21b71dc
Do a case-insensitive content-type check
2017-10-29 15:09:00 +08:00
keepcosmos
e2c5a4beed
support elixir 1.5.1
2017-08-23 14:38:08 +09:00
Jaehyun Shin
89d3958fd7
Merge pull request #25 from OldhamMade/master
...
Handle text-based responses
2017-08-23 14:08:33 +09:00
Phillip Oldham
2b53a90f3d
added ability to handle text-based responses
...
added fix for content-type with charset
updated function names to match Elixir naming conventions (is_ vs ?)
minor version bump
added default content-type of text/plain when header is missing
2017-08-21 21:54:30 +01:00
Josip Janzic
59b539ef43
Make sure title is set if h_tag_title is empty
2017-06-11 22:13:22 +02:00
keepcosmos
1aa682a31a
fix some bugs and update deps
2017-02-05 18:48:26 +09:00
Jeff Browning
760e1f03bc
Fix merging of title tag matches and raising on no matches
...
Fixes #19, fixes #20
2016-11-22 11:14:23 -05:00
Jeff Browning
91dcb1e285
Scope the title tag selector to the head element
2016-11-14 17:54:12 -05:00
DalHo Park
e799dc4a18
reasonable default for httpoison_options
2016-11-07 13:20:39 +09:00
Jeff Browning
d3be3bdd82
Only split title suffix for tag titles
2016-11-04 14:51:24 -04:00
Jeff Browning
2f8e84eb8a
Clean up and fix warnings
2016-11-04 14:49:42 -04:00
Jeff Browning
747e0495ed
Fix detection of title suffix
2016-11-04 14:49:25 -04:00
Jeff Browning
a6acaa9c81
Allow HTTPoison options to be specified in config
2016-10-18 09:54:00 -04:00
Eason Goodale
6840a9d0d7
Fixes crash when HTML has an XML version tag by stripping it out
2016-08-13 22:11:01 -07:00
keepcosmos
93bdf48b8c
add summarize function
...
this closes #4, closes #3
2016-05-07 18:28:39 +09:00
keepcosmos
2274b3d063
add authors extractor doc
2016-04-28 15:19:11 +09:00
keepcosmos
4aa8f6ecea
add authors finder
2016-04-28 15:15:24 +09:00
keepcosmos
23db20bbf0
add document
2016-04-24 18:40:35 +09:00
keepcosmos
46ac9dddde
add doc
2016-04-24 16:14:31 +09:00
keepcosmos
d8677a599c
add test
2016-04-24 14:32:43 +09:00
keepcosmos
b131d7effa
add candidate builder
...
add test
2016-04-23 12:31:03 +09:00
keepcosmos
4e4a712718
add filter algorithms
2016-04-17 15:28:33 +09:00
keepcosmos
d91604a519
initial commit
2016-04-15 20:51:29 +09:00