switch to html2text() instead of strip_tags() when preparing FTS index
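Why the switch matters for a full-text index: a tag stripper such as strip_tags() (assuming PHP's built-in here) deletes tags without inserting any whitespace. Text from adjacent block elements can therefore run together, and `<div>A div</div><div>Another div</div>` becomes `A divAnother div`, which a tokenizer indexes as the single term `divAnother`. The sketch below contrasts that with a minimal block-aware extractor built on Python's standard `html.parser`. It is an illustration under assumptions: `BLOCK_TAGS`, `naive_strip` and `block_aware_text` are hypothetical names, and this is not the project's html2text() implementation.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately small set of tags treated as line breaks.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "hr", "pre", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}


def naive_strip(markup: str) -> str:
    """Delete every tag and insert nothing, roughly what strip_tags() does."""
    return re.sub(r"<[^>]*>", "", markup)


class BlockAwareExtractor(HTMLParser):
    """Collect text nodes, emitting a newline at each block-level boundary."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp; etc. in text
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def block_aware_text(markup: str) -> str:
    parser = BlockAwareExtractor()
    parser.feed(markup)
    parser.close()
    # Collapse runs of whitespace inside each line and drop blank lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample = "<div>A div</div><div>Another div</div>"
print(repr(naive_strip(sample)))       # 'A divAnother div'
print(repr(block_aware_text(sample)))  # 'A div\nAnother div'
```

Breaking at block boundaries is what keeps `div` and `Another` as separate index terms. A full converter like html2text() additionally renders links, lists, quotes and entities, which is what the fixture files in this commit pin down.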
@@ -0,0 +1,5 @@
A document without any HTML open/closing tags.
---------------------------------------------------------------
We try and use the representation given by common browsers of the HTML document, so that it looks similar when converted to plain text. visit foo.com - or http://www.foo.com link

An anchor which will not appear
@@ -0,0 +1,5 @@
A document without any HTML open/closing tags.
---------------------------------------------------------------
We try and use the representation given by common browsers of the HTML document, so that it looks similar when converted to plain text. [visit foo.com](http://foo.com) - or http://www.foo.com [link](http://foo.com)

[An anchor which will not appear]
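The link fixtures suggest a consistent rule for `<a>` elements: when the visible text already is the address (optionally without an `http://`, `https://` or `mailto:` prefix) it is printed once, bare; other links become `[text](href)`; a link with no text falls back to its URL; and an anchor without a target keeps its text in brackets. The hypothetical Python helper below is inferred only from those expected outputs and is not taken from the html2text source.

```python
from typing import Optional


def render_link(text: str, href: Optional[str]) -> str:
    """Render an <a> element as plain text (hypothetical, inferred from fixtures)."""
    text = text.strip()
    if not href:
        # No target, e.g. a named anchor: keep any text, marked with brackets.
        return f"[{text}]" if text else ""
    if href in (text, f"http://{text}", f"https://{text}", f"mailto:{text}"):
        # The text already shows the address, so print it once.
        return text
    if not text:
        # Nothing visible to link from: fall back to the bare URL.
        return href
    return f"[{text}]({href})"


print(render_link("visit foo.com", "http://foo.com"))                # [visit foo.com](http://foo.com)
print(render_link("support@openiaml.org", "mailto:support@openiaml.org"))  # support@openiaml.org
```

Printing an address only once keeps the indexed text free of duplicated URLs, while the `[text](href)` form keeps both the label and the target searchable.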
@@ -0,0 +1,15 @@
Hello, World!

This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly.

Even mismatched tags.

A div
Another div
A div
within a div

Another line
Yet another line

A link
@@ -0,0 +1,15 @@
Hello, World!

This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly.

Even mismatched tags.

A div
Another div
A div
within a div

Another line
Yet another line

[A link](http://foo.com)
@@ -0,0 +1,44 @@
Hello

> Nest some block quotes with preformated text
>
>> Here is the code
>>
>> #include <stdlib.h>
>> #include <stdio.h>
>>
>> int main(){
>> return 0;
>> };
>>
>> Put some tags at the end
>
> Some text and tags here
>
>> First line
>>
>> Header 1
>>
>> Some text
>> ---------------------------------------------------------------
>> Some more text
>>
>> Paragraph tag!
>>
>> Header 2
>>
>> ---------------------------------------------------------------
>>
>> Header 3
>>
>> Some text
>>
>> Header 4
>>
>>> More quoted text!
>>
>> Paragraph tag!
>>
>> Final line

Some ending text just to make sure
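The nested-quote fixture follows a simple prefixing rule: each quoting level prepends `> ` to an unquoted line, but only `>` to a line that is empty or already quoted, which is why a second level reads `>> Here is the code` rather than `> > Here is the code`. A hypothetical Python sketch of that rule, reconstructed from the expected output rather than from the converter's code:

```python
def quote(text: str) -> str:
    """Add one level of '>' quoting to every line of text (hypothetical helper)."""
    out = []
    for line in text.split("\n"):
        if not line or line.startswith(">"):
            # Empty or already-quoted lines get '>' with no space: '>' or '>> ...'.
            out.append(">" + line)
        else:
            out.append("> " + line)
    return "\n".join(out)


inner = quote("Here is the code\n\n#include <stdio.h>")
outer = quote("Nest some block quotes\n\n" + inner)
print(outer)
```

Applying `quote` once per enclosing `<blockquote>` reproduces the `>`, `>>` and `>>>` depths seen in the fixture.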
@@ -0,0 +1 @@
Hello
@@ -0,0 +1,53 @@
http://localhost/home 16 December 2015
Account 123

Hi Susan

Here is your cat report.

You have found 5 cats less than anyone else
[Find more cats](http://localhost/cats)

Down the road

Across the hall

Your achievements

You're currently finding about
12 cats
per day

[Number of cats found]
---------------------------------------------------------------

Your last cat was found two days ago.

One type of cat is a kitten.

Special account A1

12.345

http://localhost/logout

How can you find more cats?

Look in trash cans

Start meowing

Eat cat food

Some cats like to hang out in trash cans. Some cats do not. Some cats are attracted to similar tones. So one day your tears may smell like cat food, attracting more cats.
https://localhost/about https://localhost/about https://localhost/about
[Cats are great.](https://github.com/soundasleep/html2text_ruby) [Find more cats.](https://github.com/soundasleep/html2text_ruby) [Do more things.](https://github.com/soundasleep/html2text_ruby)

[Contact us](http://localhost/contact)

cats@cats.com
Monday and Friday

https://github.com/soundasleep/html2text https://github.com/soundasleep/html2text_ruby

Having trouble seeing this email? [View it online](http://localhost/view_it_online).
+25872
File diff suppressed because it is too large
@@ -0,0 +1,27 @@
One:

Two: [two]

Three: [three]

Four: [four]

With links

One: http://localhost

Two: [two](http://localhost)

Three: [three](http://localhost)

Four: [four](http://localhost)

With links with titles

One: [one link](http://localhost)

Two: [two link](http://localhost)

Three: [three link](http://localhost)

Four: [four link](http://localhost)
@@ -0,0 +1 @@
Hello &nbsnbsp; world
@@ -0,0 +1,17 @@
List tests

Add some lists.

- one
- two
- three

An unordered list

- one
- two
- three

- one
- two
- three
@@ -0,0 +1,7 @@
Anchor tests

Visit http://openiaml.org or openiaml.org or http://openiaml.org.

To visit with SSL, visit https://openiaml.org or openiaml.org or https://openiaml.org.

To mail, email support@openiaml.org or mailto:support@openiaml.org or support@openiaml.org or mailto:support@openiaml.org.
@@ -0,0 +1,12 @@
Dear html2text,

This is an example email that can be used to test html2text conversion of outlook / exchange emails.

The addition of <o:p> tags is very annoying!
This is a single line return

This is bold
This is italic
This is underline

Andrew
@@ -0,0 +1 @@
hello world & people < > &NBSP;
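The entity fixtures are consistent with decoding character references against the HTML5 named-reference table, which is case-sensitive: `&NBSP;` and the malformed `&nbsnbsp;` survive verbatim, while a real no-break space comes out as an ordinary space (the "these spaces are non-breaking" fixture). A minimal Python sketch of that behaviour using the standard `html.unescape`; the helper name `decode_entities` is hypothetical and this is not the converter's own code.

```python
import html


def decode_entities(text: str) -> str:
    """Decode HTML character references, then fold U+00A0 into a plain space."""
    # html.unescape uses the HTML5 named-reference table; names it does not
    # recognise (such as "&NBSP;" or "&nbsnbsp;") are left untouched.
    return html.unescape(text).replace("\u00a0", " ")


print(decode_entities("hello world &amp; people &lt; &gt; &NBSP;"))
print(decode_entities("these&nbsp;spaces"))
```

Folding U+00A0 matters for indexing because many tokenizers split only on ordinary whitespace, so two words joined by a no-break space could otherwise be indexed as one term.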
@@ -0,0 +1,12 @@
Just two divs
Hanging out
Nested divs and line breaks

Nested divs and line breaks
More text

Just text
Just text
Just text

This is the end!
@@ -0,0 +1,35 @@
Hello
How are you?

How are you?

How are you?

Just two divs
Hanging out
This is not the end!
How are you again?
This is the end!
Just kidding

Header 1

Some text
---------------------------------------------------------------
Some more text

Paragraph tag!

Header 2

---------------------------------------------------------------

Header 3

Some text

Header 4

Paragraph tag!

Final line
@@ -0,0 +1 @@
these spaces are non-breaking
@@ -0,0 +1,8 @@
Here is the code

#include <stdlib.h>
#include <stdio.h>

int main(){
return 0;
};
@@ -0,0 +1,7 @@
Hello, World!

Col A Col B
Data A1 Data B1
Data A2 Data B2
Data A3 Data B4
Total A Total B
@@ -0,0 +1,2 @@
test one
test two
@@ -0,0 +1,5 @@
1
2
3
4
5 < 6
@@ -0,0 +1,2 @@
- ÅÄÖ
- åäö
@@ -0,0 +1,2 @@
- ÅÄÖ
- åäö
@@ -0,0 +1 @@
foobar