I’ve got the same problem …
@uhi888
please post more details;
as your problem is probably not exactly the same, please start your own topic and include a link to your site.
Me too. I’m using WP Scraper in a widget with the following shortcode:
[wpws url="http://bbc.co.uk" selector="#business_marketData_items" user_agent="Bot at capman-group.com" on_error="error_show" output="html" striptags="<a>"]
The only HTML tags left in with output="html" are <span>s; all the others are stripped out. If I change it to output="text" it strips out the <span>s too.
Debug info below. My site isn’t live yet so I have nothing to show (I’m developing locally on my machine).
<!--
Start of web scrap (created by wp-web-scraper)
Source URL: http://bbc.co.uk
Selector: #business_marketData_items
Xpath:
Delivered thru: Cache
WPWS options: Array
(
[postargs] =>
[cache] => 60
[user_agent] => Bot at capman-group.com
[timeout] => 2
[on_error] => error_show
[output] => html
[clear_regex] =>
[clear_selector] =>
[replace_regex] =>
[replace_selector] =>
[replace_with] =>
[replace_selector_with] =>
[basehref] =>
[striptags] => <a>
[removetags] =>
[callback] =>
[debug] => 1
[htmldecode] =>
[urldecode] => 1
[xpathdecode] =>
[request_mt] => 1353787384.3393
)
-->
I’ve figured out the “issue”… I was trying to scrape a table by using the CSS id of the <table> tag. The scraper pulls out all the HTML inside that element, but not the <table> tag itself. This meant that Chrome’s Inspect Element panel showed no table markup, because the fragment was badly formed (rows and cells with no enclosing <table>) and couldn’t be parsed properly. The markup all looked OK when I viewed the page source (Ctrl-U in Chrome).
To fix it I used a callback function to put the <table> tags back:
// Callback: wrap the scraped rows in the <table> tags the scraper leaves out.
function mymodule_add_table_tags($scrap) {
    return '<table>' . $scrap . '</table>';
}
And changed the shortcode to:
[wpws url="http://bbc.co.uk" selector="#business_marketData_items" user_agent="Bot at capman-group.com" on_error="error_show" output="html" striptags="<a>" callback="mymodule_add_table_tags"]
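In case it helps anyone else, here is a slightly more defensive sketch of the same callback. It assumes, as in the function above, that the callback receives the scraped fragment as a single string argument. The name mymodule_add_table_tags_safe is hypothetical and not part of the plugin. It only adds the <table> tags when the fragment is non-empty and doesn't already start with one:

```php
<?php
// Hypothetical variant of mymodule_add_table_tags (name is my own invention).
// Assumes the callback is passed the scraped HTML fragment as one string.
function mymodule_add_table_tags_safe($scrap) {
    $trimmed = trim($scrap);

    // Nothing was scraped: return the input untouched rather than an empty table.
    if ($trimmed === '') {
        return $scrap;
    }

    // Fragment already starts with a <table> tag: don't wrap it a second time.
    if (stripos($trimmed, '<table') === 0) {
        return $scrap;
    }

    // Otherwise restore the enclosing <table> element.
    return '<table>' . $scrap . '</table>';
}
```

To try it, the function would go wherever the original callback lives, with callback="mymodule_add_table_tags_safe" in the shortcode.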