Scan old posts?
Would it be possible to add a feature to scan posts from before one installed the plugin so that images in older posts can be archived locally as well?
Yes, that’s a great idea. I’m on it; it should take a while.
It was there within two hours. So impressive!
The current implementation of old-post scanning is a problem for blogs with a large number of posts. I have 2,548 posts on my site (it is a group blog and we’ve been blogging for a long time).
It seems to stop after a bit. It did 830 the first time then stopped. The second time it stopped at 1168. The third time at 1647… I still haven’t gotten to the end.
Each time it starts at the beginning again, but it seems to go very fast through the posts that have already been processed.
With the latest update (1.5.1) it seems to scan everything, but it isn’t getting some images. For instance, I used to host a lot of images on Evernote using Skitch. These images start with the URL:
<img src="https://www.evernote.com/shard...
If I click on the manual “save remote images” button in a post it will catch these and move them, but when doing the bulk scan it doesn’t seem to process these posts.
I think I was wrong to say that 1.5.1 was scanning everything. I have realized that post IDs are not sequential, so even though the ID number exceeds the number of posts I have, there are still many hundreds of posts that haven’t been scanned. This is true even though I am now running 1.5.2.
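On the remote images the bulk scan misses: a minimal Python sketch of how a scan might decide which images in a post are remote. This is not the plugin's code. The helper name `remote_image_urls` and the example image host are assumptions made for illustration; only the site host comes from this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are routed here as well.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def remote_image_urls(post_html, site_host):
    """Return image URLs whose host differs from site_host (hypothetical helper)."""
    collector = ImgSrcCollector()
    collector.feed(post_html)
    collector.close()
    remote = []
    for src in collector.sources:
        host = urlparse(src).hostname
        # Relative URLs have no host and are treated as local.
        if host and host.lower() != site_host.lower():
            remote.append(src)
    return remote
```

If the manual button and the bulk scan used the same detection step, a post that works manually but is skipped in bulk would point at the post selection rather than the image matching.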
I suggest breaking up the scanning into batches and having the person manually do the next batch (of 300 or 500?) after the first batch is done.
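Because post IDs are not sequential, batches are safer when cut from the list of IDs that actually exist rather than from a numeric range. A minimal Python sketch of that idea follows; the function name `batches` and the sample IDs are assumptions for illustration, not anything from the plugin.

```python
def batches(post_ids, batch_size):
    """Yield consecutive batches from the IDs that actually exist.

    Slicing the sorted ID list (instead of counting from 1 to N)
    means gaps in the ID sequence cannot cause posts to be skipped.
    """
    ordered = sorted(post_ids)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

With a batch size of 500, a site with 2,548 posts would need six batches, whatever the highest post ID happens to be.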
Copy that. Thanks for the reply.
Hi, please update to v1.5.3.
I think the cause may be that PHP only runs for a limited time, so breaking the scan into batches alone cannot resolve this problem.
Now you can set a range to scan manually.
Maybe someday I can find a smarter way, I hope.
I found a smarter way to resolve this problem: using AJAX to scan posts one by one. It runs slower than 1.5.3, though. Please update to v1.5.4.
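The general idea behind scanning in small requests can be sketched as a resumable step: each call does as much work as fits in a time budget and returns a cursor, and the caller (the browser, in the AJAX case) keeps calling until the cursor reaches the end. This Python sketch only illustrates that pattern; `scan_step`, its parameters, and the budget value are assumptions, not the plugin's implementation.

```python
import time


def scan_step(post_ids, cursor, process, budget_seconds, clock=time.monotonic):
    """Process posts starting at cursor until the time budget is used up.

    Always handles at least one post so repeated calls make progress,
    then returns the new cursor for the next call to resume from.
    """
    deadline = clock() + budget_seconds
    while cursor < len(post_ids):
        process(post_ids[cursor])
        cursor += 1
        if clock() >= deadline:
            break
    return cursor


def scan_everything(post_ids, process, budget_seconds=5.0):
    """Driver loop standing in for the repeated requests."""
    cursor = 0
    while cursor < len(post_ids):
        cursor = scan_step(post_ids, cursor, process, budget_seconds)
    return cursor
```

Scanning strictly one post per request is the special case of a zero budget; a larger budget trades fewer round trips against the risk of hitting the server's time limit.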
Unfortunately this didn’t finish, it stopped after 1448 entries. But it was much more reliable than the old method. For some reason, however, I cannot access the batch function anymore (although it is still there). If I could combine the AJAX with the batch, hopefully I could go through all my images…
Please help me test the newest version, 1.5.5.
Version 1.5.5 stops after 1445 entries at speed set to “5” and at 1448 entries with speed set at “1.” I didn’t try any other speeds.
Ummm... I suggest setting the start to 1400 and scanning 1000 posts. If it stops after 45 or 48 entries, that means some posts may be special.
I will also find a way to debug this. Today is the Chinese National Day and I have a whole week of vacation with many things to do, so please be patient.
Thanks for helping me make this plugin more powerful.
1. No rush.
2. I didn’t realize I could still set a start number, because the option is greyed out, but now I see you can do it if you first select how many posts you wish to scan. So I did what you suggested.
3. You are right, post #1448 is causing problems. Starting at 1449 allowed it to start again, but then it only did two. #1450 seems to be a problem post as well.
4. Looking at these two posts it is very easy to see what the problem is. The authors of these posts must have written their posts in MS Word and then pasted them into WordPress, creating lots of extra DIV tags, like this:
<blockquote><span style="font-size: 12pt; line-height: 115%; font-family: "Times New Roman","serif";"> ... </span> ... <em>...</em><span>...</span>. <div style="text-align: left;"><span style="font-size: 12pt; line-height: 115%; font-family: "Times New Roman","serif"; color: black;">...</span> <span style="font-size: 12pt; line-height: 115%; font-family: "Times New Roman","serif"; color: black;"> </span> <span style="font-size: 12pt; line-height: 115%; font-family: "Times New Roman","serif"; color: black;"> ...<span> </span>...</span></div></blockquote>
It seems that QQWorld is choking on this ugly code…
Here is the first post:
http://savageminds.org/2010/01/20/place-hacking/
And here is the second:
http://savageminds.org/2010/01/28/concerned-anthropologists-letter-to-washington/
This one also seems to have caused problems, although I’m less clear why. (The use of a URL in the caption field?)
http://savageminds.org/2011/07/18/the-anthropology-of-freedom-pt-4/
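Since a single problem post (such as #1448 or #1450) halts the whole scan, one possible mitigation is to isolate failures per post: record the ID, move on, and report the list at the end. The Python sketch below shows only that pattern. `scan_all` and the `process` callback are hypothetical names, and a hard timeout or fatal error on the server may not be catchable this way.

```python
def scan_all(post_ids, process):
    """Scan every post, collecting failures instead of stopping at the first one.

    Returns (done, failed), where failed holds (post_id, error message) pairs
    so the problem posts can be inspected afterwards.
    """
    done, failed = [], []
    for post_id in post_ids:
        try:
            process(post_id)
            done.append(post_id)
        except Exception as exc:  # deliberately broad: any bad post is skipped
            failed.append((post_id, str(exc)))
    return done, failed
```

A report of the failed IDs would replace the trial-and-error search for where the scan stopped.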