Migration problems (large database)
-
I have a large database of stats (1.7 million rows in _koko_analytics_post_stats), and the migration times out. Is it possible to run the migration from command line somehow to avoid the timeout? I’d like to keep all my existing stats, while getting the new treats – this site has lots of taxonomy page traffic and is the main reason I’ve been asking for this feature.
-
Hi @msaari,
That’s a lot of stats! The migration is built in such a way that it should continue where it left off, but in your case it may indeed take a few pageloads for it to be done.
I’ll work on an updated version where you can initiate the migration from WP CLI, would that be a good option for you?I started using Koko Analytics in April 2022, and have kept all the stats. This is a book review site, and I use the stats to set weights in Relevanssi searches, so that the most popular books come up higher in search results and related posts.
Ok, good to know it can be done in batches. However, I did notice there were duplicate entries in the paths table: many paths were added twice, with consecutive ID numbers. That got me worried there might be some difficulties there.
WP CLI migration tool would be excellent.
Thanks! Once this is working, I’ll definitely get the Pro version just to say thanks.
Hello @msaari,
Can you let me know how the following version works for you please?
https://dannyvankooten-3.stack.storage/s/7aNRLq2PNV5fRd5b
The CLI Command is wp koko-analytics migrate_post_stats_to_v2 and optionally wp koko-analytics migrate_referrer_stats_to_v2 .The migration is really proving harder than expected, especially for large sites, but we’ll get there eventually!
PS. needless to say in your case I guess, but please keep a recent back-up of your 1.8.x Koko Analytics database tables ready.Nope, doesn’t work. My first problem is that installing the new version did not update the database tables (that may be a problem in my end; I had the same problem when downgrading to 1.8.6, I had to manually create the database tables). When I fixed the table structure to match 2.0.6 on my other site, it still doesn’t work: it can’t change the PRIMARY key (because there are duplicate 0 values in the
path_idcolumn) and there’s nothing in the _paths table.I set all
path_idvalues to NULL instead of 0, and the process started running. But it still doesn’t work, as every path gets the ID value 0 instead of the correct ID.Ah, of course: the ID column should be auto-increment. My bad, didn’t realize that. Now it’s running better.
@msaari Phew – happy to hear it. Was breaking my head over why the insert ID wasn’t coming through properly. Still a little baffled why the paths table didn’t create automatically, but at least things are running now.
As a side note: the migration should switch the primary key from (date, post_id) to (date, path_id) at the far end of the migration (once every entry in the stats table has a non-NULL path_id column value).
-
This reply was modified 9 months, 3 weeks ago by
Danny van Kooten.
By the way, @msaari – here is the full SQL for creating the paths table. The index is important otherwise a lot of unnecessary work will be performed whenever trying to insert/update stats.
CREATE TABLE IF NOT EXISTS wp_koko_analytics_paths (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
path VARCHAR(2000) NOT NULL,
INDEX (path(191))
) ENGINE=INNODB CHARACTER SET=utf8mb4Or, to create the index retro-actively:
CREATE INDEX idx_koko_analytics_paths_path ON wp_koko_analytics_paths(path(191));Yeah, I created all the indices, but I had to remove lots of duplicate entries manually before I was able to create them.
It’s been running for 15 minutes, and there are still 1.1 million rows to go. The WP CLI migration tool is very useful!
Hi @msaari,
That’s…. slow! It shouldn’t be that slow in theory. Do you have a lot of entries in your post_stats table for posts that have been deleted since then?
If so, could you give this version a go? https://dannyvankooten-3.stack.storage/s/eIB2Ksmr7dRD6GM6I don’t think so – at least I haven’t deleted a huge number of posts.
Anyway, I left the process running when I went for a lunch, and when I came back, it was complete.
Looking at the
wp_koko_analytics_post_statstable, it only has 950k rows now, so we’ve lost 800k rows in the process. That’s… interesting? I don’t know what has happened there. Something’s funny there. I’m not sure I’m interested in digging what’s gone – a lot of it is still there, and this really isn’t a hill worth dying on 😉By the way, the stats page columns don’t have room for six figure numbers =) If I look at stats from this year, I have over 300k+ hits from Google, and those numbers don’t fit in the columns. If I set the
.w-fitclasswidthto60pxeverything looks nice.Hi @msaari,
Happy to hear it!
Looking at the
wp_koko_analytics_post_statstable, it only has 950k rows now, so we’ve lost 800k rows in the process. That’s… interesting? I don’t know what has happened there. Something’s funny there. I’m not sure I’m interested in digging what’s gone – a lot of it is still there, and this really isn’t a hill worth dying on 😉Interesting. My first thought would be any deleted posts but that’s a lot of deleted posts and seems unlikely. I had to make a design choice for deleted post, either come up with some random path like “/deleted/” or delete them from the post stats table altogether since there’s probably a reason they were deleted…
EDIT: Sorry, I now see that you mentioned you didn’t really delete any posts. Ignore above paragraph. I’ll do some more digging to see what could have caused that.By the way, the stats page columns don’t have room for six figure numbers =) If I look at stats from this year, I have over 300k+ hits from Google, and those numbers don’t fit in the columns. If I set the
.w-fitclasswidthto60pxeverything looks nice.Noted! I’m trying to get the column as small as possible through
fit-contentin CSS, but obviously in the context of tables that won’t work by just looking at the table header cell… Will improve!
Are new stats coming in correctly and mapping to the correct path for you?-
This reply was modified 9 months, 3 weeks ago by
Danny van Kooten.
Everything seems to be ok. We have over 11k posts, but the path table only has 6,400 rows for individual posts. It’s unlikely that we have over 5k posts without any hits, but when I visited a path that didn’t exist in the table, it got recorded correctly, so that’s working. Some data has clearly gone missing, but I’m not going to stress over that – the most popular posts lists are looking good, so I’m happy. I should probably reduce the amount of data in any case.
Also, I see taxonomy archives getting hits. There are also lots of paths generated for taxonomy term archives without any pageviews in the stats table. Is that normal?
Also, I see taxonomy archives getting hits. There are also lots of paths generated for taxonomy term archives without any pageviews in the stats table. Is that normal?
They should be lazily generated as soon as they receive a hit (but not any sooner). So that does sound a little strange. If they are in the paths table, there should be an entry for it in the
post_statstable too…Well, they’re definitely not appearing in
post_stats! For path IDs 7000+, there’s 7000, 7001, 7002, 7006, 7007, 7014, 7022, and 7023 inpost_stats, but the rest don’t make an appearance.Also, I see the same path ID reference two different post IDs… 6726 is both post ID 96677 and 128152. The latter is the correct post, and for the first post, the correct path ID is 6728 (which has no entry in
post_stats). When I visit that page again, the hits are recorded based on thepost_id, not thepath_id. The stats display is based on thepath_id, so the wrong post appears in the stats dashboard. -
This reply was modified 9 months, 3 weeks ago by
The topic ‘Migration problems (large database)’ is closed to new replies.