• Resolved Mikko Saari

    (@msaari)


    I have a large database of stats (1.7 million rows in _koko_analytics_post_stats), and the migration times out. Is it possible to run the migration from command line somehow to avoid the timeout? I’d like to keep all my existing stats, while getting the new treats – this site has lots of taxonomy page traffic and is the main reason I’ve been asking for this feature.

Viewing 15 replies - 1 through 15 (of 18 total)
  • Plugin Author Danny van Kooten

    (@dvankooten)

    Hi @msaari,

    That’s a lot of stats! The migration is built in such a way that it should continue where it left off, but in your case it may indeed take a few pageloads for it to be done.

    I’ll work on an updated version where you can initiate the migration from WP CLI, would that be a good option for you?

    Thread Starter Mikko Saari

    (@msaari)

    I started using Koko Analytics in April 2022, and have kept all the stats. This is a book review site, and I use the stats to set weights in Relevanssi searches, so that the most popular books come up higher in search results and related posts.

    Ok, good to know it can be done in batches. However, I did notice there were duplicate entries in the paths table: many paths were added twice, with consecutive ID numbers. That got me worried there might be some difficulties there.

    WP CLI migration tool would be excellent.

    Thanks! Once this is working, I’ll definitely get the Pro version just to say thanks.

    Plugin Author Danny van Kooten

    (@dvankooten)

    Hello @msaari,

    Can you let me know how the following version works for you please?

    https://dannyvankooten-3.stack.storage/s/7aNRLq2PNV5fRd5b

    The CLI Command is wp koko-analytics migrate_post_stats_to_v2 and optionally wp koko-analytics migrate_referrer_stats_to_v2 .

    The migration is really proving harder than expected, especially for large sites, but we’ll get there eventually!

    PS. needless to say in your case I guess, but please keep a recent back-up of your 1.8.x Koko Analytics database tables ready.

    Thread Starter Mikko Saari

    (@msaari)

    Nope, doesn’t work. My first problem is that installing the new version did not update the database tables (that may be a problem in my end; I had the same problem when downgrading to 1.8.6, I had to manually create the database tables). When I fixed the table structure to match 2.0.6 on my other site, it still doesn’t work: it can’t change the PRIMARY key (because there are duplicate 0 values in the path_id column) and there’s nothing in the _paths table.

    Thread Starter Mikko Saari

    (@msaari)

    I set all path_id values to NULL instead of 0, and the process started running. But it still doesn’t work, as every path gets the ID value 0 instead of the correct ID.

    Thread Starter Mikko Saari

    (@msaari)

    Ah, of course: the ID column should be auto-increment. My bad, didn’t realize that. Now it’s running better.

    Plugin Author Danny van Kooten

    (@dvankooten)

    @msaari Phew – happy to hear it. Was breaking my head over why the insert ID wasn’t coming through properly. Still a little baffled why the paths table didn’t create automatically, but at least things are running now.

    As a side note: the migration should switch the primary key from (date, post_id) to (date, path_id) at the far end of the migration (once every entry in the stats table has a non-NULL path_id column value).

    Plugin Author Danny van Kooten

    (@dvankooten)

    By the way, @msaari – here is the full SQL for creating the paths table. The index is important otherwise a lot of unnecessary work will be performed whenever trying to insert/update stats.

    CREATE TABLE IF NOT EXISTS wp_koko_analytics_paths (
    id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    path VARCHAR(2000) NOT NULL,
    INDEX (path(191))
    ) ENGINE=INNODB CHARACTER SET=utf8mb4

    Or, to create the index retro-actively:

    CREATE INDEX idx_koko_analytics_paths_path ON wp_koko_analytics_paths(path(191)); 
    Thread Starter Mikko Saari

    (@msaari)

    Yeah, I created all the indices, but I had to remove lots of duplicate entries manually before I was able to create them.

    It’s been running for 15 minutes, and there are still 1.1 million rows to go. The WP CLI migration tool is very useful!

    Plugin Author Danny van Kooten

    (@dvankooten)

    Hi @msaari,

    That’s…. slow! It shouldn’t be that slow in theory. Do you have a lot of entries in your post_stats table for posts that have been deleted since then?

    If so, could you give this version a go? https://dannyvankooten-3.stack.storage/s/eIB2Ksmr7dRD6GM6

    Thread Starter Mikko Saari

    (@msaari)

    I don’t think so – at least I haven’t deleted a huge number of posts.

    Anyway, I left the process running when I went for a lunch, and when I came back, it was complete.

    Looking at the wp_koko_analytics_post_stats table, it only has 950k rows now, so we’ve lost 800k rows in the process. That’s… interesting? I don’t know what has happened there. Something’s funny there. I’m not sure I’m interested in digging what’s gone – a lot of it is still there, and this really isn’t a hill worth dying on 😉

    By the way, the stats page columns don’t have room for six figure numbers =) If I look at stats from this year, I have over 300k+ hits from Google, and those numbers don’t fit in the columns. If I set the .w-fit class width to 60px everything looks nice.

    Plugin Author Danny van Kooten

    (@dvankooten)

    Hi @msaari,

    Happy to hear it!

    Looking at the wp_koko_analytics_post_stats table, it only has 950k rows now, so we’ve lost 800k rows in the process. That’s… interesting? I don’t know what has happened there. Something’s funny there. I’m not sure I’m interested in digging what’s gone – a lot of it is still there, and this really isn’t a hill worth dying on 😉

    Interesting. My first thought would be any deleted posts but that’s a lot of deleted posts and seems unlikely. I had to make a design choice for deleted post, either come up with some random path like “/deleted/” or delete them from the post stats table altogether since there’s probably a reason they were deleted…

    EDIT: Sorry, I now see that you mentioned you didn’t really delete any posts. Ignore above paragraph. I’ll do some more digging to see what could have caused that.

    By the way, the stats page columns don’t have room for six figure numbers =) If I look at stats from this year, I have over 300k+ hits from Google, and those numbers don’t fit in the columns. If I set the .w-fit class width to 60px everything looks nice.

    Noted! I’m trying to get the column as small as possible through fit-content in CSS, but obviously in the context of tables that won’t work by just looking at the table header cell… Will improve!

    Are new stats coming in correctly and mapping to the correct path for you?

    Thread Starter Mikko Saari

    (@msaari)

    Everything seems to be ok. We have over 11k posts, but the path table only has 6,400 rows for individual posts. It’s unlikely that we have over 5k posts without any hits, but when I visited a path that didn’t exist in the table, it got recorded correctly, so that’s working. Some data has clearly gone missing, but I’m not going to stress over that – the most popular posts lists are looking good, so I’m happy. I should probably reduce the amount of data in any case.

    Also, I see taxonomy archives getting hits. There are also lots of paths generated for taxonomy term archives without any pageviews in the stats table. Is that normal?

    Plugin Author Danny van Kooten

    (@dvankooten)

    Also, I see taxonomy archives getting hits. There are also lots of paths generated for taxonomy term archives without any pageviews in the stats table. Is that normal?

    They should be lazily generated as soon as they receive a hit (but not any sooner). So that does sound a little strange. If they are in the paths table, there should be an entry for it in the post_stats table too…

    Thread Starter Mikko Saari

    (@msaari)

    Well, they’re definitely not appearing in post_stats! For path IDs 7000+, there’s 7000, 7001, 7002, 7006, 7007, 7014, 7022, and 7023 in post_stats, but the rest don’t make an appearance.

    Also, I see the same path ID reference two different post IDs… 6726 is both post ID 96677 and 128152. The latter is the correct post, and for the first post, the correct path ID is 6728 (which has no entry in post_stats). When I visit that page again, the hits are recorded based on the post_id , not the path_id. The stats display is based on the path_id, so the wrong post appears in the stats dashboard.

Viewing 15 replies - 1 through 15 (of 18 total)

The topic ‘Migration problems (large database)’ is closed to new replies.