• Resolved fabregas4

    (@fabregas4)


    Hi,

    Having dug into the ‘sampling rate’ and how it works, this is what I discovered. Would be great if the developers could confirm/deny/clarify this.

    When the product page is viewed, a random number is generated that is between 1 and the sampling rate set (let’s assume 20 for this example).

    If the random number is 1, it records stats against the product, to the effect of 20 views.

    If the random number is anything from 2 to 20, it doesn’t record any stats.

    So what this means is that technically, you could have 100 actual views of your post/product page with 0 views recorded in the WPP stats.

    It also means that 100 actual views could result in anywhere between 0 and 2000 views recorded in the WPP stats, always in multiples of 20.

    This is obviously a very big variance. I assume that the idea is that at very high traffic numbers, this will all even itself out. The amount of randomness (or variance from actual views) will reduce over time.
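
    To make the behaviour above concrete, here is a minimal Python sketch of the scheme as described in this post. It is an illustration only, not WPP's actual code; the function name `recorded_views` and the fixed seed are assumptions made up for the example.

```python
import random


def recorded_views(actual_views: int, sampling_rate: int, rng: random.Random) -> int:
    """Simulate the counting scheme described above (illustrative, not WPP's code)."""
    recorded = 0
    for _ in range(actual_views):
        # Each page view draws a whole number from 1 to sampling_rate, inclusive.
        if rng.randint(1, sampling_rate) == 1:
            # A hit is recorded as sampling_rate views at once.
            recorded += sampling_rate
        # Any other number records nothing for this view.
    return recorded


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only so repeated runs match
    # Ten independent "days" of 100 real views each, sampling rate 20.
    print([recorded_views(100, 20, rng) for _ in range(10)])
```

    Each number printed is a multiple of 20, and the spread between them gives a feel for how far a single low-traffic product can drift from its true 100 views.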

    My conclusions from this are:

    – Probably best not to use the sampling rate unless you’ve actually experienced performance impacts with WPP

    – The decision to use the sampling rate shouldn’t really depend on whole-of-site traffic alone. A site may have high traffic overall, but it may have 10,000 posts/products that each get only a moderate number of views. This can result in the stats/rankings being quite a way off from reality, especially if the stats are used or displayed over a short time range (e.g. most popular products in the last day or week).
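
    The per-post effect can be estimated without simulation. Assuming each view is sampled independently with probability 1/rate (which is what the scheme described above implies), the recorded count is rate × Binomial(views, 1/rate), so its standard deviation is sqrt(views × (rate − 1)). A rough sketch, with function names invented for this example:

```python
import math


def relative_std_error(actual_views: int, sampling_rate: int) -> float:
    """Standard deviation of the recorded count, as a fraction of the true count."""
    # recorded = rate * Binomial(n, 1/rate)  =>  sd = sqrt(n * (rate - 1))
    return math.sqrt((sampling_rate - 1) / actual_views)


def prob_nothing_recorded(actual_views: int, sampling_rate: int) -> float:
    """Chance that none of the views is sampled, so the stats show 0."""
    return (1 - 1 / sampling_rate) ** actual_views


if __name__ == "__main__":
    for views in (100, 10_000):
        err = relative_std_error(views, 20)
        zero = prob_nothing_recorded(views, 20)
        print(f"{views} views: typical error {err:.1%}, chance of 0 recorded {zero:.4%}")
```

    With a sampling rate of 20, a post with 100 real views has a typical error of roughly 44% and about a 0.6% chance of showing 0 views, while a post with 10,000 real views has a typical error of roughly 4.4%. That is why the number of views per post matters more for accuracy than the site-wide total.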

    Would love to hear any corrections on this, or differences of opinion.

    Cheers

Viewing 7 replies - 1 through 7 (of 7 total)
  • Plugin Author Hector Cabrera

    (@hcabrera)

    Hi @fabregas4,

    Have you seen the Data Sampling documentation yet? If not, it should clarify some of the questions you may have.

    Thread Starter fabregas4

    (@fabregas4)

    Hi @hcabrera

    Somehow I missed that, thanks. It confirms what I had found.

    Can you provide an example of applying the ‘Rule Of Three’?

    It sounds like overall site traffic shouldn’t be the only variable determining sample rate. It makes sense in terms of maintaining performance (although server specs are also a huge variable), but not in terms of maintaining some kind of accuracy in the stats.

    For this, we should also take into account the total number of posts and how many views each post is getting. This can get rather complex though, so I can see why overall site traffic is used as the guide.

    Cheers

    Plugin Author Hector Cabrera

    (@hcabrera)

    Can you provide an example of applying the ‘Rule Of Three’?

    Sure. Let’s say that your site averages 300,000 visits a day. If we use the Rule of Three to calculate the Sample Rate for this kind of traffic it would look something like this:

    (300000 * 100)/250000 = 120

    Where 100 is the default Sample Rate, 300000 is your site’s daily traffic, and 250000 is the traffic level that the default Sample Rate is intended for.
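
    The same calculation as a small Python helper, for anyone who wants to plug in their own numbers. This is only a sketch of the arithmetic above; the function name and the rounding to a whole number are assumptions, not part of WPP.

```python
def rule_of_three_sample_rate(
    daily_traffic: int,
    default_rate: int = 100,
    default_traffic: int = 250_000,
) -> int:
    """Scale the default Sample Rate in proportion to daily traffic."""
    # (daily_traffic * default_rate) / default_traffic, as in the example above.
    rate = daily_traffic * default_rate / default_traffic
    # Rounding and the floor of 1 are choices made for this sketch.
    return max(1, round(rate))


if __name__ == "__main__":
    print(rule_of_three_sample_rate(300_000))  # 120
```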

    Hope that makes more sense now.

    This can get rather complex though, so I can see why overall site traffic is used as the guide.

    Yep, that’s correct.

    Thread Starter fabregas4

    (@fabregas4)

    Cool, thanks. That makes sense – was just unsure because the default was for ‘125k to 250k’.

    Going back to my previous point, maybe it’s too hard to take into account the number of posts and how many views each gets on average, but wouldn’t it be better to use total pageviews to inform the sample rate, rather than total visits?

    Plugin Author Hector Cabrera

    (@hcabrera)

    Semantics 😛 Here I interpret visits as (page)views.

    Thread Starter fabregas4

    (@fabregas4)

    Ah 🙂

    I could definitely see others using visits for their calc rather than pageviews based on the wording.

    Pageviews would normally be anywhere from 2x to 6x visits, which would give them a very different sample rate.
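
    To show how much the wording matters, here is the Rule of Three from earlier in the thread applied to one site measured three ways. The figure of 50,000 daily visits is hypothetical, chosen only for the example, and the helper name is made up.

```python
DEFAULT_RATE = 100         # default Sample Rate quoted earlier in the thread
DEFAULT_TRAFFIC = 250_000  # traffic level the default is intended for


def sample_rate(daily_traffic: int) -> int:
    """Rule of Three: scale the default rate in proportion to traffic."""
    return max(1, round(daily_traffic * DEFAULT_RATE / DEFAULT_TRAFFIC))


visits = 50_000  # hypothetical daily visits
cases = [
    ("visits", visits),
    ("pageviews at 2x visits", visits * 2),
    ("pageviews at 6x visits", visits * 6),
]
for label, traffic in cases:
    print(f"{label}: {traffic} -> sample rate {sample_rate(traffic)}")
```

    For this hypothetical site the result is 20 when using visits, but 40 to 120 when using pageviews.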

    Not trying to be clever here but would suggest that you update the wording in the docs (or clarify your definition of a visit).

    Plugin Author Hector Cabrera

    (@hcabrera)

    Not trying to be clever here but would suggest that you update the wording in the docs (or clarify your definition of a visit).

    Not at all, suggestions are welcome so thanks. I’ll revisit that section of the documentation once I get some spare time 🙂


The topic ‘How Sampling Rate Actually Works’ is closed to new replies.