How can I modeling "Number of Subscribers" using Holistics' data modeling layer?

In our product, one of business models is monthly subscription.

We define “number of subscribers” metric for “timestamp”, not “interval”.
(I mean “interval” = [timestamp1 ~ timestamp2])

e.g. “number of subscribers” for “2022-03-09” =
number of users who was in “subscribe” status at the end of “2022-03-09”
(= “2022-03-09 23:59:59:999999”)

So we prepared table in BigQuery which is unique for (user_id, date).
And if value of filter set as “daily”, then,

select
  date,
  count(user_id) as cnt_subscribers,
from
  subscriber_daily
group by
  date
order by
  date

if value of filter set as “weekly”, then,

select
  date,
  count(user_id) as cnt_subscribers,
from
  subscriber_daily
group by
  date
having
  format_date('%u', date) = '7' -- pick sundays only.
order by
  date

So far, we’ve been using holistics 2.0,
and try to introduce holistics 3.0,

Meanwhile, I’m stuck to how can I express above logic in Holistics’ data modeling layer.

Hi Hyun Lee,

I assume your data looks like this:

| user_id | date       | status     |
| ------- | ---------- | ---------- |
| 1       | 2022-01-01 | subscribed |
| 1       | 2022-01-02 | subscribed |
| 1       | 2022-01-03 | subscribed |
| 2       | 2022-01-01 | subscribed |
| 2       | 2022-01-02 | subscribed |

You want to report the number of subscribers on different time grains (daily, weekly, monthly), and it should reflect the number on the last day of the period. If the aggregation period is daily then it will return the daily numbers, if the period is weekly then it should return the Sunday numbers…

To achieve that, first you can package this aggregation into a SQL model:

select
  date,
  count(user_id) as cnt_subscribers,
from
  subscriber_daily
group by
  date

The result will look like this:

| report_date| cnt_subscribers |
| ---------- | --------------- |
| 2022-01-01 | 100             |
| 2022-01-02 | 110             |
| 2022-01-03 | 200             |

Now you can create a custom measure that returns the cnt_subscribers value at the end of the period.

max(cast(format_date('%Y%m%d', report_date) as integer) * 1000000 + {{ #THIS.cnt_subscribers }}) - max(cast(format_date('%Y%m%d', report_date) as integer) * 1000000) 

The logic of the measure is this:

  • Turn the date into an integer, multiply it by a really large number (for example, 1000000) and add the number of subscribers. Using max() on this figure will return a very large integer that represents the latest date & the number of subscribers on that date (for example 20220103000100) (1)
  • Subtract (1) with the max() of the (date integer * 1000000): 20220103000000
  • The result will be the number of subscribers at the end of the aggregation period, be it month or week.

For a more detailed walk-through of this approach, you can check our guide here: How to Calculate Dynamic Point-In-Time Metrics in Holistics | Holistics Docs

Hope this is what you need.

1 Like

Hi @hapham,

Thank you for your answer.

After posting this question, I read the blog post you mentioned. (~~~ Point-In-Time ~~~)
It’s little bit tricky, but surprising. :+1:

But when I tried to use that method,
I can’t find way to add “dimensions”.
e.g. How can I make it possible to dashboard viewer can give condition like “only Singapore subscibrers”

Could you help me?

Hi Huyn Lee,

This current setup is just a workaround so it’s not very flexible yet. To do that you need to add a dimension to the SQL transformation:

select
  date,
  country,
  count(user_id) as cnt_subscribers,
from
  subscriber_daily
group by
  date, country

Then in the report, you can add a country filter so that your users will only select the data of a single country. This will not work if your user selects two or more countries, or want to filter by any other dimension.

Internally, we are looking into how to better support these kinds of analyses better.

Hi @hapham ,

Thank you for your answer.

This will not work if your user selects two or more countries,

Yes, I met same problem you mentioned,
So I think I can’t use method you explained.

Internally, we are looking into how to better support these kinds of analyses better.

That’s good news.
I hope it will be resolved soon.