Holistics Datasets now are much more powerful without root model

Khai_To · October 11, 2019, 7:29am

Hello Holistics power users!

As you may know, we are rolling out a new version of Holistics that takes your analytical game to the next levels. The core components of the new Holistics experience, namely Data Models and Datasets, are constantly evolving, and today we would like to introduce to you a new and more powerful version of our upcoming Dataset!

Drawbacks of the current version

Dataset with root model is not flexible enough

As shared previously, Dataset is a SQL generation interface that produces a series of JOINs according to the fields that you drag in. The current Dataset version requires a root model to act as the first table in the JOIN chain.

While this rigid structure allows Analysts to ensure data to be explored exactly in the way they want, it makes it hard to reuse datasets.

For example, we have models Orders, Users and Countries that have relationships with one another. To explore orders data using supplement information from the other two models, we need to create a dataset called Orders - Full Info with orders as root, while users and countries are peripherals.

This dataset can answer questions like “How many orders are from users at different countries?”. Behind the scene, the generated queries will always have the following JOINs:

SELECT c.country_name, count(distinct o.user_id) as users_count
FROM orders o
LEFT JOIN users u on o.user_id = u.id
LEFT JOIN countries c on u.country_code = c.code
GROUP BY 1

In other words, it produces the list of users who already purchased. If the question is “How many registered users (both purchased and not purchased)?” this dataset will not produce the correct answer. ****

To answer this new question you will need to create a Users Info dataset with users as root:

So the JOIN chain will be:

SELECT c.country_name, count(u.id) as users_count
FROM users u
LEFT JOIN countries c on u.country_code = c.code
GROUP BY 1

If there are multiple models that can be linked and explored in different ways, you may need to create multiple slightly different datasets, and it is not an optimal way to reuse data.

Ambiguous path causes confusion

At the current version of Dataset, we allow ambiguous paths between models (Please check our documentation for a detailed explanation). In a Dataset with the models and relationships below, there are multiple ways to go from order_items to cities:

While the concept of JOIN paths is familiar for experienced SQL users, it is not friendly enough for the less technically inclined.

These two drawbacks pushed us to spend more efforts in researching and implementing a new version of Dataset that is easy to use yet powerful at the same time.

Here comes the new Dataset:

No need for root model!

Just as the name suggested, you no longer need to specify a root model when creating a dataset. In other words, you only need one dataset to rule them all!

The same dataset with Orders, Users and Cities can now answer both questions above:

How many orders are from users in different countries?".

1315×611 39.6 KB
How many registered users (both purchased and not purchased)?

1347×611 38.3 KB

Now you can reuse your Datasets efficiently.

No ambiguous path

When we detect possible ambiguous paths between models, we will automatically disable a relationship that introduced the ambiguity. For example, if your models have the following setup:

We will disable the relationship between Merchants and Cities models:

It is recommended that you duplicate Cities and Countries models into dedicated models for Users and Merchants. This makes your dataset clearer to the end-user:

What will happen to my reports and Datasets and how to deal with errors if any?

All of your previous datasets will be automatically migrated to this new version. Your reports, however, may be affected by this change:

Reports created from Datasets with only one model will not be affected
Reports created from Datasets with multiple data models but only used fields in Root Model will not be affected
Reports created from the Dataset but used fields from peripheral models will be slightly affected by the migration.

Behind the scene, those fields will be turned into “Legacy Fields” which cannot be edited or used in aggregations. To use the fields, you need to remove them from Viz Settings and drag them in again.

Your reports will still function normally if you do nothing, but we recommend you update your reports with the new fields.

https://holistics-cdn.s3.ap-southeast-1.amazonaws.com/community/fix_legacy_fields.gif(image larger than 4 MB)

In case you need help while migrating your works, please do not hesitate to contact us at [email protected]. Happy modeling!

Khai_To · October 16, 2019, 8:51am

One small note is that even after removing legacy fields from Viz Settings and dragging them in again, you think that your report’s result is incorrect, it’s probably because we have automatically disabled some relationship directions that cause ambiguity in your Dataset.

In this situation, you could Click on Edit Dataset and adjust your Relationship Direction properly that matches your Business Logic.

In the example, after migrating my report, I perceived that it was indicating location-based metrics of my buyers instead of sellers. Thus, I disabled the direction from Users to Cities and enabled that from Merchants to Cities. And then voilà, my report was updated accordantly to the change

DataGeekDude · June 14, 2022, 3:49pm

Thanks for this change, and I think this adds some great flexibility. When will this be available to all customers?

I agree that this concept is the way to solve a circular relationship but managing multiple model definitions isn’t great at all. I believe this is where role-playing tables is what we really need, so we can alias models with a prefix and have the same experience as you state, but with much much less maintenance overhead.

Khai_To · August 18, 2022, 4:21am

Hi Craig, this is something that we do have a plan to support but cannot share with you the exact timeline.
I will link this thread to your original thread about Role-playing Models