Development workflow//testing

Hi @DataGeekDude and @Abdel,

Not sure if something like Preview Reporting helps?

[Upcoming] Preview Reporting before Deployment - News from Holistics / Product Releases - Holistics Community

1 Like

This definitely moves us forward, thanks Khai.

The outstanding feature to complete this would be to be able to place variable blocks within the model’s query code to automatically switch between different tables between dev and prod - something like

select
*
from
– if development_environment – {{user_variable_dev_schema}}.table1
– if production_environment – production_schema.table1

1 Like

Hi Khai,

We already use this to test out changes in the AML layer, but as @Alex_H points out above, it doesn’t allow us to switch the data source underneath.

To provide a more complete example, say we have a table called orders and I need to add a new column product category to this table.

First, I would add this table in dbt to our data warehouse. I do this by making a change locally, running the code and tests against this change. The data with the new column now lives in dbt-development.order_data.stg_orders and I can query it.

However, at the moment in Holistics, we query the production data which lives at production.order_data.stg_orders. If I want to test the impact my change in development has made to reporting, I need to deploy this to production first.

But what if the change is wrong? I would need to go back and create another PR and deploy the change over and over again until I get it right.

However, if Holistics was in development mode which was configured to point to dbt-development rather than production then we could query the new change without deploying it to production.

I would see this happening in the data source configuration within Holistics and the AML code would stay the exact same. This would mean we don’t need to change the code, as per @Alex_H example above.

We do this in dbt ourselves, where if our connection configuration is for development our output goes to dbt-development and if our connection configuration is for production, the output goes to production.

Happy to discuss further and even demo what we do in dbt if that would help.

1 Like

We’re completely on the same page @DataGeekDude.

One point to bear in mind in dbt development though is that each developer could have their own dev schema, so any global dev configuration would also need to apply user variables.

Hi @Khai_To,

In an ideal situation, you would just do your testing in a sandbox environment, that carries all your production reports.

That way the developer can have the freedom to apply changes and experiment.

The report preview doesn’t sound really as some major benefit to us.

The main reasons:

  • the preview mode is very manual, so testing needs to be manually. If you have 20 reports depending, you cannot go through them all manually.
  • switch of datasource to dev/staging is not supported there

Anyway, I understand the feature, and it will add value, but it is definitely not close to having a seperate dev/staging environment.

Hi @Alex_H, @DataGeekDude, @Abdel,
Thank you so much for all the use cases and suggestions. I would love to discuss the complete workflow with you guys. I will try to setup a call with you soon to further understand it.

This could be something that we’re planning to do is dynamic modeling. Dynamic modeling in combination with Preview Reporting would definitely help you guys.


For the comment from @Abdel

Could you specify what do you mean by testing? Do you want to test:

  • if there are broken reports
  • or test how will the data of the reports be altered due to modeling changes?
1 Like

Hi Khai,

When I change my datasource OR the AML, I want to be sure nothing breaks in Holistics Dashboards/Reports.

Example:

  • In my dev/staging datasource (assuming I can switch prod with dev/staging datasource), I remove/rename a few fields, how will I be able to make sure in development mode that nothing downstream breaks?
  • In AML, I remove a measure or dimension (for cleaning up my code for example), how can I be sure that that specific field isn’t used in any report?

For data testing, I didn’t pull up the topic, but it is worth mentioning, that a test feature would also be helpful to test metrics on a more automated way. Maybe unit testing? But that is a bot off topic

Hi @Khai_To ,

This feature might also help here

However, I wouldn’t want to fix stuff after deployment, but want to do report validation BEFORE deployment

2 Likes

This totally makes sense @Abdel. The Reporting Validation is expected to be done before deployment.
The only issue of fixing the report before deployment is the possible downtime (fixing the error which is not happened yet). What we should have in the long-run is the ability to create Dashboard as-code you so can manage the whole modeling and reporting workflow in development mode and when everything is valid, you can then confidently click Deploy to Production.

1 Like

Hi @Khai_To

That’s exactly what I was looking forward to.
But on what term would Dashboard/Report as code be there?

This should be part of our long-term plan (not this year I believe). I will check with my team to see if we can set higher priority for it in the last quarter this year, but cannot promise for now.

Question for @Abdel, @Alex_H et @DataGeekDude who contributed to this feature request : how do you solve this issue right now ? I don’t see a viable option and I don’t want to manually change all the schema and datasource names when deploying to production :confused:

At the moment I do not have a solution to this

Hello,

We use holistics 4.0 and started migrating to AML 2.0.
We were facing the same kind of issue.

One way to do it (hacky solution :pensive:) is to create a model for example env.aml with the following code:

Model env {
  type: 'table'
  label: ''
  description: ''
  data_source_name: 'MY_DATASOURCE_TO_USE'
  table_name: 'MY_SCHEMA_PREFIX'
}

Then, all models can refer to the env model for defining the default datasource to use. Here is an example:

use env 

Model my_model {
  type: 'table'
  label: 'My model'
  description: ''
  data_source_name: env.data_source_name
  dimension field_1 {
    label: 'Field 1'
    type: 'number'
    hidden: false
    definition: @sql {{ #SOURCE.MY_FIELD_ID }};;
  }
  
  owner: '[email protected]'
  table_name: '"' + env.table_name + '_SCHEMA"."MY_TABLE"'
} 

Datasets can force models to use a specific data_source_name :

use env {
  env_dev  
}

use models {
  my_model
}

Dataset my_dataset {
  label: 'My dataset
  description: ''
  data_source_name: env_dev.data_source_name
  models: [
    my_model
  ]
  relationships: [
    
  ]
  owner: '[email protected]'
}

Now, to make the data_source_name overriding working correctly, we will have to avoid using schema prefix because it can’t be overrided at the dataset layer as the data_source_name.

2 Likes

I would like to open this subject again : like @Abdel it would really help us to have a staging env distinct from production.

We use an external git to version our models but we find that :

  • the dataset preview feature is not enough to validate a dataset (you need to start building dashboards to do that
  • it’s very difficult to deal with test vs production database/schemas within one single holistics env

If we could have a “Prod” holistics env connected to the master branch of our git repo and our prod datawarehouse as well as a “Test” Holistics env connected to a staging branch of the same repo and a staging datawarehouse it would very much simplify all these issues.

And I think Holistics already has all the features to support this : you only need external git support :thinking:

Hi @dacou , thanks for sharing this. Could you specify what kind of outcome you’re seeking from Dataset Validation? Does the Preview Option work in this case?

Hi @Khai_To , what I mean is that in order to fully validate a dataset, we need to start building reports, dashboards and filters to confirm that it is a good fit for the intended purpose.

For now, the models and dataset preview feature are useful for the first controls :

  1. “model preview” : detect formatting issues, visual check of measures
  2. “dataset preview” : control relationship + sandbox for dataset first checks

When these first checks are done, we publish the dataset in order to start building dashboard, reports and filters. At this stage, the dataset is already merged in the master branch but some major changes can still occur on to implement features that could not be detected before :

  • custom dimensions (groupings, encodings, value ranges, …)
  • custom models (depending on the visuals that we are trying to build
  • data formatting
  • pre-aggregation (depending on query performance feedback)

For a new dataset, these first steps happen with models and datasets querying a staging database.
When it’s done, we publish the new tables to our production database and switch the datasource of our models and datasets to production.

With this workflow it is not easy to have at the same time multiple versions of the same models and datasets (one production version and one+ version for dev/test).

If we had a “prod” vs “test” holistics env, we would solve these issues :

  • maintain both test/prod versions of models and dataset
  • expose new versions of datasets to some key users for validation/iteration on the “test” env
  • and only when all of this is done, merge to master and publish to “prod” Holistics

The main issue with this is that dashboards built on the “test” env would have to be rebuilt from scratch when moving to the “prod” env.

One ideal solution would be to have one single holistics env but have “branch support” included in the URL for both reporting and modeling :star_struck: so that we can share with our key users a dataset or dashboard for review before merging to master :
https://eu.holistics.io/{release-xxx}/dashboards/v3/

1 Like

Hi @dacou,

Thanks for your detailed explanation. :grin:

I totally understand that our current version doesn’t support this use case well.

The only (upcoming) option that you can use is our Preview Reporting, but we haven’t released it yet due to several performance issues. Our team is actively working on the improvement so that we can release it as soon as possible.


I think that this use case is totally valid. We’re discussing internally how to best solve this, and one of the options we’re thinking of is environment variable where we allow users to customize the behavior of your project depending on where the project is running (either Production, Development, or a specific Branch).

For example, you can define the dataset

Func ds_name () {
  if (env.IS_PRODUCTION == true) {
    env.PROD_DB_NAME
  } else {
    env.DEV_DB_NAME
  }
}

Dataset name { 
  label: 'Name'
  description: 'Something here'
  data_source_name: ds_name()
  // Other configurations
}

and the .env file

IS_PRODUCTION=true
PROD_DB_NAME=prod_db_name
DEV_DB_NAME=dev_db_name

Do note that it’s just the drafted solution :arrow_up:. The final one could be different.


With this use case, basically, you want to share a version of your Data (dashboard) to another Data Analyst so that he/she can check review it before Deploying to Production.
With the release of our upcoming Preview Reporting, you can basically Commit and Push your changes to a particular branch and ask the reviewers to checkout that branch and go to the Dashboard to validate the data.
But of course, with the branch name included in the URL, it would be much easier for you to quickly share the Dashboard with the reviewers.
I will note this down and consider supporting it in the future.

Do let me know if you have any questions.

Thanks,

Hi,

The environment variable feature is a great addition. We use a hacky workaround for now to simulate this (detailed by @Julien_OLLIVIER in this post ).

Regarding the “Preview reporting” feature :

  • Will we be able to save reports and dashboards across “preview reporting” sessions ? This would be a must have to collaborate between data analysts and end users when dataset reviews last multiple days.
  • Will it be available to users other than data analysts ? Most of our “dataset reviewers” are product owners and business users who don’t have access to the modeling view right now.

Damien

1 Like

Our upcoming release for Preview Reporting won’t allow you to save any changes to the Reports/Dashboards (in the Preview Session).

Also, the Preview Reporting is user-based meaning that other users won’t observe the same Reporting Behavior as you do.
It will refer to your current Working Directory (instead of the commit) to reflect changes to the Reports/Dashboards. Thus, if you make changes in the Reporting while in Preview session (for e.g., replace or remove a field in a report), other users (viewing reporting items) will be affected.

However, could you share why do you need to save reports and dashboards while in Preview Session?


The option Preview Reporting is only available for users who have access to the Modeling layer. It’s currently not designed for non-DA users to do Acceptance Testing.

However, if you use case is to allow non-DA users to check and validate the Dashboard before Production Deployment, we would need to consider another approach.