Development workflow / testing

If that’s the case, we will always test the validity of your modeling setup before deploying it to production. If your models or datasets are invalid, the deployment will fail and you will need to resolve the issues before continuing to deploy.

But in this scenario nothing would have changed on the Holistics layer - only in the underlying db…

Ok, I see. How frequently does your underlying table structure change?

Difficult to say in the long run as we’re just setting up, so things are changing fairly frequently. My experience, though, is that models do continuously evolve, but slowly.

I have just had the same request come from one of our analysts.

We use dbt, and have a separate GCP project where the output of each individual’s dbt runs goes for development (e.g. dbt-development), versus our production project, production.

If an analyst makes a change to a dbt model locally, e.g. adding a new column, they would like to be able to add this new column in AML and check what the data looks like in a simple chart. This way they can validate that their change works as expected.


It would be also very helpful for us if we can have a staging environment in Holistics.

Hi @DataGeekDude and @Abdel,

Not sure if something like Preview Reporting helps?

[Upcoming] Preview Reporting before Deployment - News from Holistics / Product Releases - Holistics Community


This definitely moves us forward, thanks Khai.

The remaining feature needed to complete this would be the ability to place variable blocks within a model’s query code to automatically switch between different tables in dev and prod, something like:

select
  *
from
  -- if development_environment: {{ user_variable_dev_schema }}.table1
  -- if production_environment: production_schema.table1


Hi Khai,

We already use this to test out changes in the AML layer, but as @Alex_H points out above, it doesn’t allow us to switch the data source underneath.

To provide a more complete example, say we have a table called orders and I need to add a new column, product_category, to this table.

First, I would make this change in dbt against our data warehouse: I make the change locally, then run the code and tests against it. The data with the new column now lives in dbt-development.order_data.stg_orders and I can query it.

However, at the moment Holistics queries the production data, which lives at production.order_data.stg_orders. If I want to test the impact my development change has on reporting, I need to deploy it to production first.

But what if the change is wrong? I would need to go back, create another PR, and deploy the change over and over until I get it right.

However, if Holistics had a development mode configured to point to dbt-development rather than production, we could query the new change without deploying it to production.

I would see this happening in the data source configuration within Holistics, with the AML code staying exactly the same. This would mean we don’t need to change the code, as per @Alex_H’s example above.

We do this in dbt ourselves: if our connection configuration is for development, the output goes to dbt-development, and if it is for production, the output goes to production.
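For reference, this kind of switching is typically done with two targets in dbt’s profiles.yml. A minimal sketch (the profile name, datasets, and auth methods here are assumptions, not the poster’s actual config):

```yaml
# ~/.dbt/profiles.yml — sketch, assuming BigQuery and a profile named my_project
my_project:
  target: dev                      # default target; override with `dbt run --target prod`
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: dbt-development     # development GCP project
      dataset: order_data
      threads: 4
    prod:
      type: bigquery
      method: service-account
      keyfile: /path/to/keyfile.json
      project: production          # production GCP project
      dataset: order_data
      threads: 4
```

With this in place, the same dbt code writes to dbt-development or production purely based on the selected target.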

Happy to discuss further and even demo what we do in dbt if that would help.


We’re completely on the same page @DataGeekDude.

One point to bear in mind with dbt development, though, is that each developer could have their own dev schema, so any global dev configuration would also need to support user variables.
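As a sketch of what per-developer switching could look like in a model’s query (hypothetical syntax — `$user.dev_schema` is an assumed per-user variable, not an existing Holistics feature):

```sql
-- hypothetical: resolve the schema per developer via a user variable,
-- so each analyst queries their own dbt dev schema
select *
from {{ $user.dev_schema }}.stg_orders  -- e.g. resolves to dbt_alex.stg_orders
```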

Hi @Khai_To,

In an ideal situation, you would just do your testing in a sandbox environment that carries all your production reports.

That way the developer can have the freedom to apply changes and experiment.

The report preview doesn’t really sound like a major benefit to us.

The main reasons:

  • the preview mode is very manual, so testing has to be done manually. If you have 20 dependent reports, you cannot go through them all by hand.
  • switching the data source to dev/staging is not supported there

Anyway, I understand the feature and it will add value, but it is definitely not close to having a separate dev/staging environment.

Hi @Alex_H, @DataGeekDude, @Abdel,
Thank you so much for all the use cases and suggestions. I would love to discuss the complete workflow with you. I will try to set up a call with you soon to understand it further.

Something we’re planning that could help here is dynamic modeling. Dynamic modeling in combination with Preview Reporting would definitely help you.


For the comment from @Abdel

Could you specify what you mean by testing? Do you want to test:

  • whether there are broken reports
  • or how the data in the reports will be altered by modeling changes?

Hi Khai,

When I change my datasource OR the AML, I want to be sure nothing breaks in Holistics Dashboards/Reports.

Example:

  • In my dev/staging datasource (assuming I can switch prod with dev/staging datasource), I remove/rename a few fields, how will I be able to make sure in development mode that nothing downstream breaks?
  • In AML, I remove a measure or dimension (for cleaning up my code for example), how can I be sure that that specific field isn’t used in any report?

For data testing: I didn’t bring up the topic, but it is worth mentioning that a test feature would also be helpful to test metrics in a more automated way. Maybe unit testing? But that is a bit off topic.

Hi @Khai_To ,

This feature might also help here

However, I wouldn’t want to fix things after deployment; I want to do report validation BEFORE deployment.


This totally makes sense @Abdel. The Reporting Validation is expected to be done before deployment.
The only issue with fixing the report before deployment is the possible downtime (fixing an error that has not happened yet). What we should have in the long run is the ability to create Dashboards as code, so you can manage the whole modeling and reporting workflow in development mode, and when everything is valid, you can then confidently click Deploy to Production.


Hi @Khai_To

That’s exactly what I was looking forward to.
But in what timeframe would Dashboards/Reports as code be available?

This should be part of our long-term plan (not this year, I believe). I will check with my team to see if we can set a higher priority for it in the last quarter of this year, but I cannot promise anything for now.

Question for @Abdel, @Alex_H and @DataGeekDude who contributed to this feature request: how do you solve this issue right now? I don’t see a viable option and I don’t want to manually change all the schema and data source names when deploying to production :confused:

At the moment I do not have a solution for this.

Hello,

We use Holistics 4.0 and started migrating to AML 2.0.
We were facing the same kind of issue.

One way to do it (hacky solution :pensive:) is to create a model, for example env.aml, with the following code:

Model env {
  type: 'table'
  label: ''
  description: ''
  data_source_name: 'MY_DATASOURCE_TO_USE'
  table_name: 'MY_SCHEMA_PREFIX'
}

Then all models can refer to the env model to define the default data source to use. Here is an example:

use env 

Model my_model {
  type: 'table'
  label: 'My model'
  description: ''
  data_source_name: env.data_source_name
  dimension field_1 {
    label: 'Field 1'
    type: 'number'
    hidden: false
    definition: @sql {{ #SOURCE.MY_FIELD_ID }};;
  }
  
  owner: '[email protected]'
  table_name: '"' + env.table_name + '_SCHEMA"."MY_TABLE"'
} 

Datasets can force models to use a specific data_source_name:

use env {
  env_dev  
}

use models {
  my_model
}

Dataset my_dataset {
  label: 'My dataset'
  description: ''
  data_source_name: env_dev.data_source_name
  models: [
    my_model
  ]
  relationships: [
    
  ]
  owner: '[email protected]'
}
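For completeness, the env_dev model referenced by `use env { env_dev }` above would mirror env but point at the development warehouse. A sketch in the same style (the data source name and schema prefix are assumptions):

```aml
Model env_dev {
  type: 'table'
  label: ''
  description: ''
  data_source_name: 'MY_DEV_DATASOURCE'
  table_name: 'MY_DEV_SCHEMA_PREFIX'
}
```

Swapping env for env_dev in the dataset then redirects every model that derives its data_source_name from it.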

Now, to make the data_source_name override work correctly, we have to avoid using a schema prefix, because it cannot be overridden at the dataset layer the way data_source_name can.


I would like to open this subject again: like @Abdel, it would really help us to have a staging environment distinct from production.

We use an external git repo to version our models, but we find that:

  • the dataset preview feature is not enough to validate a dataset (you need to start building dashboards to do that)
  • it’s very difficult to deal with test vs production databases/schemas within a single Holistics environment

If we could have a “Prod” Holistics environment connected to the master branch of our git repo and our prod data warehouse, as well as a “Test” Holistics environment connected to a staging branch of the same repo and a staging data warehouse, it would very much simplify all of these issues.

And I think Holistics already has all the features to support this: you only need external git support :thinking: