If that’s the case, we will always test the validity of your modeling setup before deploying it to production. If your models or datasets are invalid, the deployment will fail and you will need to resolve the issues before you can continue deploying.
But in this scenario nothing would have changed on the Holistics layer - only in the underlying db…
Difficult to say in the long run, as we’re just setting up, so they’re changing fairly frequently. My experience, though, is that models do continuously evolve, but slowly.
I have just had the same request come from one of our analysts.
We use dbt, and have a separate GCP project where the output of each individual’s dbt runs goes for development (e.g. dbt-development), versus our production project, production.
If an analyst makes a change to a dbt model locally, e.g. adding a new column, they would like to be able to add this new column in AML and check what the data looks like in a simple chart. This way they can validate that their change works as expected.
The outstanding feature to complete this would be the ability to place variable blocks within the model’s query code to automatically switch between the dev and prod tables, something like:

```sql
select
  *
from
  {% if development_environment %} {{ user_variable_dev_schema }}.table1
  {% else %} production_schema.table1
  {% endif %}
```
We already use this to test out changes in the AML layer, but as @Alex_H points out above, it doesn’t allow us to switch the data source underneath.
To provide a more complete example, say we have a table called orders and I need to add a new column, product category, to this table.
First, I would add this column in dbt to our data warehouse. I do this by making the change locally and running the code and tests against it. The data with the new column now lives in dbt-development.order_data.stg_orders and I can query it.
However, at the moment in Holistics, we query the production data, which lives at production.order_data.stg_orders. If I want to test the impact my change in development has on reporting, I need to deploy it to production first.
But what if the change is wrong? I would need to go back and create another PR and deploy the change over and over again until I get it right.
However, if Holistics had a development mode configured to point at dbt-development rather than production, we could query the new change without deploying it to production.
I would see this happening in the data source configuration within Holistics, and the AML code would stay exactly the same. This would mean we don’t need to change the code, as per @Alex_H’s example above.
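To make the request concrete, here is a purely hypothetical sketch (this is not existing Holistics syntax; the field and environment names are made up) of what an environment-aware data source configuration could look like:

```
// Hypothetical syntax, not currently supported by Holistics.
// The GCP project is switched per environment; the AML models stay unchanged.
DataSource dw {
  type: 'bigquery'
  environments: {
    production:  { project: 'production' }
    development: { project: 'dbt-development' }
  }
}
```

The point is that the switch lives entirely in the connection layer, so no model code has to reference an environment.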
We do this in dbt ourselves: if our connection configuration is for development, the output goes to dbt-development, and if it is for production, the output goes to production.
Happy to discuss further and even demo what we do in dbt if that would help.
One point to bear in mind from dbt development, though, is that each developer could have their own dev schema, so any global dev configuration would also need to support user variables.
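For reference, a minimal sketch of how the dbt side of this looks (assuming a BigQuery setup; the profile, project, and dataset names here are illustrative, not our real ones). The `dev` target writes to a per-developer dataset, which is exactly why a global Holistics dev configuration would need user variables:

```yaml
# profiles.yml (illustrative names)
my_project:
  target: dev                # default target for local runs
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: dbt-development              # dev GCP project
      dataset: "dbt_{{ env_var('USER') }}"  # per-developer schema
      threads: 4
    prod:
      type: bigquery
      method: oauth
      project: production                   # prod GCP project
      dataset: order_data
      threads: 4
```

Running `dbt run` locally uses `dev`, while CI deploys with `--target prod`, so the same model code lands in two different projects.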
Hi @Alex_H, @DataGeekDude, @Abdel,
Thank you so much for all the use cases and suggestions. I would love to discuss the complete workflow with you. I will try to set up a call with you soon to understand it further.
This maps to something we’re planning to do: dynamic modeling. Dynamic modeling in combination with Preview Reporting would definitely help you.
When I change my datasource OR the AML, I want to be sure nothing breaks in Holistics Dashboards/Reports.
Example:
In my dev/staging datasource (assuming I can switch prod with a dev/staging datasource), I remove/rename a few fields. How will I be able to make sure, in development mode, that nothing downstream breaks?
In AML, I remove a measure or dimension (to clean up my code, for example). How can I be sure that that specific field isn’t used in any report?
For data testing, I didn’t bring up the topic, but it is worth mentioning that a test feature would also be helpful to test metrics in a more automated way. Maybe unit testing? But that is a bit off topic.
This totally makes sense @Abdel. The Reporting Validation is expected to be done before deployment.
The only issue with fixing the report before deployment is the possible downtime (fixing an error that has not happened yet). What we should have in the long run is the ability to create Dashboards as code, so you can manage the whole modeling and reporting workflow in development mode and, when everything is valid, confidently click Deploy to Production.
This should be part of our long-term plan (not this year, I believe). I will check with my team to see if we can set a higher priority for it in the last quarter of this year, but I cannot promise anything for now.
Question for @Abdel, @Alex_H and @DataGeekDude, who contributed to this feature request: how do you solve this issue right now? I don’t see a viable option, and I don’t want to manually change all the schema and datasource names when deploying to production.
Now, to make the data_source_name overriding work correctly, we will have to avoid using a schema prefix, because it can’t be overridden at the dataset layer the way data_source_name can.
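For illustration, a sketch assuming Holistics AML syntax (the model, table, and data source names here are made up): leave the schema prefix out of `table_name` so that only `data_source_name` has to change, overridden at the dataset layer:

```
// Sketch only; relies on the connection's default schema instead of a hardcoded prefix.
Model orders {
  type: 'table'
  data_source_name: 'dw_production'
  table_name: '"stg_orders"'      // no schema prefix, so the model is portable across data sources
}

Dataset sales {
  data_source_name: 'dw_staging'  // overrides the model-level data source for this dataset
  models: [orders]
}
```

If the table name carried a schema prefix like `production_schema.stg_orders`, switching the data source alone would still point at the wrong schema.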
I would like to open this subject again: like @Abdel, it would really help us to have a staging env distinct from production.
We use an external git repo to version our models, but we find that:
- the dataset preview feature is not enough to validate a dataset (you need to start building dashboards to do that)
- it’s very difficult to deal with test vs production databases/schemas within one single Holistics env
If we could have a “Prod” Holistics env connected to the master branch of our git repo and our prod data warehouse, as well as a “Test” Holistics env connected to a staging branch of the same repo and a staging data warehouse, it would very much simplify all these issues.
And I think Holistics already has all the features to support this: you only need external git support.