Support Python in Holistics

Abdel · June 12, 2024, 1:16pm

Supporting Python could bring a lot of benefits, including:

dynamic model generation
retrieving metric definition stored externally
building logic based on env variables

and many other benefits.

See examples of how Cube.dev supports Python

dbt

HuyVu · June 13, 2024, 2:57am

Hi @Abdel,

Thank you for your suggestion!

Could you please share more details about your specific use cases? This will help us understand how best to solve them. Additionally, if we were to open more APIs from our side, do you think that would be enough to solve your requirements?

Best,

thanh · June 13, 2024, 3:17am

Hi @Abdel, let me explain in more detail. We are working on more features to support diverse CI/CD needs, which you can look at here: Continuous Integration (CI/CD) | Holistics Docs (4.0). We will add API endpoints and ways for you to:

Inject custom logic into .env.aml
API to trigger validation and deployment

This means you can do everything you mentioned in Python, outside of Holistics, as part of a custom CI/CD pipeline (e.g. GitHub Actions, CircleCI, etc.).

What do you think about this approach?

Abdel · June 13, 2024, 8:13am

@HuyVu @thanh ,
Thanks for your reply.

Even though exposing API endpoints is great for the development workflow (dev/test/prod), the purpose of having Python is different.

With Python and Jinja, you can create dynamic models by using Python functions that retrieve data or logic maintained outside of Holistics.
That is the approach Cube follows, which is great. dbt has also followed this approach, though in a slightly different way.

My use case for it:

We are looking to automate the generation of AML, which can be achieved by using dynamic models.

thanh · June 13, 2024, 8:23am

Hi @Abdel,

Thanks for your reply.

Since Holistics supports external Git integration with GitHub/GitLab, you can actually write a script to generate AML using Python/Jinja right now and automatically commit your generated code into the repo. This can be done using GitHub Actions or GitLab CI/CD. With the API/webhooks approach, you can do anything you want with the generated code while still having an integrated experience with Holistics. We will look into creating a guide/documentation to show how this can be done.
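As a rough illustration of the "generate AML with a script" idea above, here is a minimal sketch that renders a model file from metric definitions kept outside Holistics. The `METRICS` list, the function name `render_model`, and the exact AML shape are assumptions for illustration only; the AML syntax in the templates is approximate and should be checked against the Holistics AML docs.

```python
from string import Template

# Hypothetical metric definitions; in practice these might come from a YAML
# file, a database, or an internal service maintained outside Holistics.
METRICS = [
    {"name": "order_count", "label": "Order Count", "column": "id", "agg": "count"},
    {"name": "total_amount", "label": "Total Amount", "column": "amount", "agg": "sum"},
]

# Approximate AML shapes (assumption): verify against the Holistics AML docs.
MEASURE_TEMPLATE = Template(
    "  measure $name {\n"
    "    label: '$label'\n"
    "    type: 'number'\n"
    "    definition: @sql {{ #SOURCE.$column }};;\n"
    "    aggregation_type: '$agg'\n"
    "  }\n"
)

MODEL_TEMPLATE = Template(
    "Model $model_name {\n"
    "  type: 'table'\n"
    "  data_source_name: '$data_source'\n"
    "  table_name: '$table_name'\n"
    "$measures"
    "}\n"
)


def render_model(model_name, data_source, table_name, metrics):
    """Render one AML-like model text with a measure block per metric."""
    measures = "".join(MEASURE_TEMPLATE.substitute(metric) for metric in metrics)
    return MODEL_TEMPLATE.substitute(
        model_name=model_name,
        data_source=data_source,
        table_name=table_name,
        measures=measures,
    )
```

A CI job could call `render_model("orders", "demodb", "public.orders", METRICS)`, write the result to a file in the repo, and commit it. `string.Template` is used here only to keep the sketch dependency-free; it also leaves the `{{ }}` in the output untouched, which a Jinja template would need to escape.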

The approach above is very flexible; however, it requires data engineering effort to build this yourself. Are you looking for a more integrated approach where this is done automatically on Holistics’ side?

Abdel · June 13, 2024, 9:29am

Hi @thanh
That sounds very good.
Are the required API endpoints already available to deploy changes automatically?

HuyVu · June 13, 2024, 11:54am

Hi @Abdel,

We’ll soon open the deployment API to support CI/CD use cases. We imagine a sample workflow like this:

A Pull Request is merged in GitHub, e.g. from dev → main branch
GitHub Actions triggers the Holistics deployment endpoint to deploy the changes automatically
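The second step of this workflow could look roughly like the sketch below, run from a CI job. The deployment API had not been published at the time of this thread, so the URL, the `branch` payload field, and the function names are all hypothetical; the `X-Holistics-Key` auth header is also an assumption to confirm against the API docs.

```python
import json
import urllib.request


def build_deploy_request(url, api_key, branch):
    """Build (but do not send) a POST request to a hypothetical deploy endpoint."""
    payload = json.dumps({"branch": branch}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Holistics-Key": api_key,  # assumed auth header name
        },
    )


def trigger_deploy(url, api_key, branch, timeout=30):
    """Send the deploy request and return the HTTP status code."""
    request = build_deploy_request(url, api_key, branch)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In a GitHub Actions step, the URL and API key would typically be read from repository secrets exposed as environment variables and passed to `trigger_deploy`.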

Besides the deploy API, are there any other APIs you’re looking for in your workflow?

Abdel · June 13, 2024, 12:12pm

Hi @huy
The workflow I am looking for is something like so:

A PR is merged to Development
Changes of that PR are validated against AML/Validation Endpoint
When all validations (and other CI checks) pass, Development can be merged to Master
Then a deploy is automatically triggered

So these 2 endpoints:

AML/Report validation endpoint
Deploy Endpoint
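The gating logic in this workflow (validate first, deploy only when everything passes) can be sketched independently of the actual endpoints. The check and deploy callables below are placeholders for calls to the validation and deploy endpoints, which were not yet available when this was written.

```python
from typing import Callable, Iterable, List, Tuple

# A named check: (name, zero-argument function returning True on success).
Check = Tuple[str, Callable[[], bool]]


def run_pipeline(checks: Iterable[Check], deploy: Callable[[], None]) -> List[str]:
    """Run every check; call deploy only if all pass.

    Returns the names of the failed checks (empty list means deployed).
    """
    failed = [name for name, check in checks if not check()]
    if not failed:
        deploy()
    return failed
```

Running all checks before deciding, rather than stopping at the first failure, gives the CI log a complete list of what is broken in one pass.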

And
For our input/output tests, we might also need to run queries against this API endpoint, Get Reporting Data via API | Holistics Docs (4.0), using the development branch.
Currently that is only possible on one branch. Is something like that achievable?

That way we will be able to do input/output testing for metrics.
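A minimal sketch of such an input/output test: given known input data, the expected metric values are compared against the values returned by a query on the development branch. How the actual values are fetched is left out; `compare_metrics` and its arguments are hypothetical names.

```python
import math
from typing import Dict, List


def compare_metrics(
    expected: Dict[str, float],
    actual: Dict[str, float],
    rel_tol: float = 1e-6,
) -> List[str]:
    """Return a list of human-readable mismatches (empty list means all match)."""
    problems = []
    for name, want in expected.items():
        if name not in actual:
            problems.append(f"{name}: missing from query result")
        elif not math.isclose(actual[name], want, rel_tol=rel_tol):
            problems.append(f"{name}: expected {want}, got {actual[name]}")
    return problems
```

A CI step could fail the build whenever the returned list is non-empty.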

HuyVu · June 18, 2024, 3:18am

Hi @Abdel,

Thanks for your reply. As for the validation endpoint, there would be several validation steps:

(1) Validate the Holistics Table Model against the dbt model: dbt to Holistics Validation | Holistics Docs (4.0)
(2) Within Holistics, detect which dashboards and datasets are broken when you make a change in a Holistics data model: Reporting Validation | Holistics Docs (4.0)
(3) Data testing: test the number of metrics on the development branch

For (1), could you check whether the link is what you’re looking for? Basically, the approach is to compare all the Holistics table models against the dbt artifacts manifest.json and catalog.json

For (2), we can open this API to let the CI system call that (similar to the Deploy endpoint)

For (3), it’s not possible for now, but yes in the long term
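To make the comparison in (1) more concrete, here is a small sketch that diffs the dimensions of one Holistics table model against the columns of the matching node in a parsed dbt catalog.json. It relies on the `nodes` → node id → `columns` layout of catalog.json; how the Holistics dimension names are extracted is not shown and is assumed to happen elsewhere. The function names are hypothetical.

```python
from typing import Dict, List, Set


def dbt_columns(catalog: dict, node_id: str) -> Set[str]:
    """Lower-cased column names for one node of a parsed dbt catalog.json."""
    columns = catalog["nodes"][node_id]["columns"]
    return {name.lower() for name in columns}


def diff_columns(model_dimensions: Set[str], catalog_columns: Set[str]) -> Dict[str, List[str]]:
    """Report columns present on only one side of the comparison."""
    return {
        "missing_in_dbt": sorted(model_dimensions - catalog_columns),
        "missing_in_holistics": sorted(catalog_columns - model_dimensions),
    }
```

Column names are lower-cased because some warehouses report them in upper case in the catalog; adjust this if your models are case-sensitive.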

Best,

Abdel · June 19, 2024, 4:34pm

Another use case for Python support is automatically syncing dbt with AML dimensions.
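Until something like this is built in, a sync of this kind could be approximated in a script. The sketch below turns the columns of one dbt catalog.json node into AML-like dimension blocks. The type mapping is a simplification (real warehouse types vary widely), and the AML syntax and type names are approximate assumptions to verify against the Holistics docs.

```python
from string import Template

# Simplified, hypothetical mapping from warehouse types to AML dimension types.
TYPE_MAP = {
    "integer": "number",
    "numeric": "number",
    "text": "text",
    "boolean": "truefalse",
    "date": "date",
    "timestamp": "datetime",
}

# Approximate AML dimension shape (assumption).
DIMENSION_TEMPLATE = Template(
    "  dimension $name {\n"
    "    label: '$label'\n"
    "    type: '$type'\n"
    "    definition: @sql {{ #SOURCE.$name }};;\n"
    "  }\n"
)


def aml_type(dbt_type: str) -> str:
    """Map a warehouse column type to an AML type, defaulting to text."""
    return TYPE_MAP.get(dbt_type.lower(), "text")


def render_dimensions(columns: dict) -> str:
    """Render one dimension block per catalog column, in column order."""
    blocks = []
    for column in sorted(columns.values(), key=lambda c: c["index"]):
        name = column["name"].lower()
        blocks.append(
            DIMENSION_TEMPLATE.substitute(
                name=name,
                label=name.replace("_", " ").title(),
                type=aml_type(column["type"]),
            )
        )
    return "".join(blocks)
```

Here `columns` is the `columns` mapping of a single node in catalog.json, where each entry carries `name`, `type`, and `index`.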

See how cube.dev did something similar in their platform: