Data Transforms - flow-based scheduling, tests

I would like to re-platform our BigQuery-to-BigQuery data transforms, which currently run via BigQuery operators in Airflow. I was initially considering dbt but would prefer to keep everything in Holistics; I really like how persisted transforms are implemented.

Some things that are holding us back:

  1. Mainly, the lack of flow-based scheduling. I see it mentioned as a future possibility in the docs, but not on the roadmap. Do you have a rough idea of when it might be implemented?

  2. Another big feature of dbt that I think would be super helpful is data quality testing with alerts, as well as alerts on completion or failure of scheduled transforms or storage.

Hello Charlie!
I’m Dave, from Holistics’ Growth Team. Thank you for your questions.

On the flow-based scheduling feature: it is still on our roadmap, but it relies on a lot of infrastructure that we are currently building, so we don't have a definite date yet. That said, we are aiming to ship it before the end of 2021.
On the same note, we are planning to release our API and Command Line Interface soon. You might be able to call them from Airflow to trigger jobs, and so achieve results similar to flow-based transformations.

On data quality with email alerts: it is certainly on our radar, but as above, we don't have a clear timeline for it yet. For Q1, our main focus will be on releasing BI-as-code features. One of the aims is to enable version control for your data/BI pipelines within Holistics, which should help with your data quality woes.

A lot of the features you're asking for are already available in dbt or similar tools. We also see their usefulness and have been carefully considering our own versions for quite a while. One path we are considering is to support integration with dbt instead, so that rather than doing a full migration to Holistics, you could add Holistics to your existing analytics and ELT ecosystem and have everything work together. We are still drafting our roadmap for these features and will make a proper announcement once our direction is confirmed.

We’re looking forward to bringing you a lot of cool stuff this year so do stay tuned and please keep your suggestions and use cases coming!

Dave, thanks for your very thorough answer.

I admire how you are approaching the development of your platform, and especially how keen you seem to be to learn from the successes (and failures!) of other tools. That comes across very clearly in your documentation, especially the Analytics Setup Guidebook.

I'm looking forward to the development of these features. For the short term, I seem to have two obvious options:

  1. Continue with cron-based scheduling of the Data Transforms in Holistics, with a view to using the forthcoming API to trigger them from Airflow.

  2. Use dbt for now and either migrate to or integrate with Holistics later.

Regarding dbt integration, I think it would bring a lot of attention to your product. It was a really smart move for Fivetran, but having dbt in the reporting tool would be even more amazing. Personally I think it would be a great move, but of course that's up to you.

At the risk of being annoying, how soon do you think you might release the API/CLI endpoints for externally triggering persisted data transforms? This would probably guide my decision one way or the other; as you can probably tell, I'm keen to use Holistics if possible.