We are building out reporting on our side to track the cost incurred by each of our BI stack assets (including BigQuery table syncs, dbt models, and Holistics Dashboards) so we can identify spikes and prioritize optimization. Since much of this cost is incurred on the BigQuery side, we are segmenting BigQuery jobs data to allocate cost to specific platforms and assets.
At present, Holistics Dashboard runs pass the source identifier and source type in the executed query, and we can pull details about each dashboard via the Holistics API. Executions driven by data checks and scheduled sends do not currently include that metadata. I have been able to match individual jobs to their underlying assets using the Jobs Monitoring > Jobs list in Holistics. Is there any way to export this data (or access it via the API)? We are specifically interested in the Source, Job Queue, and Action fields.
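For the Dashboard runs that already carry metadata, the attribution step can be sketched as below. This is a minimal sketch, not the thread's actual pipeline: it assumes BigQuery job rows shaped like `INFORMATION_SCHEMA.JOBS` output (`job_id`, `query`, `total_bytes_billed`), and the `source_type: ... source_id: ...` marker format matched by the regex is hypothetical, since the exact text Holistics embeds in the query may differ. The per-TiB price is passed in by the caller rather than assumed.

```python
import re
from collections import defaultdict

# Hypothetical marker format; adjust the pattern to whatever Holistics actually embeds.
HOLISTICS_TAG = re.compile(
    r"source_type\s*[:=]\s*'?(?P<source_type>\w+)'?.*?source_id\s*[:=]\s*'?(?P<source_id>\d+)'?",
    re.IGNORECASE | re.DOTALL,
)

TIB = 1024 ** 4  # bytes per tebibyte


def attribute_cost(jobs, usd_per_tib):
    """Sum billed cost per (source_type, source_id).

    Jobs whose query text carries no recognizable marker (e.g. data checks or
    scheduled sends today) are grouped under ("unattributed", None).
    """
    totals = defaultdict(float)
    for job in jobs:
        match = HOLISTICS_TAG.search(job["query"] or "")
        key = (match["source_type"], match["source_id"]) if match else ("unattributed", None)
        totals[key] += job["total_bytes_billed"] / TIB * usd_per_tib
    return dict(totals)


jobs = [
    {"job_id": "a", "query": "-- source_type: Dashboard, source_id: 42\nSELECT 1", "total_bytes_billed": TIB},
    {"job_id": "b", "query": "SELECT 2", "total_bytes_billed": TIB // 2},
]
print(attribute_cost(jobs, usd_per_tib=5.0))
```

The size of the `("unattributed", None)` bucket is a direct measure of the coverage gap described above.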
I get the idea but I’m not sure I understand the details.
Could you please elaborate on the API details? Would an API like this be enough for your needs?
Input
source_type (e.g. 'Dashboard')
source_id
Output
A list of jobs for the input source, each containing:
Job Queue
Action
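The proposed input/output contract can be sketched as a small data model with an in-memory filter. This is only an illustration of the shape discussed here, not an existing Holistics endpoint: `HolisticsJob`, `jobs_for_source`, and the queue/action values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HolisticsJob:
    """Hypothetical record mirroring the proposed output fields."""
    job_id: int
    source_type: str  # e.g. 'Dashboard'
    source_id: int
    job_queue: str
    action: str


def jobs_for_source(all_jobs: List[HolisticsJob], source_type: str, source_id: int) -> List[HolisticsJob]:
    """Mimic the proposed API: return every job launched by the given source."""
    return [j for j in all_jobs if j.source_type == source_type and j.source_id == source_id]


jobs = [
    HolisticsJob(1, "Dashboard", 42, "example_queue", "example_action"),
    HolisticsJob(2, "Dashboard", 7, "example_queue", "example_action"),
]
for job in jobs_for_source(jobs, "Dashboard", 42):
    print(job.job_id, job.job_queue, job.action)
```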
That said, while an API is a valid and robust solution, what do you think about a built-in dashboard for data warehouse cost analysis instead? See: Data Warehouse Cost Analysis for Holistics Jobs.
Thank you for taking a look here.
Ideally, we would be able to retrieve the data points below for a given Holistics Job ID. These are included in the Generated SQL statements for most action types. Being able to retrieve them for all action types would let us use one consistent process to join BigQuery jobs data to specific Holistics assets.
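The job-ID-based join described above might look like the following minimal sketch. It assumes the generated SQL carries the Holistics Job ID as a `holistics_job_id: <n>` marker (a hypothetical format) and that a lookup from Holistics Job ID to Source / Job Queue / Action is available, for instance from an export or API like the one being discussed. BigQuery jobs that cannot be matched are returned separately so coverage gaps stay visible.

```python
import re

# Hypothetical marker; the real generated SQL may encode the job ID differently.
JOB_ID_TAG = re.compile(r"holistics_job_id\s*[:=]\s*(\d+)", re.IGNORECASE)


def join_jobs(bq_jobs, holistics_jobs):
    """Attach Holistics metadata to BigQuery job rows.

    holistics_jobs maps Holistics Job ID (int) -> dict with source, job_queue, action.
    Returns (matched rows with metadata merged in, BigQuery job IDs left unmatched).
    """
    matched, unmatched = [], []
    for row in bq_jobs:
        m = JOB_ID_TAG.search(row.get("query") or "")
        meta = holistics_jobs.get(int(m.group(1))) if m else None
        if meta is None:
            unmatched.append(row["job_id"])
        else:
            matched.append({**row, **meta})
    return matched, unmatched


bq_jobs = [
    {"job_id": "bq1", "query": "/* holistics_job_id: 7 */ SELECT 1", "total_bytes_billed": 100},
    {"job_id": "bq2", "query": "SELECT 2", "total_bytes_billed": 50},
]
holistics_jobs = {7: {"source": "Dashboard 42", "job_queue": "example_queue", "action": "example_action"}}
matched, unmatched = join_jobs(bq_jobs, holistics_jobs)
print(matched, unmatched)
```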
I really like the idea of the Data Warehouse Costs dashboard you proposed! It would provide an efficient and flexible view into costs for any authorized and interested user. Consistent jobs data coverage would still help, though, because it would let us combine data warehouse costs driven by Holistics assets with those driven by other assets (analytics models, dbt models, etc.) in the same view.
Thank you,
Bob
Instead of having the API return those low-level details and then matching them against BigQuery jobs yourself, what do you think about having the API return the BigQuery job info directly, like this?
Thank you for looking into this further! It would be helpful to access calculated cost via the Holistics API. The one thing this approach would not let us do is tie that cost to a specific asset in Holistics. Ideally, we want to identify and track cost centers by asset so we can see where assets may need to be reassessed for simplification or refactoring.
Ah, sorry, I got so caught up with the costs that I forgot about that.
Let us try to improve the API for your needs, and we will notify you once there is an update.