AQL metric - Nested aggregation not working properly?

Hi, I’m testing an AQL metric that must calculate the average deployment delay for our docker images (in a software development context).

It’s a two-step process :

  • First I need to calculate the delay for each image. This is already a metric that needs to aggregate multiple rows of data.
  • second I can calculate the average delay across all images

I first tried to create an all-in-one metric :

  metric avg_deployment_delay {
    label: 'Deployment delay'
    type: 'number'
    description: ''
    definition: @aql
      models.fct_dlv_kube_image_deployment 
      | group(models.fct_dlv_kube_image_deployment.kid_image_tag_cd)
      | select(
          delay: (min(models.fct_dlv_kube_image_deployment, epoch(models.fct_dlv_kube_image_deployment.kid_date_debut_validite_ts)) - min(models.dim_dlv_image_production, epoch(models.dim_dlv_image_production.imp_committed_date_ts))) / 60 / 60
      )
      | average()
    ;;
  }

With this version the problem I have is that Holistics does not calculate the average but returns the max value apparently. I tried to replace average by sum or count but I always had the same value !

I tried then to split this metric in two:

  metric delay {
    label: 'delay'
    type: 'number'
    description: ''
    definition: @aql
      (
        min(models.fct_dlv_kube_image_deployment, epoch(models.fct_dlv_kube_image_deployment.kid_date_debut_validite_ts))
        - min(models.dim_dlv_image_production, epoch(models.dim_dlv_image_production.imp_committed_date_ts))
      )
      / 60 
      / 60
    ;;
  }

  metric avg_deployment_delay {
    label: 'Deployment delay'
    type: 'number'
    description: ''
    definition: @aql
      models.fct_dlv_kube_image_deployment 
      | group(models.fct_dlv_kube_image_deployment.kid_image_tag_cd)
      | average(delay)
    ;;
  }

And now, if I build a table with both metrics (delay and avg_deployment_delay), I get the right result. But if I only use the target metric in my table I get the wrong result (back to the max value instead of average).

Note: Beside these bugs, I’m very excited by the possibilities that the new AQL engine provides. It can completely change the way we work with Holistics.

Best regards,
Damien

1 Like

Hi @dacou,

Our apologies for the inconvenience.
I have just checked this issue with my engineering team and it seems like a bug :bug:

We will investigate it further and let you know when there is an update.

2 Likes

Hi @dacou,
We have deployed the fix on Production and the issue should be solved by now.
Could you help us to check it again on your side to see if the issue persists?

Hi, Yes it works well now, thanks for the fix.

There is still an issue left with the syntax below which does not work but it may be another problem :
I read that I could write average(delay) like below but I get a “Model not found” error when doing that. Instead I write average() and it works.

metric avg_deployment_delay {
    label: 'Deployment delay'
    type: 'number'
    description: ''
    definition: @aql
      models.fct_dlv_kube_image_deployment 
      | group(models.fct_dlv_kube_image_deployment.kid_image_tag_cd)
      | select(
          delay: (min(models.fct_dlv_kube_image_deployment, epoch(models.fct_dlv_kube_image_deployment.kid_date_debut_validite_ts)) - min(models.dim_dlv_image_production, epoch(models.dim_dlv_image_production.imp_committed_date_ts))) / 60 / 60
      )
      | average(delay)
    ;;
  }