Get max value of sub-list in AQL

stonematt · September 18, 2024, 9:52pm

we have teams working in an event. I’m calculating the forecasted end date of work for each team in this event. – that works great. Now, I want to get the max(end_date) of the list for the event, but when I remove the team dimension, I’m getting the average of the underlying data, so I tried to make an AQL metric in my dataset like this (below).

in sql, i would do a CTE of the event/team data and pick max…

Any ideas?

metric max_end {
	label: "Max End Date"
	type: "datetime"
	definition: @aql
	group(events.event_name,stack_assignment_timing.team_name)
	| select(events.event_name, stack_assignment_timing.team_name, tot_end_date)
	| max(tot_end_date)
	;;
}

Result:

AQL Compilation Errors:

ae.groups is not iterable
Full Explore Query:

explore {
  dimensions {
	  }
	  measures {
		max_end: max_end
	 }
  }

anya.conti · September 19, 2024, 1:46am

To clarify, are you looking for something like the following?

Team (Dim), Expected End (Measure), Max End
1, 10/25/24, 01/02/25
2, 09/29/24, 01/02/25
3, 01/02/25, 01/02/25

If yes, my thoughts would be to try a window function. There’s some documentation here. We’ve struggled a little bit in certain circumstances with Holistics window functions requiring a measure as an input instead of a dimension, but I think that actually works perfectly in your use case if I’m following correctly. Here’s an example (where end_date is the measure that you already have that’s forecasting the team’s end date):

measure max_end_date {
    label: 'Max End Date'
    description: 'The max end date for all teams'
    type: 'datetime'
    definition: @aql window_max(end_date) ;;
}

Note: I excluded event in the above for simplicity, but adding it should be simple. You can change the definition to: definition: @aql window_max(end_date, partition: event) where event is the event dimension. So your table would then have event, team, expected end, max end as 4 fields instead of the 3 I originally put.

If you’re looking for the metric with the team dimension removed entirely, then I think you would need to use some LoD (level of Detail) functions, specifically dimensionalize. Here’s an example of that (where end_date is the measure that you already have that’s forecasting the team’s end date):

  dimension team_end_date {
    label: 'end_date_at_team_granularity'
    description: 'End Date per Team Granularity'
    type: 'number'
    definition: @aql end_date | dimensionalize(stack_assignment_timing.team_name) ;;
  }

  measure max_team_end_date {
    label: 'Max'
    type: 'number'
    description: 'Max End Date of All Teams'
    definition: @aql max(team_end_date)
    ;;
  }

Hope this helps!

tan · September 19, 2024, 4:04am

Hi Matt,

Is this correctly describe your use case?

From your description, it sounds like you have an event with multiple teams, each team having its own assignments and respective expected end dates. Initially, you’re seeing the correct maximum expected end date for each team within each event, like this:

Event Name	Team Name	Max Expected End Date
Event 1	Team 1	2024-09-01
Event 1	Team 2	2024-09-02
Event 1	Team 3	2024-09-03
Event 2	Team 1	2024-09-01
Event 2	Team 2	2024-09-02
Event 2	Team 3	2024-09-03

However, when you remove the team dimension to view the overall maximum expected end date for each event, instead of the expected result:

Event Name	Max Expected End Date
Event 1	2024-09-03
Event 2	2024-09-03

You’re seeing an average of the underlying data.

Could you clarify what you mean by “average”? Since we’re dealing with dates, it’s not clear how averaging them is being calculated in your exploration.

In general, since max() is an additive metric, the expected result when you remove the team dimension should still show the maximum end date across all teams for each event. If that’s not happening, there might be a configuration issue.

To help troubleshoot, could you provide additional screenshots or sample data, along with the configuration of your current exploration? This will help me better understand the structure and provide more targeted guidance.

As for the AQL formula you’re working with, the issue may lie in how you’re using group without the table parameter. Assuming tot_end_date is a metric, you could try the following:

stack_assignment_timing
| group(events.event_name, stack_assignment_timing.team_name)
| max(tot_end_date)

If tot_end_date is a dimension, you’ll need to wrap it with max() like this:

stack_assignment_timing
| group(events.event_name, stack_assignment_timing.team_name)
| max(max(stack_assignment_timing.tot_end_date))

which is equivalent to

max(
  stack_assignment_timing
  | group(events.event_name, stack_assignment_timing.team_name)
  | select(max(stack_assignment_timing.tot_end_date))
)

Let me know if this helps or if you’d like to dive deeper into any specific areas.

stonematt · September 19, 2024, 4:52pm

Ultimately the end date follows this pattern: add total days remaining (number) to the last seen completed stack (datetime):

days remaining requires a calculation of stacks per hour
when team is included, stacks per hour per team calculates the expected end date per team
when team is not included, the stacks per hour (for the whole event) is used, which isn’t useful for forecasting b/c some teams are appropriately fast and slow.
I need to know when the last team will be done (max_end_date_by_team, or max_days_left_by_team)

Background and examples.

these are cross-model metrics in the dataset
stack_assignement_timing is the underlying raw data table

    metric tot_end_date {
    label: "Evt End Date"
    type: "datetime"
    hidden: false
    description: "Projected end date based on observed throughput model"
    definition: @aql max(stack_assignment_timing.completed) + interval(tot_d_remaining days);;
  }

the tot_d_remain is the day version of hours (tot_hrs_remaining/24) calculated like this:

metric tot_hrs_remaining {
    label: "Event Hrs Left"
    type: "number"
    format: "#,###.00"
    hidden: false
    description: "Hours remaining to complete 135% of r1 stacks based on average stacks/hr in the last 24h"
    definition: @aql (total_stacks_remaining) / stack_assignment_timing.s_p_h_l24h;;
  }

Example: With team included

When I have team on the grid, it looks right (sort by Evt End Date):

Executed AQL

explore {
  dimensions {
    e_en_615074: events.event_name,
    sat_tn_8bb5e9: stack_assignment_timing.team_name
  }
  measures {
    tot_d_remaining: tot_d_remaining,
    tot_end_date: tot_end_date
  }
  filters {
    events.event_name is "2024-09"
  }
}

Example without team

removing team, “averages” the tot_d_remaining and uses that when adding to the event’s last completed stack date. (not an explicit average, just not grouped by team)

Executed AQL

explore {
  dimensions {
    e_en_615074: events.event_name
  }
  measures {
    tot_d_remaining: tot_d_remaining,
    tot_end_date: tot_end_date
  }
  filters {
    events.event_name is "2024-09"
  }
}

Business requirement: When will the last team finish?

I need to show the business when the last team will finish and which team it is.

stonematt · September 19, 2024, 5:13pm

@anya.conti I missed your reply and only noticed the one from Tan at Holistics. I think you’re on the right track with LoD. I had that in mind as I was working through it but couldn’t quick get the mental model right in my head. I will give your approach a try in the next couple of days. At this point the event is almost over so the forecast is a little less pressing. (It will be useful for the next event…).

Thanks for the response.
.stone

stonematt · September 19, 2024, 5:19pm

I was hoping for too much. I’ll swing back to it later…

Adhoc AQL

max(tot_hrs_remaining 
| dimensionalize(stack_assignment_timing.team_name))

Result:

tan · September 19, 2024, 5:27pm

Hi Matt,

Thanks for the detailed explanation. I understand the use case now.

In this case, I think this formula mentioned in my previous post would work.

stack_assignment_timing
| group(stack_assignment_timing.team_name)
| max(tot_end_date)

Can you try it out and see?

tan · September 19, 2024, 5:40pm

As for Anya’s suggestion, if you read it more closely, you’ll see that she suggested you to do it 2 steps:

Use dimensionalize to save tot_end_date as a dimension
Only then use that dimension in another metric to aggregate them further.

On the other hand, you’re trying to combine both step into a single metric expression, which is not supported because dimensionalize are only usable in the definition of a dimension.

Note that, Dimensions are static so they won’t react to visualization filter. Thus, the prefer approach is to use group (like what I suggested) to introduce nested aggregation in a metric, unless you want to use that metric as a dimension for binning or categorization.

stonematt · September 19, 2024, 7:00pm

Thanks @tan, it looks like that worked:

With team in the grid (order by event end date)

Without team in the grid.

Here’s my dataset aml

  metric max_end_date {
    label: "Max End Date"
    type: "datetime"
    hidden: false
    description: ""
    definition: @aql stack_assignment_timing
      | group(stack_assignment_timing.team_name)
      | max(tot_end_date)
    ;;
  }

That’s pretty cool.

Now how do I get “which team is has that end date?” like a

metric last_team_to_finish{
}

tan · September 20, 2024, 3:04am

Assuming you want that metric to display in a KPI chart, you can write it like this

metric last_team_to_finish {
    label: "Last Team to Finish"
    type: "text"
    hidden: false
    description: ""
    definition: @aql 
      stack_assignment_timing
      | group(stack_assignment_timing.team_name)
      // Calculate `tot_end_end` by `team_name`
      | select(
          stack_assignment_timing.team_name, 
          tot_end_date,
          // the `of_all` here is needed cuz we introduced a grouping on `team_name` above
          _max_of_all: max_end_date | of_all(stack_assignment_timing.team_name)
        )
      // filter to keep only the row where `tot_end_end` equals that `max_end_date`
      | filter(tot_end_date == _max_of_all)
      | max(stack_assignment_timing.team_name)
    ;;
  }

You can also use window function instead of of_all in this case, depend on the nature of your data, this can be more performant.

 metric last_team_to_finish {
    label: "Last Team to Finish"
    type: "text"
    hidden: false
    description: ""
    definition: @aql 
      stack_assignment_timing
      | group(stack_assignment_timing.team_name)
      // Calculate `tot_end_end` by `team_name`
      | select(stack_assignment_timing.team_name, tot_end_date)
      | filter(tot_end_date == window_max(tot_end_date, .., order: tot_end_date))
      | max(stack_assignment_timing.team_name)
    ;;
  }