Add aggregate_records DML tool and query-timeout to MCP server#3179

Draft
Copilot wants to merge 34 commits into main from copilot/add-aggregate-records-tool


Copilot AI commented Feb 28, 2026

  • Explore current code state and understand unresolved comments
  • Address OrderBy without GroupBy concern (spec says orderby only applies with groupby)
  • Address first/after without GroupBy concern (spec says first/after only applies with groupby)
  • Address QueryTimeout upper bound validation
  • Address GetToolMetadata caching request
  • Address in-memory aggregation concerns (ensure SQL-level aggregation)
  • Ensure code builds and tests pass
  • Run code review and security checks
Original prompt

This section details the original issue you should resolve

<issue_title>[Enh]: add aggregate_records DML tool to MCP server</issue_title>
<issue_description>## What?

Allow models to answer: "How many products are there?" and "What is our most expensive product?"

Why?

These are among the most common information discovery questions, a primary model use case.

How?

Introduce a new tool: aggregate_records that reuses native GraphQL aggregation capabilities in DAB.

Schema

{
  "type": "object",
  "properties": {
    "entity": {
      "type": "string",
      "description": "Entity name with READ permission.",
      "required": true
    },
    "function": {
      "type": "string",
      "enum": ["count", "avg", "sum", "min", "max"],
      "description": "Aggregation function to apply.",
      "required": true
    },
    "field": {
      "type": "string",
      "description": "Field to aggregate. Use '*' for count.",
      "required": true
    },
    "distinct": {
      "type": "boolean",
      "description": "Apply DISTINCT before aggregating.",
      "default": false
    },
    "filter": {
      "type": "string",
      "description": "OData filter applied before aggregating (WHERE). Example: 'unitPrice lt 10'",
      "default": ""
    },
    "groupby": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Fields to group by, e.g., ['category', 'region']. Grouped field values are included in the response.",
      "default": []
    },
    "orderby": {
      "type": "string",
      "enum": ["asc", "desc"],
      "description": "Sort aggregated results by the computed value. Only applies with groupby.",
      "default": "desc"
    },
    "having": {
      "type": "object",
      "description": "Filter applied after aggregating on the result (HAVING). Operators are AND-ed together.",
      "properties": {
        "eq":  { "type": "number", "description": "Aggregated value equals." },
        "neq": { "type": "number", "description": "Aggregated value not equals." },
        "gt":  { "type": "number", "description": "Aggregated value greater than." },
        "gte": { "type": "number", "description": "Aggregated value greater than or equal." },
        "lt":  { "type": "number", "description": "Aggregated value less than." },
        "lte": { "type": "number", "description": "Aggregated value less than or equal." },
        "in":  {
          "type": "array",
          "items": { "type": "number" },
          "description": "Aggregated value is in the given list."
        }
      }
    }
  },
  "required": ["entity", "function", "field"]
}

Response Alias Convention

The aggregated value in the response is always aliased as {function}_{field}. For count with "*", the alias is count.
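
As a rough illustration of the alias rule above, here is a minimal Python sketch. The helper name `response_alias` is hypothetical and not part of DAB; the spec only states the rule for `{function}_{field}` and for count with "*".

```python
def response_alias(function: str, field: str) -> str:
    """Return the response alias: '{function}_{field}', or 'count' for count(*)."""
    # Special case stated in the spec: count with "*" is aliased as plain "count".
    if function == "count" and field == "*":
        return "count"
    return f"{function}_{field}"


print(response_alias("count", "*"))           # count
print(response_alias("avg", "unitPrice"))     # avg_unitPrice
print(response_alias("count", "supplierId"))  # count_supplierId
```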

Examples

Q1: "How many products are there?"

{
  "entity": "Product",
  "function": "count",
  "field": "*"
}
SELECT COUNT(*) AS count
FROM Product;

Example output:

count
77

Q2: "What is the average price of products under $10?"

{
  "entity": "Product",
  "function": "avg",
  "field": "unitPrice",
  "filter": "unitPrice lt 10"
}
SELECT AVG(unitPrice) AS avg_unitPrice
FROM Product
WHERE unitPrice < 10;

Example output:

avg_unitPrice
6.74

Q3: "Which categories have more than 20 products?"

{
  "entity": "Product",
  "function": "count",
  "field": "*",
  "groupby": ["categoryName"],
  "having": {
    "gt": 20
  }
}
SELECT categoryName, COUNT(*) AS count
FROM Product
GROUP BY categoryName
HAVING COUNT(*) > 20;

Example output:

categoryName count
Beverages 24
Condiments 22

Q4: "For discontinued products, which categories have a total revenue between $500 and $10,000?"

{
  "entity": "Product",
  "function": "sum",
  "field": "unitPrice",
  "filter": "discontinued eq true",
  "groupby": ["categoryName"],
  "having": {
    "gte": 500,
    "lte": 10000
  }
}
SELECT categoryName, SUM(unitPrice) AS sum_unitPrice
FROM Product
WHERE discontinued = 1
GROUP BY categoryName
HAVING SUM(unitPrice) >= 500
   AND SUM(unitPrice) <= 10000;

Example output:

categoryName sum_unitPrice
Seafood 1834.50
Produce 742.00

Q5: "How many distinct suppliers do we have?"

{
  "entity": "Product",
  "function": "count",
  "field": "supplierId",
  "distinct": true
}
SELECT COUNT(DISTINCT supplierId) AS count_supplierId
FROM Product;

Example output:

count_supplierId
29

Q6: "Which categories have exactly 5 or 10 products?"

{
  "entity": "Product",
  "function": "count",
  "field": "*",
  "groupby": ["categoryName"],
  "having": {
    "in": [5, 10]
  }
}
SELECT c...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes Azure/data-api-builder#3178


Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>

JerryNixon commented Mar 2, 2026

@copilot The implementation should include support for $first and $after per the updated spec. Here is a copy (not the original) of that spec in case you cannot navigate to the source issue for this PR. Please ensure it is implemented.

What?

Allow models to answer: "How many products are there?" and "What is our most expensive product?"

Why?

These are among the most common information discovery questions, a primary model use case.

How?

Introduce a new tool: aggregate_records that reuses native GraphQL aggregation capabilities in DAB.

Schema

{
  "type": "object",
  "properties": {
    "entity": {
      "type": "string",
      "description": "Entity name with READ permission.",
      "required": true
    },
    "function": {
      "type": "string",
      "enum": ["count", "avg", "sum", "min", "max"],
      "description": "Aggregation function to apply.",
      "required": true
    },
    "field": {
      "type": "string",
      "description": "Field to aggregate. Use '*' for count.",
      "required": true
    },
    "distinct": {
      "type": "boolean",
      "description": "Apply DISTINCT before aggregating.",
      "default": false
    },
    "filter": {
      "type": "string",
      "description": "OData filter applied before aggregating (WHERE). Example: 'unitPrice lt 10'",
      "default": ""
    },
    "groupby": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Fields to group by, e.g., ['category', 'region']. Grouped field values are included in the response.",
      "default": []
    },
    "orderby": {
      "type": "string",
      "enum": ["asc", "desc"],
      "description": "Sort aggregated results by the computed value. Only applies with groupby.",
      "default": "desc"
    },
    "having": {
      "type": "object",
      "description": "Filter applied after aggregating on the result (HAVING). Operators are AND-ed together.",
      "properties": {
        "eq":  { "type": "number", "description": "Aggregated value equals." },
        "neq": { "type": "number", "description": "Aggregated value not equals." },
        "gt":  { "type": "number", "description": "Aggregated value greater than." },
        "gte": { "type": "number", "description": "Aggregated value greater than or equal." },
        "lt":  { "type": "number", "description": "Aggregated value less than." },
        "lte": { "type": "number", "description": "Aggregated value less than or equal." },
        "in":  {
          "type": "array",
          "items": { "type": "number" },
          "description": "Aggregated value is in the given list."
        }
      }
    },
    "first": {
      "type": "integer",
      "description": "Maximum number of results to return. Used for pagination. Only applies with groupby.",
      "minimum": 1,
      "default": null
    },
    "after": {
      "type": "string",
      "description": "Cursor for pagination. Returns results after this cursor. Only applies with groupby and first.",
      "default": null
    }
  },
  "required": ["entity", "function", "field"]
}
  • first: Only applies when groupby is used (since non-grouped aggregations return a single value)
  • after: Requires both groupby and first to be set
  • Cursor format: The cursor value should be an opaque string returned by DAB in the response (typically base64-encoded)
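
The three rules above could be checked before any query is built. The following Python sketch is only one possible reading: the function name `validate_paging` is hypothetical, and it treats a violation as an error, whereas "only applies with groupby" could equally be read as "silently ignore".

```python
def validate_paging(args: dict) -> list:
    """Return a list of rule violations for first/after; an empty list means valid."""
    errors = []
    groupby = args.get("groupby") or []
    first = args.get("first")
    after = args.get("after")

    if first is not None:
        # The schema declares first as an integer with minimum 1 (bool is excluded).
        if isinstance(first, bool) or not isinstance(first, int) or first < 1:
            errors.append("first must be an integer >= 1")
        # Non-grouped aggregations return a single value, so paging is meaningless.
        if not groupby:
            errors.append("first only applies with groupby")
    if after is not None and (not groupby or first is None):
        errors.append("after requires both groupby and first")
    return errors


print(validate_paging({"groupby": ["categoryName"], "first": 5}))  # []
print(validate_paging({"first": 5}))
print(validate_paging({"groupby": ["categoryName"], "after": "abc"}))
```

Returning a list rather than raising on the first problem lets a caller report every violation in a single tool response.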

Response Alias Convention

The aggregated value in the response is always aliased as {function}_{field}. For count with "*", the alias is count.

Examples

Example 1: "How many products are there?"

{
  "entity": "Product",
  "function": "count",
  "field": "*"
}
SELECT COUNT(*) AS count
FROM Product;

Example output:

count
77

Example 2: "What is the average price of products under $10?"

{
  "entity": "Product",
  "function": "avg",
  "field": "unitPrice",
  "filter": "unitPrice lt 10"
}
SELECT AVG(unitPrice) AS avg_unitPrice
FROM Product
WHERE unitPrice < 10;

Example output:

avg_unitPrice
6.74

Example 3: "Which categories have more than 20 products?"

{
  "entity": "Product",
  "function": "count",
  "field": "*",
  "groupby": ["categoryName"],
  "having": {
    "gt": 20
  }
}
SELECT categoryName, COUNT(*) AS count
FROM Product
GROUP BY categoryName
HAVING COUNT(*) > 20;

Example output:

categoryName count
Beverages 24
Condiments 22

Example 4: "For discontinued products, which categories have a total revenue between $500 and $10,000?"

{
  "entity": "Product",
  "function": "sum",
  "field": "unitPrice",
  "filter": "discontinued eq true",
  "groupby": ["categoryName"],
  "having": {
    "gte": 500,
    "lte": 10000
  }
}
SELECT categoryName, SUM(unitPrice) AS sum_unitPrice
FROM Product
WHERE discontinued = 1
GROUP BY categoryName
HAVING SUM(unitPrice) >= 500
   AND SUM(unitPrice) <= 10000;

Example output:

categoryName sum_unitPrice
Seafood 1834.50
Produce 742.00

Example 5: "How many distinct suppliers do we have?"

{
  "entity": "Product",
  "function": "count",
  "field": "supplierId",
  "distinct": true
}
SELECT COUNT(DISTINCT supplierId) AS count_supplierId
FROM Product;

Example output:

count_supplierId
29

Example 6: "Which categories have exactly 5 or 10 products?"

{
  "entity": "Product",
  "function": "count",
  "field": "*",
  "groupby": ["categoryName"],
  "having": {
    "in": [5, 10]
  }
}
SELECT categoryName, COUNT(*) AS count
FROM Product
GROUP BY categoryName
HAVING COUNT(*) IN (5, 10);

Example output:

categoryName count
Grains 5
Produce 5
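
To make the mapping from the `having` object to a SQL HAVING clause concrete, here is a minimal Python sketch. It is an illustration under assumptions, not DAB's implementation: `build_having` and `HAVING_OPS` are hypothetical names, the `?` placeholders follow SQLite-style parameter syntax, and `aggregate_expr` is assumed to be built from already-validated identifiers rather than raw user input.

```python
# Operator keys from the `having` schema mapped to SQL comparison operators.
HAVING_OPS = {"eq": "=", "neq": "<>", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def build_having(aggregate_expr: str, having: dict) -> tuple:
    """Build a parameterized HAVING clause; all operators are AND-ed together."""
    clauses, params = [], []
    for key, value in having.items():
        if key == "in":
            if not value:
                raise ValueError("'in' requires at least one value")
            placeholders = ", ".join("?" for _ in value)
            clauses.append(f"{aggregate_expr} IN ({placeholders})")
            params.extend(value)
        elif key in HAVING_OPS:
            clauses.append(f"{aggregate_expr} {HAVING_OPS[key]} ?")
            params.append(value)
        else:
            raise ValueError(f"unsupported having operator: {key}")
    if not clauses:
        return "", []
    return "HAVING " + " AND ".join(clauses), params


print(build_having("SUM(unitPrice)", {"gte": 500, "lte": 10000}))
print(build_having("COUNT(*)", {"in": [5, 10]}))
```

Passing the numeric values as parameters instead of formatting them into the string keeps the sketch in line with the usual practice of never interpolating request values into SQL text.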

Example 7: "What is the average distinct unit price per category, for categories averaging over $25?"

{
  "entity": "Product",
  "function": "avg",
  "field": "unitPrice",
  "distinct": true,
  "groupby": ["categoryName"],
  "having": {
    "gt": 25
  }
}
SELECT categoryName, AVG(DISTINCT unitPrice) AS avg_unitPrice
FROM Product
GROUP BY categoryName
HAVING AVG(DISTINCT unitPrice) > 25;

Example output:

categoryName avg_unitPrice
Meat/Poultry 54.01
Beverages 32.50

Example 8: "Which categories have the most products?"

{
  "entity": "Product",
  "function": "count",
  "field": "*",
  "groupby": ["categoryName"],
  "orderby": "desc"
}
SELECT categoryName, COUNT(*) AS count
FROM Product
GROUP BY categoryName
ORDER BY COUNT(*) DESC;

Example output:

categoryName count
Confections 13
Beverages 12
Condiments 12
Seafood 12

Example 9: "What are the cheapest categories by average price?"

{
  "entity": "Product",
  "function": "avg",
  "field": "unitPrice",
  "groupby": ["categoryName"],
  "orderby": "asc"
}
SELECT categoryName, AVG(unitPrice) AS avg_unitPrice
FROM Product
GROUP BY categoryName
ORDER BY AVG(unitPrice) ASC;

Example output:

categoryName avg_unitPrice
Grains/Cereals 20.25
Condiments 23.06
Produce 32.37

Example 10: "For categories with over $500 revenue from discontinued products, which has the highest total?"

{
  "entity": "Product",
  "function": "sum",
  "field": "unitPrice",
  "filter": "discontinued eq true",
  "groupby": ["categoryName"],
  "having": {
    "gt": 500
  },
  "orderby": "desc"
}

Example 11: "Show me the first 5 categories by product count"

{
  "entity": "Product",
  "function": "count",
  "field": "*",
  "groupby": ["categoryName"],
  "orderby": "desc",
  "first": 5
}
SELECT categoryName, COUNT(*) AS count
FROM Product
GROUP BY categoryName
ORDER BY COUNT(*) DESC
LIMIT 5;

Example output:

categoryName count
Confections 13
Beverages 12
Condiments 12
Seafood 12
Dairy 10

Response includes cursor:

{
  "items": [
    { "categoryName": "Confections", "count": 13 },
    { "categoryName": "Beverages", "count": 12 },
    { "categoryName": "Condiments", "count": 12 },
    { "categoryName": "Seafood", "count": 12 },
    { "categoryName": "Dairy", "count": 10 }
  ],
  "endCursor": "eyJjYXRlZ29yeU5hbWUiOiJEYWlyeSJ9",
  "hasNextPage": true
}

Example 12: "Show me the next 5 categories"

{
  "entity": "Product",
  "function": "count",
  "field": "*",
  "groupby": ["categoryName"],
  "orderby": "desc",
  "first": 5,
  "after": "eyJjYXRlZ29yeU5hbWUiOiJEYWlyeSJ9"
}
SELECT categoryName, COUNT(*) AS count
FROM Product
GROUP BY categoryName
ORDER BY COUNT(*) DESC
LIMIT 5 OFFSET 5;

Example output:

categoryName count
Grains/Cereals 7
Meat/Poultry 6
Produce 5

Response includes cursor:

{
  "items": [
    { "categoryName": "Grains/Cereals", "count": 7 },
    { "categoryName": "Meat/Poultry", "count": 6 },
    { "categoryName": "Produce", "count": 5 }
  ],
  "endCursor": "eyJjYXRlZ29yeU5hbWUiOiJQcm9kdWNlIn0=",
  "hasNextPage": false
}
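
The spec calls the cursor opaque and "typically base64-encoded". The example cursors above happen to decode to a small JSON object naming the last row's group key, so a round trip can be sketched as follows. This is an assumption drawn from the example values only; the helper names are hypothetical and DAB's actual cursor format may differ. Note also that the SQL shown for Example 12 uses OFFSET, not the key carried in the cursor.

```python
import base64
import json


def encode_cursor(last_row_keys: dict) -> str:
    """Serialize the last row's group keys as compact JSON, then base64-encode."""
    compact = json.dumps(last_row_keys, separators=(",", ":"))
    return base64.b64encode(compact.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> dict:
    """Reverse encode_cursor: base64-decode, then parse the JSON payload."""
    return json.loads(base64.b64decode(cursor).decode("utf-8"))


print(decode_cursor("eyJjYXRlZ29yeU5hbWUiOiJEYWlyeSJ9"))  # {'categoryName': 'Dairy'}
print(encode_cursor({"categoryName": "Produce"}))
```

Clients should still treat the value as opaque and pass it back unchanged in `after`; decoding it is useful only for debugging.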

Example 13: "Show me the top 3 most expensive categories by average price"

{
  "entity": "Product",
  "function": "avg",
  "field": "unitPrice",
  "groupby": ["categoryName"],
  "orderby": "desc",
  "first": 3
}
SELECT categoryName, AVG(unitPrice) AS avg_unitPrice
FROM Product
GROUP BY categoryName
ORDER BY AVG(unitPrice) DESC
LIMIT 3;

Example output:

categoryName avg_unitPrice
Meat/Poultry 54.01
Beverages 37.98
Seafood 37.08

Example 10 (continued), SQL and example output:

SELECT categoryName, SUM(unitPrice) AS sum_unitPrice
FROM Product
WHERE discontinued = 1
GROUP BY categoryName
HAVING SUM(unitPrice) > 500
ORDER BY SUM(unitPrice) DESC;

Example output:

categoryName sum_unitPrice
Seafood 1834.50
Meat/Poultry 1062.50
Produce 742.00

Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>

@souvikghosh04 souvikghosh04 left a comment


I have yet to complete the review, but I'm leaving a note here: my main concern is the performance of the aggregation. We should consider testing this with a large dataset and capturing some insights.

Comment on lines +311 to +312
List<Dictionary<string, object?>> aggregatedResults = PerformAggregation(
records, function, field, distinct, groupby, havingOps, havingIn, orderby, alias);

Currently, the aggregation seems to happen in memory, once all records are fetched. This is fine, but for very large datasets we should consider moving the aggregation logic into the database. This might require some changes in the query executor.
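
The difference between the two approaches can be sketched in a few lines of Python. This uses SQLite as a stand-in database with an invented three-row `Product` table; it is not DAB's query executor, only an illustration of why pushing the GROUP BY into the database reduces the number of rows transferred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (categoryName TEXT, unitPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("Beverages", 18.0), ("Beverages", 19.0), ("Produce", 30.0)],
)

# In-memory approach: every row is fetched before any counting happens.
all_rows = conn.execute("SELECT categoryName FROM Product").fetchall()
in_memory = {}
for (category,) in all_rows:
    in_memory[category] = in_memory.get(category, 0) + 1

# Pushed-down approach: the database groups and counts; one row per group returns.
grouped_rows = conn.execute(
    "SELECT categoryName, COUNT(*) AS count FROM Product GROUP BY categoryName"
).fetchall()
pushed_down = dict(grouped_rows)

print(len(all_rows), "rows fetched for in-memory aggregation")
print(len(grouped_rows), "rows fetched for database aggregation")
print(in_memory == pushed_down)
```

Both paths yield the same result, but the first transfers one row per record while the second transfers one row per group, which is where the gap on large tables would come from.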

@souvikghosh04 souvikghosh04 added 2.0 and removed 2.1 labels Mar 2, 2026
@souvikghosh04 souvikghosh04 moved this from Todo to In Progress in Data API builder Mar 2, 2026
@souvikghosh04 souvikghosh04 linked an issue Mar 2, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits March 2, 2026 17:29
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>
Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com>

Copilot AI commented Mar 2, 2026

@JerryNixon The model claude-opus-4.6. is not available for your account. This can happen if the model was disabled by your organization's policy or if your Copilot plan doesn't include access to it.

You can try again without specifying a model (just @copilot) to use the default, or choose a different model from the model picker.

If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: cd710c13-8f48-4b5d-9299-957542329ac0


@souvikghosh04 souvikghosh04 left a comment


Posting additional comments

Comment on lines +487 to +489
if (groupbyMapping.Count > 0)
{
string direction = orderby.Equals("asc", StringComparison.OrdinalIgnoreCase) ? "ASC" : "DESC";

How does OrderBy work if groupbyMapping.Count is 0? It seems OrderBy only works in combination with GroupBy.


@copilot please take a look at this and implement the fix if needed. respond back once done

Comment on lines +496 to +497
if (first.HasValue)
{

Similar to OrderBy, first also seems to be nested inside groupbyMapping.Count > 0 and seems to work only in combination with GroupBy. So without GroupBy, first will not be applied?


@copilot please take a look at this and implement the fix if needed. respond back once done

}

// Validate query-timeout if provided
if (runtimeConfig.Runtime.Mcp.QueryTimeout is not null && runtimeConfig.Runtime.Mcp.QueryTimeout < 1)

QueryTimeout should have an upper bound to prevent a query from accidentally being allowed to run excessively long.
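
A range check along these lines might look like the Python sketch below. The lower bound of 1 comes from the `< 1` check under review; the upper bound of 600 is purely hypothetical, since neither the limit nor the unit is stated in this thread, and the names are invented for illustration.

```python
MIN_QUERY_TIMEOUT = 1    # lower bound taken from the `< 1` check under review
MAX_QUERY_TIMEOUT = 600  # hypothetical upper bound, chosen only for illustration


def validate_query_timeout(value):
    """Return an error message if the timeout is out of range, else None."""
    if value is None:
        return None  # query-timeout not configured; nothing to validate
    if value < MIN_QUERY_TIMEOUT or value > MAX_QUERY_TIMEOUT:
        return (
            f"query-timeout must be between {MIN_QUERY_TIMEOUT} "
            f"and {MAX_QUERY_TIMEOUT}, got {value}"
        )
    return None


print(validate_query_timeout(None))  # None
print(validate_query_timeout(30))    # None
print(validate_query_timeout(0))
print(validate_query_timeout(3600))
```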


@copilot please take a look at this and implement the fix if needed. respond back once done

@souvikghosh04

@copilot please go through the review comments and implement the changes.

  • make sure to review each unresolved comment
  • if a fix or change is needed, implement it and respond to the comment appropriately about the fix
  • if something is already fixed or doesn't need a fix or change, respond to the comment with accurate reasoning
  • ensure that functionality is not broken and that there is no deviation from the requirements
  • ensure the changes are made in a single commit in the same PR branch: copilot/add-aggregate-records-tool
  • ensure there is appropriate test coverage
  • ensure that the code is modular and reusable, including the tests, and validate end to end for all scenarios
  • if anything is unclear, deviates from the requirements or functionality, or needs additional context, please ask about it at the end in a new comment

Co-authored-by: souvikghosh04 <210500244+souvikghosh04@users.noreply.github.com>
@souvikghosh04

@copilot please try again. please go through the review comments and implement the changes.

  • make sure to review each unresolved comment
  • if a fix or change is needed, implement it and respond to the comment appropriately about the fix
  • if something is already fixed or doesn't need a fix or change, respond to the comment with accurate reasoning
  • ensure that functionality is not broken and that there is no deviation from the requirements
  • ensure the changes are made in a single commit in the same PR branch: copilot/add-aggregate-records-tool
  • ensure there is appropriate test coverage
  • ensure that the code is modular and reusable, including the tests, and validate end to end for all scenarios
  • if anything is unclear, deviates from the requirements or functionality, or needs additional context, please ask about it at the end in a new comment

@souvikghosh04

/azp run

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).



Projects

Status: Review In Progress

Development

Successfully merging this pull request may close these issues.

MCP aggregation logic fixes
[Enh]: add aggregate_records DML tool to MCP server

4 participants