mcp: split mz_mcp_data_products into lightweight discovery and detailed views #35747
represents a queryable dataset with a defined schema.
The `mz_mcp_data_products` view lists data products (indexed materialized views)
available through the Model Context Protocol (MCP) server that the current user
has privileges to access. This is a lightweight discovery view — use
There's an ambiguity about what "that the current user has privileges to access" refers to. Did you mean something more like:
The `mz_mcp_data_products` view lists data products (i.e., indexed materialized views) that are available through the Model Context Protocol (MCP) server and that the current user can access.
Yeah, unfortunately our RBAC functions are crazy slow. Here, specifically, `pg_has_role` is the bottleneck (it comes in twice: once through `mz_show_my_object_privileges` and once through `mz_show_my_cluster_privileges`). There have been some recent discussions about speeding it up, because it's also hurting some Console queries: https://materializeinc.slack.com/archives/C06GZ7GBKB5/p1767631065049339?thread_ts=1767080460.197769&cid=C06GZ7GBKB5
If the performance is unacceptable for the MCP stuff, then we could certainly schedule some work on this. Just by rewriting `pg_has_role` in Rust instead of SQL (https://github.com/MaterializeInc/database-issues/issues/11262), we could gain ~2x here (which we can see by temporarily commenting out `pg_has_role`). Then, to further speed things up, we could work on some optimizer slowness in EquivalencePropagation. There are some ideas here: https://github.com/MaterializeInc/database-issues/issues/8470.
Thanks @ggevay, I've created a tracking issue for this: https://linear.app/materializeinc/issue/DEX-19/mcp-improve-mz-mcp-data-products-view-performance-by-addressing-pg-has
Fixes: https://linear.app/materializeinc/issue/DEX-18/mcp-benchmark-agents-endpoint-and-optimize-mz-mcp-data-products-view
Splitting `mz_mcp_data_products` into two views to improve discovery performance:

- `mz_mcp_data_products`: a lightweight view returning just name, cluster, and description (fewer joins, no JSON aggregation)
- `mz_mcp_data_product_details`: the full view with a JSON Schema column for agents that need column-level detail

The `get_data_products` tool now hits the lightweight view, while `get_data_product_details` queries the full view. Local benchmarks at 1500 data products show ~25% improvement on the discovery path.

Tagging @ggevay in case you have suggestions for further optimizations to those 2 builtin views 🙏
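For readers who want a feel for the shape of the split, here is a rough, self-contained sketch using SQLite from Python. It is not the actual Materialize view definitions: the `products` and `product_columns` tables, the `data_products` and `data_product_details` views, and the `column_list` column are all hypothetical stand-ins, and `group_concat` stands in for the JSON aggregation done by the real detailed view. The two Python functions only borrow the tool names from the description above.

```python
import sqlite3


def build_catalog(conn: sqlite3.Connection) -> None:
    """Create a toy catalog with one lightweight and one detailed view (hypothetical schema)."""
    conn.executescript(
        """
        CREATE TABLE products (name TEXT PRIMARY KEY, cluster TEXT, description TEXT);
        CREATE TABLE product_columns (product TEXT, col_name TEXT, col_type TEXT);

        -- Discovery view: no join against the columns table, no aggregation.
        CREATE VIEW data_products AS
            SELECT name, cluster, description FROM products;

        -- Detailed view: joins the columns table and aggregates per product.
        CREATE VIEW data_product_details AS
            SELECT p.name, p.cluster, p.description,
                   group_concat(c.col_name || ':' || c.col_type, ',') AS column_list
            FROM products p
            LEFT JOIN product_columns c ON c.product = p.name
            GROUP BY p.name, p.cluster, p.description;
        """
    )


def get_data_products(conn: sqlite3.Connection) -> list:
    """Discovery path: reads only the lightweight view."""
    rows = conn.execute(
        "SELECT name, cluster, description FROM data_products ORDER BY name"
    ).fetchall()
    return [{"name": n, "cluster": c, "description": d} for n, c, d in rows]


def get_data_product_details(conn: sqlite3.Connection, name: str):
    """Detail path: reads the full view and builds a JSON-Schema-like dict."""
    row = conn.execute(
        "SELECT name, cluster, description, column_list "
        "FROM data_product_details WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return None
    n, c, d, column_list = row
    properties = {}
    # column_list is NULL (None) when the product has no recorded columns.
    for item in (column_list or "").split(","):
        if item:
            col_name, col_type = item.split(":", 1)
            properties[col_name] = {"type": col_type}
    return {
        "name": n,
        "cluster": c,
        "description": d,
        "schema": {"type": "object", "properties": properties},
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_catalog(conn)
    conn.execute("INSERT INTO products VALUES ('orders', 'c1', 'Order facts')")
    conn.executemany(
        "INSERT INTO product_columns VALUES (?, ?, ?)",
        [("orders", "id", "integer"), ("orders", "total", "number")],
    )
    print(get_data_products(conn))
    print(get_data_product_details(conn, "orders"))
```

The point of the sketch is only that the discovery call never touches the join or the aggregation, so its cost does not grow with the number of columns per data product; the detail call pays that cost only for the product an agent actually asks about.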