Add OpenShift AIOps Self-Healing Platform pattern documentation #652

Open
tosin2013 wants to merge 1 commit into validatedpatterns:main from tosin2013:add-openshift-aiops-platform-docs
Conversation

@tosin2013
Summary

This PR adds comprehensive documentation for the OpenShift AIOps Self-Healing Platform validated pattern, a GitOps-deployed solution that provides intelligent, automated incident response for OpenShift clusters.

Documentation includes:

  • Pattern overview and solution elements
  • Architecture documentation with component details (Coordination Engine, MCP Server, KServe models)
  • Deployment instructions for both HA and SNO topologies
  • Customization guide with real-world examples from live production clusters
  • Platform configuration tuning with ConfigMap examples
  • Storage configuration patterns (ODF, CephFS, cloud storage)
  • Resource planning and sizing for HA/SNO deployments
  • Development workflow using Jupyter notebooks
  • Workshop resources and learning materials
  • Cluster sizing guidelines

Real-world examples from live clusters:

  • Multi-model inference serving (anomaly-detector, predictive-analytics)
  • Mixed storage configurations (CephFS for shared workloads, gp3-csi for GPU)
  • Platform configuration tuning via ConfigMap
  • HA cluster resource allocation (7 nodes, ~57 cores, ~185GB RAM)
  • A catalog of 33 Jupyter notebooks covering the complete self-healing workflow
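
As a rough sanity check on the HA figures quoted above (7 nodes, ~57 cores, ~185GB RAM), the per-node averages can be worked out as follows. This is only arithmetic on the approximate totals from this description; it assumes nothing about how the allocation is actually distributed across nodes, which may be uneven (for example, GPU nodes).

```python
# Approximate totals quoted in the PR description for the HA cluster.
NODES = 7
TOTAL_CORES = 57
TOTAL_RAM_GB = 185


def per_node_average(total: float, nodes: int) -> float:
    """Return the mean share per node, rounded to one decimal place."""
    return round(total / nodes, 1)


avg_cores = per_node_average(TOTAL_CORES, NODES)
avg_ram_gb = per_node_average(TOTAL_RAM_GB, NODES)

print(f"~{avg_cores} cores and ~{avg_ram_gb} GB RAM per node on average")
```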

Files Added

  • content/patterns/openshift-aiops-platform/_index.adoc - Pattern overview
  • content/patterns/openshift-aiops-platform/getting-started.adoc - Getting started guide
  • content/patterns/openshift-aiops-platform/ideas-for-customization.adoc - Customization guide (1,600+ lines)
  • content/patterns/openshift-aiops-platform/cluster-sizing.adoc - Cluster sizing recommendations
  • modules/oaiops-*.adoc - Architecture, deployment, and solution elements modules
  • Pattern logos and images

Test Plan

  • Hugo site builds successfully (483 pages)
  • No AsciiDoc parsing errors
  • All YAML/Python code examples are properly formatted
  • Cross-references between documentation sections work correctly
  • Real-world examples verified against actual running HA cluster deployment
  • Platform configuration ConfigMap example validated
  • Storage patterns align with ODF/SNO best practices
  • Resource sizing based on actual cluster allocations
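
For reviewers who want to re-run the "code examples are properly formatted" check on the Python snippets, a minimal sketch is shown here. It is not the script used for this PR: it assumes the examples are AsciiDoc listing blocks marked `[source,python]` and delimited by `----`, and the helper names `python_blocks` and `syntax_errors` are hypothetical. It only checks that each snippet parses; it does not execute anything or validate YAML.

```python
import ast
import re

# Assumed block shape (hypothetical, adjust to the actual .adoc files):
#   [source,python]
#   ----
#   ...code...
#   ----
PY_BLOCK = re.compile(
    r"^\[source,\s*python[^\]]*\]\s*\n-{4,}\n(.*?)\n-{4,}\s*$",
    re.DOTALL | re.MULTILINE,
)


def python_blocks(adoc_text: str) -> list[str]:
    """Return the body of every [source,python] listing block found."""
    return [match.group(1) for match in PY_BLOCK.finditer(adoc_text)]


def syntax_errors(adoc_text: str) -> list[str]:
    """Parse each Python block and describe the ones that fail to parse."""
    errors = []
    for index, code in enumerate(python_blocks(adoc_text), start=1):
        try:
            ast.parse(code)
        except SyntaxError as exc:
            errors.append(f"block {index}: {exc.msg} (line {exc.lineno})")
    return errors
```

A caller would read each `.adoc` file as text, pass it to `syntax_errors`, and treat any non-empty result as a failed check.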

Related Links

Comprehensive documentation for the OpenShift AIOps Platform validated pattern,
including deployment guides, architecture overview, customization examples, and
cluster sizing recommendations.

Key additions:
- Pattern overview and solution elements
- Architecture documentation with component details
- Deployment instructions for HA and SNO topologies
- Customization guide with real-world examples from live clusters
- Platform configuration tuning examples
- Storage configuration patterns (ODF, CephFS, cloud storage)
- Resource planning and sizing for HA/SNO deployments
- Development workflow with Jupyter notebooks
- Workshop resources and learning materials
- Cluster sizing guidelines

Includes a catalog of 33 Jupyter notebooks covering anomaly detection, self-healing
logic, model serving, MCP/Lightspeed integration, and advanced scenarios.

@openshift-ci
Contributor

openshift-ci bot commented Mar 6, 2026

Hi @tosin2013. Thanks for your PR.

I'm waiting for a validatedpatterns member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the size/XXL label Mar 6, 2026
@openshift-ci openshift-ci bot requested review from beekhof and dminnear-rh March 6, 2026 23:28