diff --git a/docs/blog/.authors.yml b/docs/blog/.authors.yml
index 042012333e..bcad5c6139 100644
--- a/docs/blog/.authors.yml
+++ b/docs/blog/.authors.yml
@@ -119,3 +119,8 @@ authors:
     description: European Molecular Biology Laboratory, Germany
     avatar: https://avatars.githubusercontent.com/u/44709261?v=4
     slug: https://github.com/stefanomarangoni495
+  admccartney:
+    name: Adam McCartney
+    description: Austrian Scientific Computing (ASC), TU Wien
+    avatar: https://avatars.githubusercontent.com/u/35410331?v=4
+    slug: https://github.com/adammccartney
diff --git a/docs/blog/posts/2026/03/MUSICA-v2-32-Matthias_Heisler.jpg b/docs/blog/posts/2026/03/MUSICA-v2-32-Matthias_Heisler.jpg
new file mode 100644
index 0000000000..c089c86cd8
Binary files /dev/null and b/docs/blog/posts/2026/03/MUSICA-v2-32-Matthias_Heisler.jpg differ
diff --git a/docs/blog/posts/2026/03/eessi-musica.md b/docs/blog/posts/2026/03/eessi-musica.md
new file mode 100644
index 0000000000..ab5e184768
--- /dev/null
+++ b/docs/blog/posts/2026/03/eessi-musica.md
@@ -0,0 +1,241 @@
+---
+authors: [admccartney]
+date: 2026-03-27
+slug: eessi-musica
+---
+
+# Choosing EESSI as a base for MUSICA
+
+<figure markdown>
+  ![MUSICA](MUSICA-v2-32-Matthias_Heisler.jpg){width=75%}
+  <figcaption>(c) Matthias Heisler 2026</figcaption>
+</figure>

MUSICA (Multi-Site Computer Austria) is the latest addition to Austria's
national supercomputing infrastructure. The system's compute resources
are distributed across three locations in Austria: Vienna, Innsbruck,
and Linz. We describe the process that led to the adoption of EESSI
as a base for the software stack on the MUSICA system at the Austrian
Scientific Computing (ASC) research center.



The background section provides a brief history of how cluster
computing at ASC has evolved, with a particular focus on the various
incarnations of the software stack. We outline our motivations for
redesigning the system that delivers the software stack, initially for
use on the MUSICA HPC system. We describe the timeline of events that
led to the experiments with EESSI and EasyBuild, and offer details of
the two complementary approaches to building a software stack that we
compared. Finally, we offer a critical reflection on our experiments
and outline our ultimate reason for choosing to use EESSI as a base and
blueprint for the software stack.


## Background

The ASC (formerly VSC) is a national center for high-performance
computing and research powered by scientific software. The flagship
cluster VSC-1 was in service from 2009 to 2015, succeeded by a series of
clusters (2-5)[^1]. VSC 4 and 5 are the two clusters that remain in
service as of 2025; they will be joined at the end of the year by a new
cluster, MUSICA, which stands for Multi-Site Computer Austria. MUSICA is
a GPU-centric cluster running on OpenStack and has so far been the main
testing ground for our initial experiments with EasyBuild and EESSI.

The management of the software stack at ASC evolved along the following
lines:

+ VSC 1, 2: Initially catered to small groups of expert users; all
  software was installed manually.

+ VSC 3, 4: Still partially managed by hand, with a set of scripting
  tools for structuring software directory trees. 
These tools were
  initially copied from Innsbruck and adapted to work on the VSC. Use
  of Tcl modules was also adopted at this time.

+ VSC 4, 5: Spack was introduced, reducing the need for custom install
  scripts, making it possible to install lots of software quickly, and
  pulling in dependencies automatically.

## Motivation

Internal discussions led to a comprehensive understanding of where
the current software stack was lacking and where it should ideally be.
During these discussions, members of the user support team were able
to clearly articulate the various use cases generated by users. This
led to setting a number of high-level goals that were used to derive
requirements. At a very high level, some of the more important goals can
be summarized as:

 - Improved reproducibility and redeployment.
 - Establishment of clear release cycles.
 - Creation of a more organized and user-friendly representation of the
   software stack for cluster users.

We articulated what an ideal software stack should look like, and we
identified a number of issues with the way the software stack was
currently managed.

### Tooling & Presentation

The way that we had been using Spack and Tcl Modules had led to a
fairly unmanageable situation on our clusters. To meet user requests
for software, we adopted a pragmatic approach. This led to a situation
in which a myriad of software variants were installed into the shared
file system hosting the systems' software, which in turn led to a
fairly overwhelming presentation of available modules to the user.
There were also significant problems with deduplication; we don't know
the root cause, it may simply have been a misconfigured Spack. In any
case, we ended up in an untenable situation where certain dependencies
would get installed many times over. For example, there were multiple
installs of the same OpenMPI version on the system, all built slightly
differently and most untested on the systems. 
This meant that there was
no way to indicate to the user which version of a particular package
was the one known to work.

### Build procedure hard to reproduce

During the last operating system upgrade, the need for a more automated
build process was painfully felt. Because most software was built ad hoc
in response to user requests, sometimes the only record of the build
procedure was the build artefacts themselves. This meant manually going
over a very large software repository and rebuilding everything more or
less by hand for the new operating system.

### Poor bus factor

This refers to the well-known software engineering metric for the
degree of shared knowledge of a specialized domain within a team:
how many people would have to be hit by a bus before the team could
no longer carry out its work? In our case, the knowledge about the
software stack was concentrated in one or two individuals.

## Searching

As outlined above, the numerous issues with the current stack
established the frame in which to search for a set of tools and methods
to ease the realisation of the high-level goals for the software stack.
To reiterate, manageability and user-friendliness were top of the list.


### Timeline

We formed the Software And Modules (SAM) working group in Q4 2024.
SAM consists of five people who dedicate the majority of their
time to exploring possible alternative ways of building, managing and
presenting the software stack to users. The members draw on expertise
from different areas, notably from their work on the user-support,
sysadmin and platform teams. The goal for the new software stack was to
have it up and running on the new MUSICA system towards the end of 2025.

+ Summer 2024
  Initial meetings highlighted the need to reform the management
  of software so that it would be easy to use, transparent and logical,
  as well as tested and performant. 
These meetings included the
  first mentions of EESSI and EasyBuild as possible alternatives to
  Spack, and of Lmod as an alternative to Tcl Modules.

+ Autumn 2024
  The working group was established and a broad set of tools and
  approaches was compared: Guix/Nix, Spack, EasyBuild, EESSI, Lmod, and
  ReFrame. These tools were installed on a number of existing systems
  and briefly tried out against a set of high-level user requirements
  that we had agreed on. The outcome was to focus on EasyBuild and
  EESSI.

+ Winter 2024 - Spring 2025
  We made the strategic decision to have EESSI installed on the MUSICA
  system, and decided to run a small experiment in which a small
  software stack would be built and installed, in order to compare and
  contrast two approaches: "EESSI on the side" vs. "EESSI as a base".


In June 2025, the system entered a closed test phase, with core software
provided by EESSI. The custom stack will be extended during the course
of the test phase.

## Experiments

### Test stack

The following programs were agreed upon as a way to come into contact
with specific workflows, such as writing easyconfig files, writing
custom EasyBuild hooks, installing commercial software, and installing
GPU-specific application software:

+ AOCC 5.0.0
+ Intel Compilers
+ VASP 6.5.0
+ One commercial software package (STAR-CCM+, Mathematica)
+ NVHPC
+ VASP 6.5.0 (GPU)
+ Containers (Singularity, Docker, NVIDIA)

### EESSI on the side

This approach represents, in a sense, the traditional way to build a
software stack: building everything directly on the host (Rocky 9) and
relying on system libraries. It used scripts and wrappers from the sse2
toolkit from the National Supercomputer Centre at Linköping University
as a way to manage and structure the modules and software installations.
The software builds were a mixture of EasyBuild scripts and makefiles.
EESSI was offered as a module in its pure form, and users were generally
discouraged from using EESSI-extend, or did so at their own risk.
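In this setup, making the pure EESSI stack available to a user looks
roughly like the following. This is a sketch based on the public EESSI
documentation, not the exact configuration used on MUSICA; the
repository version and the loaded module are illustrative examples.

```shell
# Initialise the production EESSI environment from the CVMFS repository
# (path and version as documented by EESSI; adjust to the local mount).
source /cvmfs/software.eessi.io/versions/2023.06/init/bash

# EESSI modules built for the detected CPU microarchitecture are now on
# the MODULEPATH, next to whatever the site provides natively.
module avail

# Load an application from the EESSI stack (illustrative example).
module load GROMACS
```

The init script detects the host CPU and selects the matching
optimised software directory, which is what makes the same repository
usable across heterogeneous sites.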

### EESSI as a base

With this approach, we leveraged EESSI-extend extensively and aimed to
build the whole stack with the compatibility layer from EESSI as a base.
The build process moved back and forth between three distinct phases,
leveraging the various possible settings for the EESSI-extend module:

+ Phase 0 -> EESSI_USER_INSTALL
+ Phase 1 -> EESSI_SITE_INSTALL
+ Phase 2 -> EESSI_PROJECT_INSTALL=/cvmfs/software.asc.ac.at


## Reflections

### EESSI on the side

By comparison, it was much quicker and easier to build all the software
in the list using this approach. It also offers a lot of control to the
sysadmin who builds the software; things like tweaking or modifying
module files in place were possible. The downsides were reproducibility
and portability: there would be obvious work involved in building the
stack again upon the next OS upgrade. That said, everything worked much
more smoothly than with EESSI-extend; it was possible to build all the
software that was listed and to run basic tests with Slurm. We had some
open questions around interoperability between custom modules and EESSI,
in particular whether modules from the two independent stacks could be
mixed without running into issues (probably not, given the different
libc versions).


### EESSI as a base

By the end of the closed test phase of MUSICA, the engineering team
chose EESSI as the foundation for the software stack. While this approach
introduced complexity into our build and installation workflows, it
enabled us to meet certain key requirements for the MUSICA software
infrastructure.

Specifically, we leveraged CVMFS to distribute the software stack across
the three sites: Vienna, Linz, and Innsbruck. EESSI offers access
to approximately 1960 modules that are ready to load on the target
architecture. 
Setting up EESSI was quite straightforward, and although
some team members found the many installation options of the
EESSI-extend module complex, adopting this method aligned with modern
practices for managing HPC software. EESSI is open source, well
documented, and maintained by colleagues within Europe's HPC ecosystem.

Engaging with EESSI's documentation, source code, and community proved
valuable. We identified a reusable blueprint that we could adapt to fit
our specific needs. Despite the initial learning curve, this approach
provided long-term benefits in terms of maintainability and scalability.


# Footnotes

[^1]: