Consolidating OpenACC device-host memory transfers #1315

Merged
mgduda merged 7 commits into MPAS-Dev:develop from abishekg7:atmosphere/acc_mem_move_per_timestep on Mar 13, 2026

abishekg7 (Collaborator) commented May 13, 2025

This PR introduces changes to the MPAS Atmosphere core to consolidate OpenACC host and device data transfers during execution of the dynamical core. It adds calls to the OpenACC device-host memory transfer subroutines introduced in earlier commits in order to eliminate extraneous data transfers in the dynamical core.

Most of the data movement statements previously distributed throughout mpas_atm_time_integration have been consolidated in two subroutines, mpas_atm_pre_dynamics and mpas_atm_post_dynamics. This pair of subroutines is called once per time step in the atmosphere core, immediately before and after the call to atm_srk3. Any fields copied onto the device in these subroutines are removed from explicit data movement statements in the dynamical core.
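
The once-per-step pattern described above can be sketched as follows. This is a minimal illustration and not the MPAS implementation: the program, the routines pre_dynamics, post_dynamics and dynamics_step, and the field theta are hypothetical stand-ins for mpas_atm_pre_dynamics, mpas_atm_post_dynamics, atm_srk3 and the real model fields. Compiled without OpenACC support, the directives are treated as comments and the program runs on the host.

```fortran
! Sketch only: one bulk transfer before and after the per-step solver,
! instead of copies scattered around individual kernels.
! All names here are hypothetical and are not taken from MPAS.
program pre_post_dynamics_sketch
   implicit none
   integer, parameter :: n = 8, nsteps = 3
   real, allocatable :: theta(:)
   integer :: istep

   allocate(theta(n))
   theta = 1.0

   do istep = 1, nsteps
      call pre_dynamics(theta)     ! host -> device, once per time step
      call dynamics_step(theta)    ! kernels assume the data is present
      call post_dynamics(theta)    ! device -> host, once per time step
   end do

   ! Each step doubles the field, so 1.0 becomes 8.0 after three steps.
   if (any(abs(theta - 8.0) > 1.0e-6)) error stop 'unexpected field values'
   print '(a)', 'pre/post ok'

contains

   subroutine pre_dynamics(a)
      real, intent(in) :: a(:)
      !$acc enter data copyin(a)
   end subroutine pre_dynamics

   subroutine post_dynamics(a)
      real, intent(inout) :: a(:)
      !$acc exit data copyout(a)
   end subroutine post_dynamics

   subroutine dynamics_step(a)
      real, intent(inout) :: a(:)
      integer :: i
      ! No data movement here: the field must already be on the device.
      !$acc parallel loop default(present)
      do i = 1, size(a)
         a(i) = 2.0 * a(i)
      end do
   end subroutine dynamics_step

end program pre_post_dynamics_sketch
```

Because the kernel only asserts presence, any field it touches has to be listed in the pre/post pair, which is the trade-off that comes with consolidating the transfers.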

The mesh/time-invariant fields are still copied onto the device in mpas_atm_dynamics_init and removed from the device in mpas_atm_dynamics_finalize, with the exception of select fields transferred in the subroutines mpas_atm_pre_compute_solve_diagnostics and mpas_atm_post_compute_solve_diagnostics. This is a special case due to atm_compute_solve_diagnostics being called for the first time before the call to mpas_atm_dynamics_init.

This PR also invokes host-device data transfer routines in the mpas_atm_iau, mpas_atmphys_interface and mpas_atmphys_todynamics modules to ensure that the code regions performing computations related to IAU, microphysics and physics tendencies, all of which are currently executed on CPUs, are using the most recent field values from the dynamical core running on GPUs, and vice versa.
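
A minimal sketch of synchronizing around a CPU-only routine, assuming a device-resident field qv and a host-computed tendency tend. The routine cpu_physics and both field names are hypothetical and only stand in for the IAU, microphysics and physics tendency code; the actual transfer routines in the PR are not reproduced here.

```fortran
! Sketch only: bring device results to the host before CPU-only code
! runs, then push the host results back to the device afterwards.
! All names here are hypothetical and are not taken from MPAS.
program host_physics_sync_sketch
   implicit none
   integer, parameter :: n = 4
   real :: qv(n), tend(n)
   integer :: i

   qv = 1.0
   tend = 0.0
   !$acc enter data copyin(qv, tend)

   ! A device kernel modifies qv; the host copy is now stale.
   !$acc parallel loop present(qv)
   do i = 1, n
      qv(i) = qv(i) + 1.0
   end do

   !$acc update host(qv)         ! CPU code needs the latest qv
   call cpu_physics(qv, tend)    ! runs on the host only
   !$acc update device(tend)     ! device needs the new tendency

   !$acc parallel loop present(qv, tend)
   do i = 1, n
      qv(i) = qv(i) + tend(i)
   end do

   !$acc exit data copyout(qv) delete(tend)

   ! qv: 1.0 -> 2.0 on the device, tend = 1.0 on the host, qv -> 3.0.
   if (any(abs(qv - 3.0) > 1.0e-6)) error stop 'host and device out of sync'
   print '(a)', 'sync ok'

contains

   subroutine cpu_physics(q, t)
      real, intent(in)  :: q(:)
      real, intent(out) :: t(:)
      t = 0.5 * q
   end subroutine cpu_physics

end program host_physics_sync_sketch
```

Omitting either update directive leaves one side working with stale values when the code is built with OpenACC enabled, which is the situation the transfer routines are meant to prevent.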

In addition, this PR includes explicit data transfers around halo exchanges in the atm_srk3 subroutine.
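
The idea can be sketched as below, assuming the halo exchange itself operates on host memory. The periodic copy in exchange_halo_on_host is a hypothetical single-process stand-in for the real exchange, and the field names are illustrative only.

```fortran
! Sketch only: update directives placed around a halo exchange that
! works on host memory. The "exchange" is a single-process periodic
! copy; all names here are hypothetical and are not taken from MPAS.
program halo_update_sketch
   implicit none
   integer, parameter :: nOwned = 4
   real :: u(0:nOwned+1)          ! cells 0 and nOwned+1 are halo cells
   real :: d(nOwned)
   integer :: i

   u = 0.0
   d = 0.0
   !$acc enter data copyin(u, d)

   ! The device computes the owned cells.
   !$acc parallel loop present(u)
   do i = 1, nOwned
      u(i) = real(i)
   end do

   !$acc update host(u)           ! the exchange reads owned cells on the host
   call exchange_halo_on_host()
   !$acc update device(u)         ! the device needs the refreshed halo cells

   ! This kernel reads the halo values.
   !$acc parallel loop present(u, d)
   do i = 1, nOwned
      d(i) = u(i+1) - u(i-1)
   end do

   !$acc exit data copyout(d) delete(u)

   if (any(abs(d - [-2.0, 2.0, 2.0, -2.0]) > 1.0e-6)) error stop 'halo values were stale'
   print '(a)', 'halo ok'

contains

   subroutine exchange_halo_on_host()
      u(0) = u(nOwned)
      u(nOwned+1) = u(1)
   end subroutine exchange_halo_on_host

end program halo_update_sketch
```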

These data transfer subroutines and the !$acc update statements are an interim solution until we have a bookkeeping method in place.

This PR also introduces a couple of new timers to keep track of the cost of data transfers.

abishekg7 (Collaborator, Author) commented

@mgduda I think it might be ready for a second look.

I did try to move the !$acc update statements around halo exchanges to occur within exchange_halo_group, but there are some instances of physics interfaces also calling this routine, which leads to runtime errors. So I've left the !$acc update statements around halo exchanges as is.

abishekg7 force-pushed the atmosphere/acc_mem_move_per_timestep branch from 31a1ccd to 4b7137d on August 14, 2025 16:32
mgduda added the Atmosphere and OpenACC labels on Feb 6, 2026
mgduda self-requested a review on February 6, 2026 21:24
abishekg7 force-pushed the atmosphere/acc_mem_move_per_timestep branch from 5b23998 to 35cc144 on February 24, 2026 21:09
abishekg7 force-pushed the atmosphere/acc_mem_move_per_timestep branch from 35cc144 to 738138a on March 3, 2026 21:21
mgduda self-requested a review on March 5, 2026 22:23
mgduda self-requested a review on March 6, 2026 22:15
mgduda self-requested a review on March 6, 2026 23:10
mgduda (Contributor) commented Mar 6, 2026

@abishekg7 This PR looks in good shape to me, and thanks for addressing all of the comments! I'll run a couple of final tests, but in the meanwhile, please feel free to rework the commit history.

abishekg7 force-pushed the atmosphere/acc_mem_move_per_timestep branch from e05d652 to 4d2592b on March 9, 2026 23:18
mgduda (Contributor) commented Mar 12, 2026

With an LES test case, I'm getting the following error when attempting to run on GPUs:

Present table errors:
vcell(:,cellstart:cellend) lives at 0x121d1bd0 size 3093600 partially present in
host:0x121d1bd0 device:0x148c36000000 size:2808000 presentcount:0+1 line:1100 name:ureconstructmeridional(:,1:ncells) file:MPAS-Model/src/core_atmosphere/dynamics/mpas_atm_time_integration.F
FATAL ERROR: variable in data clause is partially present on the device: name=vcell(:,cellstart:cellend)
 file:MPAS-Model/src/core_atmosphere/dynamics/mpas_atm_dissipation_models.F les_models line:278
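
One possible reading of the report, based only on the sizes it lists: the device entry (size 2808000, entered as a (:,1:ncells) section) is smaller than the section named in the failing data clause (size 3093600), so the requested section is only partly covered. The sketch below shows a consistent setup in which the whole array is placed on the device, so that any section of it is fully present. The array v and the sizes nCells and nCellsAll are hypothetical, and this is not a statement about how the error was resolved in MPAS.

```fortran
! Sketch only: a data clause naming a larger section than the one
! originally placed on the device is the kind of mismatch that is
! reported as "partially present". Placing the whole array on the
! device avoids it. All names here are hypothetical.
program partial_present_sketch
   implicit none
   integer, parameter :: nVert = 2, nCells = 5, nCellsAll = 6
   real :: v(nVert, nCellsAll)    ! one column more than nCells
   integer :: i, k

   v = 0.0

   ! Entering only v(:,1:nCells) here and then naming v(:,1:nCellsAll)
   ! below would leave the last column without a device copy.
   !$acc enter data copyin(v)

   !$acc parallel loop collapse(2) present(v(:,1:nCellsAll))
   do i = 1, nCellsAll
      do k = 1, nVert
         v(k,i) = real(i)
      end do
   end do

   !$acc exit data copyout(v)

   ! Sum is nVert * (1 + 2 + ... + 6) = 42.
   if (abs(sum(v) - 42.0) > 1.0e-5) error stop 'unexpected sum'
   print '(a)', 'present ok'
end program partial_present_sketch
```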

…nfers

This commit introduces a set of routines to mpas_atm_time_integration in order
to begin consolidating OpenACC data transfers between host and device during
the course of the dynamical core execution.

As the atm_compute_solve_diagnostics subroutine is also called once before
the time integration loop, we also introduce a separate pair of subroutines to
handle data movements around the first call to atm_compute_solve_diagnostics.

The mesh/time-invariant fields are still copied onto the device in the call to
mpas_atm_dynamics_init and removed from the device during the call to
mpas_atm_dynamics_finalize, with the exception of certain fields moved in
mpas_atm_pre/post_compute_solve_diagnostics. This is a special case due to
atm_compute_solve_diagnostics being called for the first time before the call
to mpas_atm_dynamics_init.

This commit introduces a set of routines to mpas_atm_iau, building on the
previous commit, to begin consolidating OpenACC data transfers between host and
device during the course of the dynamical core execution. As the IAU code is
currently executed on CPUs, it is necessary to synchronize the fields needed
for this computation with the host before the call to atm_add_tend_anal_incr
and sync back to the device after this call.

…a tranfers

This commit introduces a set of routines to mpas_atmphys_interface, building on
the last two commits, to begin consolidating OpenACC data transfers between
host and device during the course of the dynamical core execution. As the
microphysics is currently executed on CPUs, it is necessary to synchronize the
fields needed for this computation with the host before the call to
microphysics from the dycore and sync back to the device after this call.

…ta tranfers

This commit introduces a set of routines to mpas_atmphys_todynamics, building
on the last several commits, to begin consolidating OpenACC data transfers
between host and device during the course of the dynamical core execution. As
the computation of the physics tendencies is currently executed on CPUs, it is
necessary to synchronize the fields needed for this computation with the host
before the call to physics_get_tend and sync back to the device after this call.

…anfers

This commit introduces a set of routines to mpas_vector_reconstruction, on top
of the last several commits, to begin consolidating OpenACC data transfers
between host and device during the course of the dynamical core execution. The
call to mpas_reconstruct_2d is currently executed on device (GPU), and there is
no need for ACC data transfers around this call within the time integration
loop. However, mpas_reconstruct_2d is also invoked once before the start of the
time integration loop and it becomes necessary to synchronize the fields needed
for mpas_reconstruct_2d with the device before this call and sync back to the
host following this call.

This commit introduces changes to the MPAS Atmosphere core to consolidate
OpenACC host and device data transfers during the course of the dynamical core
execution. This commit adds calls to OpenACC device-host memory transfer
subroutines, introduced in previous commits, in order to eliminate extraneous
data transfers in the dynamical core.

Most of the previously distributed data movement statements have been
consolidated in two subroutines, mpas_atm_pre_dynamics and
mpas_atm_post_dynamics. This pair of subroutines is called once per timestep in
the atmosphere core, right before and after the call to atm_srk3.

The mesh/time-invariant fields are still copied onto the device in
mpas_atm_dynamics_init and removed from the device in
mpas_atm_dynamics_finalize, with the exception of select fields transferred in
the subroutines mpas_atm_pre_compute_solve_diagnostics and
mpas_atm_post_compute_solve_diagnostics. This is a special case due to
atm_compute_solve_diagnostics being called for the first time before the call
to mpas_atm_dynamics_init.

This commit also invokes host-device data transfer routines in the mpas_atm_iau,
mpas_atmphys_interface and mpas_atmphys_todynamics modules to ensure that the
code regions performing computations related to IAU, microphysics and physics
tendencies, all of which are currently executed on CPUs, are using the most
recent field values from the dynamical core running on GPUs, and vice versa.

In addition, this commit also includes explicit data transfers around halo
exchanges in the atm_srk3 subroutine.

This commit introduces changes to previously existing timers and adds new
timers in order to measure the time taken for OpenACC host-device memory
transfers in various code regions after the memory movement consolidation
introduced in the previous commit.
abishekg7 force-pushed the atmosphere/acc_mem_move_per_timestep branch from 6bb56c2 to eabe6d0 on March 13, 2026 16:41
mgduda merged commit 5269f7f into MPAS-Dev:develop on Mar 13, 2026