Add LD-pruned biallelic SNP access method#1130

Open
adilraza99 wants to merge 12 commits into malariagen:master from adilraza99:GH1049-add-ld-pruning-support

@adilraza99
Contributor

Summary

Add support for generating LD-pruned SNP datasets from the existing biallelic SNP call pipeline.

Changes

  • introduce AnophelesLdAnalysis mixin
  • add biallelic_snp_calls_ld_pruned() method
  • add ld_params.py for LD pruning parameters
  • integrate the mixin into AnophelesDataResource
  • add tests for LD pruning behaviour

Implementation

The method builds on top of biallelic_snp_calls():

  1. load SNP dataset using the existing pipeline
  2. compute genotype reference counts
  3. apply LD pruning using allel.locate_unlinked
  4. filter variants using _dask_compress_dataset
  5. return an LD-pruned xarray.Dataset
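
The pruning and filtering steps above can be illustrated with a small, dependency-free sketch. This is not the PR's code and not scikit-allel's implementation: `locate_unlinked_sketch` and `compress_variants` are hypothetical stand-ins for `allel.locate_unlinked` and `_dask_compress_dataset`, the dataset is a plain dict of lists rather than an `xarray.Dataset`, and the `size`, `step` and `threshold` defaults are chosen only for illustration.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two per-sample allele-count vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # invariant site: treated as unlinked in this sketch
    return cov * cov / (var_a * var_b)


def locate_unlinked_sketch(gn, size=100, step=20, threshold=0.1):
    """Greedy windowed pruning: return a boolean mask of variants to keep.

    gn is a list of variants, each a list of per-sample allele counts (0, 1 or 2).
    Within each window, a later variant is dropped when its r**2 with an
    earlier retained variant exceeds the threshold.
    """
    keep = [True] * len(gn)
    for start in range(0, len(gn), step):
        stop = min(start + size, len(gn))
        for i in range(start, stop):
            if not keep[i]:
                continue
            for j in range(i + 1, stop):
                if keep[j] and r_squared(gn[i], gn[j]) > threshold:
                    keep[j] = False
    return keep


def compress_variants(dataset, keep):
    """Apply the mask along the variants axis of every variable (hypothetical helper)."""
    return {
        name: [value for value, k in zip(values, keep) if k]
        for name, values in dataset.items()
    }


# Toy data: variants 2 and 3 are perfectly (anti-)correlated with variant 1.
dataset = {
    "variant_position": [101, 205, 310, 482],
    "allele_counts": [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [2, 1, 0, 2, 1, 0],
        [1, 0, 1, 1, 2, 1],
    ],
}
keep = locate_unlinked_sketch(dataset["allele_counts"])
pruned = compress_variants(dataset, keep)
print(keep)                        # [True, False, False, True]
print(pruned["variant_position"])  # [101, 482]
```

Applying the same mask to every variable is what keeps the pruned dataset internally consistent, which matters for any downstream consumer that indexes variables by variant.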

Notes

  • implementation follows the repository's cooperative mixin architecture
  • returned dataset remains compatible with downstream processing (including PLINK export)
  • LD pruning is applied after any optional thinning via the existing n_snps parameter
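
As a rough illustration of the cooperative mixin pattern mentioned in these notes, the sketch below uses hypothetical classes (`BaseSketch`, `SnpDataSketch`, `LdAnalysisSketch`, `DataResourceSketch`) in place of the repository's real ones. Each class forwards keyword arguments through `super().__init__(**kwargs)`, and the LD method reuses the existing SNP access method. The thinning logic and the alternating mask are placeholders, not the PR's behaviour.

```python
class BaseSketch:
    """Hypothetical base class that owns the shared configuration."""

    def __init__(self, *, n_variants):
        self._n_variants = n_variants


class SnpDataSketch(BaseSketch):
    """Stand-in for the mixin that provides biallelic_snp_calls()."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # cooperative: pass everything up the MRO

    def biallelic_snp_calls(self, n_snps=None):
        positions = list(range(self._n_variants))
        if n_snps is not None and n_snps < len(positions):
            # Placeholder thinning: take evenly spaced variants.
            stride = len(positions) // n_snps
            positions = positions[::stride][:n_snps]
        return {"variant_position": positions}


class LdAnalysisSketch(SnpDataSketch):
    """Stand-in for an LD mixin layered on top of the SNP mixin."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def biallelic_snp_calls_ld_pruned(self, n_snps=None):
        ds = self.biallelic_snp_calls(n_snps=n_snps)  # thinning happens first
        positions = ds["variant_position"]
        # Placeholder mask; a real implementation would compute it from LD.
        keep = [i % 2 == 0 for i in range(len(positions))]
        return {"variant_position": [p for p, k in zip(positions, keep) if k]}


class DataResourceSketch(LdAnalysisSketch):
    """Stand-in for the top-level resource that composes the mixins."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


resource = DataResourceSketch(n_variants=12)
thinned_then_pruned = resource.biallelic_snp_calls_ld_pruned(n_snps=6)
print(thinned_then_pruned["variant_position"])  # [0, 4, 8]
```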

Tests

Added tests to verify:

  • variant count is reduced after pruning
  • dataset structure remains valid
  • compatibility with downstream workflows
  • threshold sensitivity (stricter threshold retains fewer variants)
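
The checks listed above could look roughly like the following sketch. It is not the PR's test code: `toy_prune` is a hypothetical stand-in that drops a variant when a made-up r² value exceeds the threshold, and the variable names in the toy dataset are illustrative. Because greedy pruning is order-dependent, "stricter retains fewer" is an expectation on typical data rather than a strict guarantee, so the comparison between thresholds is non-strict here.

```python
# Made-up r**2 of each variant with its predecessor (toy values, not real data).
TOY_R2_WITH_PREVIOUS = [0.0, 0.9, 0.3, 0.05]
TOY_POSITIONS = [101, 205, 310, 482]
TOY_GENOTYPES = [[0, 1], [1, 1], [0, 0], [1, 2]]


def toy_prune(threshold):
    """Hypothetical pruner: keep variants whose toy r**2 is within the threshold."""
    keep = [r2 <= threshold for r2 in TOY_R2_WITH_PREVIOUS]
    return {
        "variant_position": [p for p, k in zip(TOY_POSITIONS, keep) if k],
        "call_genotype": [g for g, k in zip(TOY_GENOTYPES, keep) if k],
    }


def check_ld_pruning(prune, n_before, strict=0.1, lenient=0.5):
    """Run the structural and threshold checks; return retained counts."""
    ds_strict, ds_lenient = prune(strict), prune(lenient)
    for ds in (ds_strict, ds_lenient):
        # Structure stays valid: every variable has the same number of variants.
        lengths = {len(values) for values in ds.values()}
        assert len(lengths) == 1
    n_strict = len(ds_strict["variant_position"])
    n_lenient = len(ds_lenient["variant_position"])
    # Pruning reduces the variant count, and the stricter threshold
    # retains no more variants than the lenient one.
    assert 0 < n_strict <= n_lenient < n_before
    return n_strict, n_lenient


counts = check_ld_pruning(toy_prune, n_before=len(TOY_POSITIONS))
print(counts)  # (2, 3)
```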

closes #1049

@adilraza99 adilraza99 changed the title from "Gh1049 add ld pruning support" to "Add LD-pruned biallelic SNP access method" Mar 15, 2026
@adilraza99
Contributor Author

Hey Jon,
I was thinking it might be helpful to include a small example notebook demonstrating how to use the LD-pruned SNP access method introduced in this PR.

Would you be okay with me adding one?
@jonbrenas

@adilraza99
Contributor Author

@jonbrenas just checking back on this - would be great to hear your thoughts when you get a chance.

Also linking Tristan's earlier comment (#1049 (comment)) here for context.

@adilraza99
Contributor Author

adilraza99 commented Mar 19, 2026

Hi @jonbrenas, could you please review this PR? I’ve also added a small example notebook demonstrating the LD-pruned SNP workflow.

Development

Successfully merging this pull request may close these issues.

Adding Admixture functionalities to the API
