C++ search by ms609 · Pull Request #210 · ms609/TreeSearch

ms609 · 2026-03-19T18:56:59Z

other optimizations + features

Manual testing underway; the Shiny app in particular has some usability issues.

…rained_tree

…ilder

Build random binary tree that satisfies topological constraints by:
1. Ordering constraint splits smallest-to-largest
2. Assigning each tip to its tightest enclosing split
3. Bottom-up: randomly resolve each split's children into binary subtree
4. Wire root-level items (unconstrained tips + top-level splits)

Replaces Wagner fallback (T-214 workaround) for RANDOM_TREE strategy, restoring uniform random topology sampling diversity under constraints.

Tests: 916 constraint + 24 random-constrained + 373 simplify/driven pass.
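The four steps above can be sketched as follows. This is a minimal illustration, not the code in the PR: `Subtree`, `BuiltTree`, `resolve_randomly` and `random_constrained_tree_sketch` are hypothetical names, constraints are assumed to be given as mutually nested-or-disjoint tip sets with no duplicates, and the random pairwise joining used here is a simple stand-in that does not by itself guarantee uniform sampling over topologies.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not TreeSearch's internal representation.
struct Subtree {
  std::string newick;     // text form of this subtree
  std::vector<int> tips;  // sorted tip ids below this node
};

struct BuiltTree {
  std::string newick;
  std::set<std::vector<int>> clades;  // tip set of every internal node
};

// Join randomly chosen pairs of items until a single binary subtree remains.
static Subtree resolve_randomly(std::vector<Subtree> items, std::mt19937& rng,
                                std::set<std::vector<int>>& clades) {
  assert(!items.empty());
  while (items.size() > 1) {
    std::uniform_int_distribution<std::size_t> pick(0, items.size() - 1);
    std::size_t i = pick(rng), j = pick(rng);
    if (i == j) continue;  // need two distinct items
    Subtree joined;
    joined.newick = "(" + items[i].newick + "," + items[j].newick + ")";
    joined.tips = items[i].tips;
    joined.tips.insert(joined.tips.end(), items[j].tips.begin(), items[j].tips.end());
    std::sort(joined.tips.begin(), joined.tips.end());
    clades.insert(joined.tips);
    if (i < j) std::swap(i, j);  // erase the larger index first
    items.erase(items.begin() + static_cast<std::ptrdiff_t>(i));
    items.erase(items.begin() + static_cast<std::ptrdiff_t>(j));
    items.push_back(std::move(joined));
  }
  return items.front();
}

BuiltTree random_constrained_tree_sketch(int n_tip,
                                         std::vector<std::vector<int>> splits,
                                         unsigned seed) {
  std::mt19937 rng(seed);
  for (auto& s : splits) std::sort(s.begin(), s.end());
  // 1. Order constraint splits smallest-to-largest.
  std::sort(splits.begin(), splits.end(),
            [](const std::vector<int>& a, const std::vector<int>& b) {
              return a.size() < b.size();
            });
  const int n_split = static_cast<int>(splits.size());
  auto contains = [&](int s, int tip) {
    return std::binary_search(splits[s].begin(), splits[s].end(), tip);
  };
  // items[s] holds the children of split s; items[n_split] is the root level.
  std::vector<std::vector<Subtree>> items(n_split + 1);
  // 2. Assign each tip to its tightest (smallest) enclosing split.
  for (int tip = 0; tip < n_tip; ++tip) {
    int owner = n_split;
    for (int s = 0; s < n_split; ++s)
      if (contains(s, tip)) { owner = s; break; }
    items[owner].push_back(Subtree{"t" + std::to_string(tip), {tip}});
  }
  BuiltTree out;
  // 3. Bottom-up: resolve each split, then hand it to its tightest parent split.
  for (int s = 0; s < n_split; ++s) {
    Subtree sub = resolve_randomly(std::move(items[s]), rng, out.clades);
    int parent = n_split;
    for (int p = s + 1; p < n_split; ++p)
      if (splits[p].size() > splits[s].size() && contains(p, splits[s][0])) {
        parent = p;
        break;
      }
    items[parent].push_back(std::move(sub));
  }
  // 4. Wire root-level items (unconstrained tips + top-level splits).
  out.newick = resolve_randomly(std::move(items[n_split]), rng, out.clades).newick + ";";
  return out;
}
```

Calling `random_constrained_tree_sketch(6, {{0, 1, 2}, {0, 1}, {3, 4}}, 42u)` yields a rooted binary tree on tips `t0`..`t5` in which each constraint set appears among `clades`; different seeds give different resolutions of the unconstrained parts.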

- Clarify MaddisonSlatkin() @examples (show logp intermediate)
- Point FixedDraws overflow error to StepInformation(approx='mc')
- Note MC fallback in MaddisonSlatkin Rd documentation

From uncommitted work on feature/madslatkin-profiling (PR #211).

Cherry-picked from feature/parallel-temper (6dc28a2). Replaces static extract_divided_steps() copies with shared extract_char_steps() in TBR, SPR, and drift. NA blocks now use three-pass correction formula instead of raw local_cost. ts_temper.cpp already correct (via PR #227 PCSA merge).

- T-196: cherry-picked NA+IW fix from feature/parallel-temper
- T-198-201: closed (PT ruled out; PCSA landed via PR #227)
- Removed PT section from to-do.md
- Deleted feature/parallel-temper and feature/pt-eval (remote+local)
- PT findings preserved in .positai/expertise/pt-evaluation.md

Three issues in impose_one_pass / impose_constraint:

1. Bail-out threshold n_tip/4 was too aggressive. For n_tip=5 the threshold was 1, so any split requiring 2+ moves caused an immediate bail-out before making any repairs. Raised to n_tip.
2. impose_one_pass returned 0 both for 'no violations' and 'bailed out', so impose_constraint couldn't distinguish the two. Now returns -1 on bail-out, allowing the caller to know the repair may be incomplete.
3. Documented the root-child limitation: spr_clip() doesn't fully handle root children, so impose_one_pass skips them. The fuse path's post-repair verification guard (T-214) catches these cases.

Tests: 940 constraint + 308 simplify/driven/sector pass.
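The threshold and return-value changes in items 1 and 2 can be illustrated with a small sketch. The function below is a hypothetical stand-in, not the real impose_one_pass: it assumes each violated split is summarised by the number of moves needed to repair it, and that the return value counts moves applied.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for impose_one_pass. Each entry of moves_needed is the
// number of moves required to repair one split (0 = split already satisfied).
// Returns the number of moves applied, or -1 if the pass bailed out.
int impose_one_pass_sketch(const std::vector<int>& moves_needed, int bail_threshold) {
  assert(bail_threshold >= 0);
  int moves_applied = 0;
  for (int moves : moves_needed) {
    if (moves > bail_threshold) return -1;  // bail-out: repair may be incomplete
    moves_applied += moves;
  }
  return moves_applied;  // 0 now means "no violations" only
}
```

With n_tip = 5, a split needing two moves trips the old threshold (`5 / 4 == 1`, so the call returns -1) but is repaired under the raised threshold (`n_tip == 5`, so the call returns 2). A caller can then treat a negative result as "verify the tree before trusting it".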

spr_clip() can't detach root children (root is its own parent, so the bypass logic fails). All search callers skip root children, but impose_constraint needs them for constraint repair.

New topology_spr() helper in ts_constraint.cpp handles the root-child case by absorbing the sibling into root and repurposing it as the insertion node. No changes to spr_clip or any search caller.

Also adds the build_postorder call that was missing between individual moves within impose_one_pass: each topology_spr is now followed by a postorder rebuild so edge enumeration stays valid.

Tests: 942 constraint (90 impose-constraint, incl 2 new root-child tests) + 308 simplify/driven/sector pass.

T-208/T-211: random_constrained_tree() and impose_constraint() fixes

Benchmarked perturbStopFactor across 10 morphobank/inapplicable datasets (23-213 tips). Key findings:

- PSF=2 gives 2.4-6.9x speedup on converged searches with zero score loss
- Complementary to targetHits: on hard landscapes where few replicates hit the best score, PSF fires first; on easy landscapes targetHits fires first (PSF is irrelevant)
- PSF=5 provides smaller speedups and is too conservative for large trees

Changed default from 0 (disabled) to 2 in SearchControl() and compat wrapper. Updated docs and fixed timeout test (needs PSF=0 to test the timeout path, not convergence).

T-187: Perturbation-count stopping rule

Infrastructure for indirect scoring optimization:

1. FlatBlock struct (24 bytes/block vs 288 bytes in CharBlock) packs hot-loop metadata (offset, n_states, active_mask, has_inapplicable) for cache-friendly access. Populated at build_dataset() time.
2. Flat indirect scoring functions (EW and NA-aware variants) that use FlatBlock and skip upweight_mask/weight overhead. Available as fitch_indirect_{bounded,cached}_flat and fitch_na_indirect_{bounded,cached}_flat. NOT wired into search dispatch; see below.
3. Software prefetch in TBR rerooting inner loop: prefetch vroot_cache entry 2 iterations ahead. At 180+ tips (vroot_cache ~140 KB, L2), this hides ~10 cycle L2 latency. Negligible overhead at small sizes where vroot_cache fits in L1.

Benchmarking notes (Agnarsson 62t, Zhu 75t, Dikow 88t, 10 seeds each): Flat dispatch (ternary or function pointer) showed no measurable benefit at these sizes: hardware prefetching of the sequential CharBlock array is already effective, and the dispatch overhead (extra branch or indirect call) marginally increases code complexity in the hot path. System-level timing variance on the test machine is ±15-30%, masking any sub-10% gain.

The flat functions are retained as available infrastructure for large-tree optimization (180+ tips) where CharBlock cache traffic may become significant. They can be wired in via function pointers when a 180+ tip benchmark is available for validation.

All 2877 ts-* tests pass with identical scores.
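A minimal sketch of items 1 and 3, under stated assumptions: the field names come from the commit message, but the field widths, their order and the `alignas(8)` are guesses chosen so that the struct occupies 24 bytes, and `sum_with_prefetch` is a hypothetical loop that only demonstrates the "prefetch two iterations ahead" pattern, not the TBR rerooting code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: names follow the commit message, widths are assumed.
struct alignas(8) FlatBlock {
  std::uint64_t active_mask;  // bits for active characters in the block
  std::uint32_t offset;       // start of this block's state words
  std::uint32_t n_states;     // number of states in the block
  bool has_inapplicable;      // block contains inapplicable tokens
};
static_assert(sizeof(FlatBlock) == 24, "17 bytes of fields, padded to 24");

// Visit cache entries in a caller-supplied order, hinting the entry that will
// be needed two iterations later so it is likely in cache when reached.
std::uint64_t sum_with_prefetch(const std::vector<std::uint64_t>& cache,
                                const std::vector<std::size_t>& order) {
  constexpr std::size_t kAhead = 2;
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < order.size(); ++i) {
    assert(order[i] < cache.size());
#if defined(__GNUC__) || defined(__clang__)
    if (i + kAhead < order.size())
      __builtin_prefetch(&cache[order[i + kAhead]], 0 /* read */, 3 /* keep cached */);
#endif
    total += cache[order[i]];
  }
  return total;
}
```

The prefetch is only a hint: results are identical with or without it, which is why the sketch compiles it out on compilers that lack `__builtin_prefetch`.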

TNT finds better XPIWE trees than TreeSearch on Vinther2008 (3.79283 vs 3.80000). TreeSearch optimal tree has no characters with both missing data and h>=2, so XPIWE=IW on that tree, but the search should explore differently.

…to WORDLIST Pre-existing issues blocking GHA on cpp-search HEAD. Regenerated via roxygen2::roxygenise(load_code = load_installed).

Hamilton HPC benchmark (mbank_X30754, 180t, EPYC 7702, 5 seeds):

- AC=1: 400ms/rep, 40% within-replicate hit rate
- AC=3: 1370ms/rep, 21% hit rate, no significant score gain (p>0.5)
- AC=0 vs AC=3: also no significant difference

AC=1 saves ~1s/rep (~6% of 17s median) with no quality loss. Also adds hamilton-hpc project skill for remote benchmarking docs.

…es, PR triage

TNT is 32-bit i386 with zero SIMD and 64KB LUT popcount. TreeSearch has ~4x throughput advantage (128-bit SSE2). TNT's 3-5x convergence speed is strategic, not implementation. Added T-249..T-253 investigation tasks to to-do.md.

Same approach as TreeDist::popcnt64: emit popcnt instruction via inline asm on GCC/Clang x86-64, __popcnt64 on MSVC. Software Hamming weight fallback for non-x86-64 platforms. CRAN-compatible (no compile flag change).

Old: __builtin_popcountll → compiler emits ~10-instruction shift-mask
New: single popcnt instruction (92 occurrences in compiled DLL)

Also updated T-251 description to focus on candidates-per-improvement.
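The dispatch described above might look roughly like this. It is a sketch of the general technique rather than the code in TreeSearch or TreeDist: `popcnt64_sketch` and `popcount_fallback` are illustrative names, and the inline-asm path assumes the CPU actually implements POPCNT, which the baseline x86-64 ISA does not strictly guarantee.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Portable software Hamming weight (SWAR bit-counting).
inline int popcount_fallback(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
  return static_cast<int>((x * 0x0101010101010101ULL) >> 56);            // add the bytes
}

inline int popcnt64_sketch(std::uint64_t x) {
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
  // Emit the instruction directly, so no -mpopcnt compile flag is needed.
  std::uint64_t result;
  __asm__("popcnt %1, %0" : "=r"(result) : "r"(x) : "cc");
  return static_cast<int>(result);
#elif defined(_MSC_VER) && defined(_M_X64)
  return static_cast<int>(__popcnt64(x));
#else
  return popcount_fallback(x);
#endif
}
```

Both paths return the same counts, so the fallback doubles as a reference when testing the fast path on a given machine.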

The TNT download page labels the Windows build as '[32 bits]'. The ~4x throughput advantage finding applies only to the Windows 32-bit binary and should not be generalized to Hamilton (Linux 64-bit) benchmarks.

… identified

Trajectory comparison on 3 gap datasets (Geisler2001, Zhu2013, Wortley2006) at 30s budgets, 3 seeds each. Key findings:

1. Drift consumes 16-23% of wall time but gains <1% of score improvement (405-1498 ms/step vs 0.4-44 ms/step for other phases). 30-170x less efficient than the next-worst phase (ratchet).
2. TNT evaluates 1.5-3.6x more rearrangements/second than TreeSearch despite 32-bit scalar architecture vs TreeSearch's SSE2. Per-evaluation overhead in data structure management negates the SIMD advantage.
3. TNT's xmult does extensive intra-replicate sectorial search (~67% of trajectory entries are SECT), while TreeSearch does one XSS+RSS+CSS pass per outer cycle (6-10% of time).
4. TNT achieves 10-16 steps better per-replicate median scores.

Recommendations: eliminate drift from default preset (save ~20% time), increase sectorial search rounds, reduce per-evaluation overhead.

Files: bench_trajectory.R (comparison script), trajectory_results.rds (raw data), tnt_trajectory_analysis.md (full write-up).

…eline 5 large-tree datasets (131-206 tips), 3 configs, 2 budgets, 10 seeds = 300 runs. Builds from feature/tbr-batch for pruneReinsertNni parameter.

… decision Earlier comment described Stage 1 benchmark showing -14.7 steps improvement, which was misleading — Stage 4 multi-dataset testing (131-206t) found the per-rep overhead was too high (0 replicates at 206t/60s), so pruneReinsertCycles was set to 0. Clarify the rationale and decision.

…nding

F-008: Fix constrained drift constraint staleness (T-279)

F-T-245: TBR 4-wide candidate batching (EW flat path)

Stage 4 results analysed (G-001): syab07205/206t starvation at 60s from full-TBR polish per PR cycle (~7s x 5 = 35s overhead). Agent E implemented pruneReinsertNni fix on feature/tbr-batch; Stage 5 scripts uploaded and submitted to Hamilton (SLURM 16622224, ~4-6h).

…f Stage 5 running

…nt-G state updated

…atch deleted after PR #238 merge

…dings

…ignores constraints P3)

Hamilton SLURM 16622421 (7h, EPYC 7702). 5 large-tree datasets (131-206t), 20 seeds, 60/120s budgets, EW scoring.

pr_nni: wins 7/10 expected-best conditions. Huge benefit on project3701 (146t, -178 median at 60s). Modest at 173-180t. Slight regression at 206t (+12-34 EB).
pr_tbr: harmful (1/9 wins; total starvation at 206t/60s). Decision: not enabled in large preset. Available via SearchControl().

Tier guidance: 5 (smoke), 10 (screening), 20 (comparison), 30 (definitive). Calibrated from T-289f Stage 5 empirical significance results. Cross-reference added to AGENTS.md.

Add ClipOrder enum, TBRPassRecord struct, and per-pass diagnostic counters to tbr_search() (guarded behind TBRParams::diagnostics=true). Add ts_tbr_diagnostics() Rcpp bridge returning per-pass data frame. Add order_clips() helper implementing RANDOM/INV_WEIGHT/TIPS_FIRST/BUCKET strategies (Phase 2 infrastructure, disabled by default). Add diag_clip_ordering.R to characterise baseline behaviour.

Diagnostic results (10 seeds × 4 datasets, random Wagner starts):
  Tip-clip enrichment in productive passes: 0.43–0.76×
  Tip clips (~51% of all clips) account for only 22–38% of accepted moves.
  Medium-small clips (2..sqrt(n)) appear most productive.

CONCLUSION (Phase 4): the small/tip-first hypothesis is FALSIFIED. All three proposed variants (INV_WEIGHT, TIPS_FIRST, BUCKET) favour tip clips, which are the LEAST productive clip type. Phase 2–3 skipped. Branch will be closed after coordination notes are updated.
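One plausible reading of the enrichment figure is the share of accepted moves that come from tip clips divided by the share of all clips that are tip clips; values below 1 mean tip clips are accepted less often than their frequency would predict. That definition is an assumption (the commit message does not spell it out), and `tip_clip_enrichment` is a hypothetical helper, not part of ts_tbr_diagnostics().

```cpp
#include <cassert>

// Assumed definition: (accepted tip clips / all accepted clips) divided by
// (tip clips tried / all clips tried). Returns 0 when no moves were accepted.
double tip_clip_enrichment(long accepted_tip, long accepted_total,
                           long clips_tip, long clips_total) {
  assert(clips_tip > 0 && clips_total > 0);
  if (accepted_total <= 0) return 0.0;
  const double accepted_share = static_cast<double>(accepted_tip) / accepted_total;
  const double candidate_share = static_cast<double>(clips_tip) / clips_total;
  return accepted_share / candidate_share;
}
```

Under this definition, tip clips making up 51% of candidates but 22% of accepted moves give an enrichment of about 0.43, and 38% of accepted moves gives about 0.75, broadly in line with the reported 0.43–0.76× range.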

Phase 1 diagnostic completed 2026-03-29. Hypothesis falsified: tip clips are UNDER-represented in TBR acceptances (0.43-0.76x enrichment across 4 datasets). Medium-small clips most productive. All three ordering variants (inv-weight, tips-first, bucket) favour tips — counterproductive. Branch feature/weighted-clip-order closed. See completed-tasks.md entry PA-001 and AGENTS.md item 12.

5 datasets (62-180t), 20 seeds, EW/IW10/IW3. IW hypothesis weak signal (closed). Real finding: XSS benefit scales with tree size. At 180t: TAEB delta -6.8 to -9.8 EW steps (12-19% overhead). At ≤88t: zero TAEB benefit. No preset change needed.

Stage 5 benchmark (SLURM 16622483, EPYC 7702, 5 datasets 131-206t, 10 seeds, 60s+120s) showed pr_nni (NNI full-tree polish) fixes the Stage 4 showstopper (0 reps at 206t/60s) while improving 131-180t:

  project3701 (146t): -178 steps at 60s, -128 at 120s
  project804 (173t): -9 / -2 steps
  mbank_X30754 (180t): -4 / -7 steps
  syab07205 (206t): +17.5 at 60s, neutral at 120s

Enable in large preset: pruneReinsertCycles=5L, pruneReinsertNni=TRUE. Update AGENTS.md and completed-tasks.md. Results in dev/benchmarks/t289f_pr_nni_polish.csv.

…_search When params.nni_full is true but a ConstraintData is active, guard falls through to TBR (which enforces constraints). One-line change mirroring the nni_wagner guard in ts_driven.cpp. Only affects users who combine pruneReinsertNni=TRUE with topological constraints; no preset does this. Also: S-COORD round 46 (task queue, PR status), to-do cleanup.
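The guard described above amounts to a single condition. The sketch below uses hypothetical stand-in types (`ConstraintDataSketch`, `PruneReinsertParamsSketch`, `PolishKind`) and an assumed `active` flag; only the nni_full name and the fall-through to TBR are taken from the commit message.

```cpp
// Hypothetical stand-ins for the real parameter and constraint types.
enum class PolishKind { NNI, TBR };

struct ConstraintDataSketch {
  bool active = false;  // assumed flag: true when topological constraints apply
};

struct PruneReinsertParamsSketch {
  bool nni_full = false;  // request NNI full-tree polish instead of TBR
};

// NNI polish is used only when requested AND no constraint is active;
// otherwise fall through to TBR, which enforces constraints.
PolishKind choose_polish(const PruneReinsertParamsSketch& params,
                         const ConstraintDataSketch* cd) {
  const bool constrained = (cd != nullptr) && cd->active;
  if (params.nni_full && !constrained) return PolishKind::NNI;
  return PolishKind::TBR;
}
```

Passing a null pointer models an unconstrained search, so existing unconstrained callers keep their NNI polish unchanged.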

Agents now check remote-jobs.md at /assign time (new step 4) for retrievable results before claiming tasks. Prevents SLURM results from being silently lost across conversation boundaries.

C++ instrumentation of tbr_search() with post-acceptance sector-masked TBR on clip subtree. Hit rate ~35% regardless of scoring mode (no IW-specific benefit), but NET HARMFUL: disrupts global TBR trajectory. mbank_X30754 EW: +17 to +34 steps TAEB at 30-120s. Validates existing pipeline design (XSS as separate post-convergence phase). Closed.

Phase 1 (a159311) added diagnostic instrumentation and the TIPS_FIRST, INV_WEIGHT, BUCKET, ANTI_TIP, LARGE_FIRST ordering variants to ts_tbr.cpp. Phase 2 completes the implementation:

Bug fix: clip_order was only propagated to the initial TBR and final TBR polish (~10% of replicate time). The ratchet and all sectorial TBR calls defaulted to RANDOM, making the ordering variants effectively inert for the dominant phase (ratchet ~76%).

Fix: add clip_order field to RatchetParams and SectorParams, propagate from SearchControl through ts_driven.cpp into every TBR call site in ts_ratchet.cpp and ts_sector.cpp (6 sites + search_sector signature).

Empirical validation (5 seeds, 30s, default config):
  Agnarsson2004 (62t, default preset): TIPS_FIRST -2%, INV_WEIGHT neutral
  Zhu2013 (75t, thorough preset): TIPS_FIRST +13%, INV_WEIGHT +9%
  Dikow2009 (88t, thorough preset): TIPS_FIRST +8%, INV_WEIGHT +3%

Theoretical model (Poisson bucket, corrected): TIPS_FIRST saves ~48% per productive TBR pass at 88t; practical throughput gain is ~8-13% because null passes (ordering-invariant, exhaust all clips) dilute savings.

Benefit is dataset-size dependent:
  < ~65t: tip enrichment is low (Agnarsson2004: 0.43); TIPS_FIRST neutral
  65-120t (thorough): tip enrichment moderate; TIPS_FIRST +8-13%

No preset defaults changed yet, pending GHA 10-seed validation. bench_clip_ordering.R contains the full benchmark driver.
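The propagation bug and its fix can be pictured with a small sketch. The struct and enum names echo the commit message, but their contents here are reduced to the one field under discussion, and `SearchControlSketch`, `PhaseParams`, `make_phase_params` and `tbr_params_for` are hypothetical helpers rather than the code in ts_driven.cpp.

```cpp
// Ordering variants named in the commit message.
enum class ClipOrder { RANDOM, INV_WEIGHT, TIPS_FIRST, BUCKET, ANTI_TIP, LARGE_FIRST };

// Reduced, hypothetical versions of the parameter structs.
struct TBRParams     { ClipOrder clip_order = ClipOrder::RANDOM; };
struct RatchetParams { ClipOrder clip_order = ClipOrder::RANDOM; };
struct SectorParams  { ClipOrder clip_order = ClipOrder::RANDOM; };
struct SearchControlSketch { ClipOrder clip_order = ClipOrder::RANDOM; };

struct PhaseParams {
  TBRParams tbr;          // initial TBR and final polish
  RatchetParams ratchet;  // dominant phase
  SectorParams sector;    // sectorial searches
};

// Copy the user's choice into every phase, not only the plain TBR phase.
PhaseParams make_phase_params(const SearchControlSketch& ctl) {
  PhaseParams p;
  p.tbr.clip_order = ctl.clip_order;
  p.ratchet.clip_order = ctl.clip_order;  // previously left at RANDOM
  p.sector.clip_order = ctl.clip_order;   // previously left at RANDOM
  return p;
}

// Each ratchet-internal TBR call site builds its TBRParams from the phase params.
TBRParams tbr_params_for(const RatchetParams& ratchet) {
  TBRParams t;
  t.clip_order = ratchet.clip_order;
  return t;
}
```

Because each struct defaults to RANDOM, forgetting any one of these assignments fails silently, which is consistent with the variants appearing inert before the fix.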

@param

The SearchControl.Rd usage section was generated from an old installed build (missing clipOrder and many parameters added since). The codoc check correctly flagged the mismatch.

- Added @param clipOrder documentation in R/SearchControl.R
- Regenerated man/SearchControl.Rd with correct \usage and \item{clipOrder}

TBR clip-ordering strategy (SearchControl clipOrder)

ms609 marked this pull request as draft March 25, 2026 14:21

ms609 added 29 commits March 25, 2026 16:38

Merge branch 'cpp-search' into feature/perturb-stop

4cc5014

docs: T-212 test comments — reflect Wagner fallback, not random_const…

61fbd03

…rained_tree

docs: update T-212 test comments for random_constrained_tree

8650522

chore: add PCSA, reconverged, reconverges to WORDLIST

8cd3ac7

Merge branch 'cpp-search' into feature/random-constrained-tree

80f59a4

Merge pull request #229 from ms609/feature/random-constrained-tree

e6ba08c

T-208/T-211: random_constrained_tree() and impose_constraint() fixes

Merge pull request #226 from ms609/feature/perturb-stop

6a817b0

T-187: Perturbation-count stopping rule

Coordination

f907de4

T-247: File XPIWE search quality investigation

a7237be

TNT finds better XPIWE trees than TreeSearch on Vinther2008 (3.79283 vs 3.80000). TreeSearch optimal tree has no characters with both missing data and h>=2, so XPIWE=IW on that tree, but the search should explore differently.

fix: sync Rd docs (ratchetTaper, annealCycles in usage) + add TREE's …

3daa145

…to WORDLIST Pre-existing issues blocking GHA on cpp-search HEAD. Regenerated via roxygen2::roxygenise(load_code = load_installed).

fix: add 'speedup' to WORDLIST

4540eae

T-248 complete; T-243 GHA re-dispatched (spelling fix)

943dd0a

S-COORD round 21: close 4 validated Shiny tasks, update stale GHA not…

28cc2f1

…es, PR triage

chore: agent E progress — S-COORD complete, IDLE

961f7cf

T-250: TNT Fitch kernel disassembly analysis

27a8636

TNT is 32-bit i386 with zero SIMD and 64KB LUT popcount. TreeSearch has ~4x throughput advantage (128-bit SSE2). TNT's 3-5x convergence speed is strategic, not implementation. Added T-249..T-253 investigation tasks to to-do.md.

Set maxReplicates default to 96 (multiple of 48 for parallel efficiency)

13501b1

docs: T-250 scope caveat — Windows TNT is 32-bit, Linux/Mac are 64-bit

42aa9fa

The TNT download page labels the Windows build as '[32 bits]'. The ~4x throughput advantage finding applies only to the Windows 32-bit binary and should not be generalized to Hamilton (Linux 64-bit) benchmarks.

chore: close T-251, update completed-tasks.md

caefa11

ms609 added 30 commits March 28, 2026 17:41

chore(T-289f): Stage 5 benchmark — PR NNI polish vs TBR polish vs bas…

aa3f16e

…eline 5 large-tree datasets (131-206 tips), 3 configs, 2 budgets, 10 seeds = 300 runs. Builds from feature/tbr-batch for pruneReinsertNni parameter.

chore: S-COORD round 44 — T-245 GHA PASS; S-RED focus 29 clean; PR pe…

80ece4f

…nding

chore: T-245 status → PR #238; update S-COORD/S-PR notes

d67bed2

chore: agent-e PARKED — T-289f NNI polish done, awaiting GHA + Hamilton

f6318da

chore: agent-F IDLE after T-245 + spelling fix + S-RED focus 29

f9e59b4

Merge pull request #237 from ms609/feature/drift-constraint-fix

93d000a

F-008: Fix constrained drift constraint staleness (T-279)

Merge pull request #238 from ms609/feature/tbr-batch

7207e0b

F-T-245: TBR 4-wide candidate batching (EW flat path)

chore: S-COORD round 45 — PRs #237+#238 merged; agent-G active; T-289…

5f047c9

…f Stage 5 running

chore: S-RED focus 30 clean (ts_drift + ts_fitch/tbr post-merge); age…

8283afb

…nt-G state updated

fix(T-289f): update Hamilton script to use cpp-search — feature/tbr-b…

2784432

…atch deleted after PR #238 merge

chore: agent-G T-289f diagnosis + T-290c complete; resubmit pending

16842b2

docs: update AGENTS.md wagnerStarts section with T-290c empirical fin…

da8d24e

…dings

chore: S-RED focus 31 — ts_prune_reinsert.cpp; filed G-006 (nni_full …

9e79ec3

…ignores constraints P3)

docs: add seed count benchmarking methodology to strategies.md

7aeff18

Tier guidance: 5 (smoke), 10 (screening), 20 (comparison), 30 (definitive). Calibrated from T-289f Stage 5 empirical significance results. Cross-reference added to AGENTS.md.

chore: add remote-jobs.md for tracking async Hamilton/GHA jobs

589d27d

Agents now check remote-jobs.md at /assign time (new step 4) for retrievable results before claiming tasks. Prevents SLURM results from being silently lost across conversation boundaries.

chore: agent F — F-030 complete (PR #239 clip-ordering phase 2)

3cf476d

Merge branch 'cpp-search' into feature/weighted-clip-order

72fce2e

Merge pull request #239 from ms609/feature/weighted-clip-order

6972444

TBR clip-ordering strategy (SearchControl clipOrder)

chore: agent F — IDLE; TS-WeightClip worktree deregistered

14ff3f9


C++ search #210
ms609 wants to merge 565 commits into main from cpp-search
