…ilder
Build random binary tree that satisfies topological constraints by:
1. Ordering constraint splits smallest-to-largest
2. Assigning each tip to its tightest enclosing split
3. Bottom-up: randomly resolve each split's children into binary subtree
4. Wire root-level items (unconstrained tips + top-level splits)
Replaces Wagner fallback (T-214 workaround) for RANDOM_TREE strategy, restoring uniform random topology sampling diversity under constraints.
Tests: 916 constraint + 24 random-constrained + 373 simplify/driven pass.
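Steps 1 and 2 of the list above can be sketched as follows. This is a minimal illustration, not the TreeSearch implementation: the `Split` type and `tightest_split()` are hypothetical names, each split is assumed to be stored as a list of tip indices, and the splits are assumed to be mutually compatible (nested or disjoint).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: a constraint split is the list of tip indices
// it encloses. This is NOT the data structure used in ts_constraint.cpp.
using Split = std::vector<int>;

// For every tip, return the index of the smallest split that contains it,
// or -1 if no split contains it (an unconstrained, root-level tip).
std::vector<int> tightest_split(const std::vector<Split>& splits, int n_tip) {
  // Step 1: visit splits smallest-to-largest.
  std::vector<std::size_t> order(splits.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return splits[a].size() < splits[b].size();
                   });

  // Step 2: the first (hence smallest) split containing a tip claims it.
  std::vector<int> owner(n_tip, -1);
  for (std::size_t idx : order) {
    for (int tip : splits[idx]) {
      assert(tip >= 0 && tip < n_tip);  // tips are assumed to be 0-based
      if (owner[tip] == -1) owner[tip] = static_cast<int>(idx);
    }
  }
  return owner;
}
```

With splits `{0,1,2,3}` and `{0,1}` over six tips, tips 0 and 1 are claimed by the inner split, tips 2 and 3 by the outer one, and tips 4 and 5 stay at root level.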
- T-196: cherry-picked NA+IW fix from feature/parallel-temper
- T-198-201: closed (PT ruled out; PCSA landed via PR #227)
- Removed PT section from to-do.md
- Deleted feature/parallel-temper and feature/pt-eval (remote+local)
- PT findings preserved in .positai/expertise/pt-evaluation.md
Three issues in impose_one_pass / impose_constraint:
1. Bail-out threshold n_tip/4 was too aggressive. For n_tip=5 the threshold was 1, so any split requiring 2+ moves caused an immediate bail-out before making any repairs. Raised to n_tip.
2. impose_one_pass returned 0 both for 'no violations' and 'bailed out', so impose_constraint couldn't distinguish the two. Now returns -1 on bail-out, allowing the caller to know the repair may be incomplete.
3. Documented the root-child limitation: spr_clip() doesn't fully handle root children, so impose_one_pass skips them. The fuse path's post-repair verification guard (T-214) catches these cases.
Tests: 940 constraint + 308 simplify/driven/sector pass.
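The threshold arithmetic and the new return convention can be illustrated with a small stand-in. `repair_pass()` and its `moves_needed` input are hypothetical and only mimic the return values described above; the real impose_one_pass operates on a tree and a constraint set.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one repair pass. moves_needed[i] is the number of
// moves split i requires (0 means the split is already satisfied).
// Returns: number of moves applied, 0 when nothing was violated,
// and -1 when the pass bails out because a split exceeds the threshold.
int repair_pass(const std::vector<int>& moves_needed, int threshold) {
  assert(threshold >= 0);
  int applied = 0;
  for (int needed : moves_needed) {
    if (needed > threshold) return -1;  // bail-out is now distinguishable
    applied += needed;
  }
  return applied;
}

// The caller can tell "clean" (0) apart from "gave up" (-1).
bool repair_may_be_incomplete(int pass_result) { return pass_result < 0; }
```

For n_tip=5, integer division gives an old threshold of 5/4 = 1, so a split needing 2 moves bails out at once; with the threshold raised to n_tip=5 the same split is repaired.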
spr_clip() can't detach root children (root is its own parent, so the bypass logic fails). All search callers skip root children, but impose_constraint needs them for constraint repair.
New topology_spr() helper in ts_constraint.cpp handles the root-child case by absorbing the sibling into root and repurposing it as the insertion node. No changes to spr_clip or any search caller.
Also adds the build_postorder call that was missing between individual moves within impose_one_pass: each topology_spr is now followed by a postorder rebuild so edge enumeration stays valid.
Tests: 942 constraint (90 impose-constraint, incl 2 new root-child tests) + 308 simplify/driven/sector pass.
T-208/T-211: random_constrained_tree() and impose_constraint() fixes
Benchmarked perturbStopFactor across 10 morphobank/inapplicable datasets (23-213 tips). Key findings:
- PSF=2 gives 2.4-6.9x speedup on converged searches with zero score loss
- Complementary to targetHits: on hard landscapes where few replicates hit the best score, PSF fires first; on easy landscapes targetHits fires first (PSF is irrelevant)
- PSF=5 provides smaller speedups and is too conservative for large trees
Changed default from 0 (disabled) to 2 in SearchControl() and compat wrapper. Updated docs and fixed timeout test (needs PSF=0 to test the timeout path, not convergence).
T-187: Perturbation-count stopping rule
Infrastructure for indirect scoring optimization:
1. FlatBlock struct (24 bytes/block vs 288 bytes in CharBlock) packs
hot-loop metadata (offset, n_states, active_mask, has_inapplicable)
for cache-friendly access. Populated at build_dataset() time.
2. Flat indirect scoring functions (EW and NA-aware variants) that use
FlatBlock and skip upweight_mask/weight overhead. Available as
fitch_indirect_{bounded,cached}_flat and fitch_na_indirect_
{bounded,cached}_flat. NOT wired into search dispatch — see below.
3. Software prefetch in TBR rerooting inner loop: prefetch vroot_cache
entry 2 iterations ahead. At 180+ tips (vroot_cache ~140 KB, L2),
this hides ~10 cycle L2 latency. Negligible overhead at small sizes
where vroot_cache fits in L1.
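A compact layout and a two-ahead prefetch of the kind described in items 1 and 3 might look like the sketch below. The field names and the 24-byte total come from the notes above; the field types, the `alignas`, the `SKETCH_PREFETCH` macro and `sum_with_prefetch()` are assumptions made for illustration and are not taken from the TreeSearch sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: field names follow the notes, types are guesses.
struct alignas(8) FlatBlock {
  std::uint64_t active_mask;  // 8 bytes
  std::uint32_t offset;       // 4 bytes
  std::uint32_t n_states;     // 4 bytes
  bool has_inapplicable;      // 1 byte, followed by 7 bytes of padding
};
static_assert(sizeof(FlatBlock) == 24, "sketch assumes a 24-byte block");

// __builtin_prefetch is a GCC/Clang builtin; other compilers get a no-op.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREFETCH(addr) __builtin_prefetch(addr)
#else
#define SKETCH_PREFETCH(addr) ((void)0)
#endif

// Walk a cache in a non-sequential order, requesting the entry needed two
// iterations from now before working on the current one.
std::uint64_t sum_with_prefetch(const std::vector<std::uint64_t>& cache,
                                const std::vector<std::size_t>& visit_order) {
  constexpr std::size_t kAhead = 2;
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < visit_order.size(); ++i) {
    if (i + kAhead < visit_order.size()) {
      SKETCH_PREFETCH(&cache[visit_order[i + kAhead]]);
    }
    assert(visit_order[i] < cache.size());
    total += cache[visit_order[i]];
  }
  return total;
}
```

The prefetch is only a hint, so the result is identical with or without it; any gain depends on the cache being too large for L1, as noted above.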
Benchmarking notes (Agnarsson 62t, Zhu 75t, Dikow 88t, 10 seeds each):
Flat dispatch (ternary or function pointer) showed no measurable benefit
at these sizes — hardware prefetching of the sequential CharBlock array
is already effective, and the dispatch overhead (extra branch or indirect
call) marginally increases code complexity in the hot path. System-level
timing variance on the test machine is ±15-30%, masking any sub-10% gain.
The flat functions are retained as available infrastructure for large-tree
optimization (180+ tips) where CharBlock cache traffic may become
significant. They can be wired in via function pointers when a 180+ tip
benchmark is available for validation.
All 2877 ts-* tests pass with identical scores.
TNT finds better XPIWE trees than TreeSearch on Vinther2008 (3.79283 vs 3.80000). TreeSearch optimal tree has no characters with both missing data and h>=2, so XPIWE=IW on that tree, but the search should explore differently.
…to WORDLIST Pre-existing issues blocking GHA on cpp-search HEAD. Regenerated via roxygen2::roxygenise(load_code = load_installed).
Hamilton HPC benchmark (mbank_X30754, 180t, EPYC 7702, 5 seeds):
- AC=1: 400ms/rep, 40% within-replicate hit rate
- AC=3: 1370ms/rep, 21% hit rate, no significant score gain (p>0.5)
- AC=0 vs AC=3: also no significant difference
AC=1 saves ~1s/rep (~6% of 17s median) with no quality loss.
Also adds hamilton-hpc project skill for remote benchmarking docs.
TNT is 32-bit i386 with zero SIMD and 64KB LUT popcount. TreeSearch has ~4x throughput advantage (128-bit SSE2). TNT's 3-5x convergence speed advantage is strategic, not a matter of implementation. Added T-249..T-253 investigation tasks to to-do.md.
Same approach as TreeDist::popcnt64: emit popcnt instruction via inline asm on GCC/Clang x86-64, __popcnt64 on MSVC. Software Hamming weight fallback for non-x86-64 platforms. CRAN-compatible (no compile flag change).
Old: __builtin_popcountll → compiler emits ~10-instruction shift-mask
New: single popcnt instruction (92 occurrences in compiled DLL)
Also updated T-251 description to focus on candidates-per-improvement.
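The dispatch described above can be sketched roughly as follows. The function names `popcount64()` and `popcount64_soft()` are hypothetical, and the sketch assumes that the x86-64 CPU running the code supports the popcnt instruction; it is not a copy of the TreeDist or TreeSearch code.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Portable fallback: classic shift-and-mask Hamming weight.
inline int popcount64_soft(std::uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

inline int popcount64(std::uint64_t x) {
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
  // Emit the instruction directly, so no -mpopcnt compile flag is needed.
  // Assumption: the CPU executing this supports popcnt.
  std::uint64_t out;
  __asm__("popcnt %1, %0" : "=r"(out) : "r"(x) : "cc");
  return static_cast<int>(out);
#elif defined(_MSC_VER) && defined(_M_X64)
  return static_cast<int>(__popcnt64(x));
#else
  return popcount64_soft(x);
#endif
}
```

Both paths must agree on every input, which makes the software version a convenient reference when testing the hardware path.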
The TNT download page labels the Windows build as '[32 bits]'. The ~4x throughput advantage finding applies only to the Windows 32-bit binary and should not be generalized to Hamilton (Linux 64-bit) benchmarks.
… identified
Trajectory comparison on 3 gap datasets (Geisler2001, Zhu2013, Wortley2006) at 30s budgets, 3 seeds each. Key findings:
1. Drift consumes 16-23% of wall time but gains <1% of score improvement (405-1498 ms/step vs 0.4-44 ms/step for other phases). 30-170x less efficient than the next-worst phase (ratchet).
2. TNT evaluates 1.5-3.6x more rearrangements/second than TreeSearch despite 32-bit scalar architecture vs TreeSearch's SSE2. Per-evaluation overhead in data structure management negates the SIMD advantage.
3. TNT's xmult does extensive intra-replicate sectorial search (~67% of trajectory entries are SECT), while TreeSearch does one XSS+RSS+CSS pass per outer cycle (6-10% of time).
4. TNT achieves 10-16 steps better per-replicate median scores.
Recommendations: eliminate drift from default preset (save ~20% time), increase sectorial search rounds, reduce per-evaluation overhead.
Files: bench_trajectory.R (comparison script), trajectory_results.rds (raw data), tnt_trajectory_analysis.md (full write-up).
…eline 5 large-tree datasets (131-206 tips), 3 configs, 2 budgets, 10 seeds = 300 runs. Builds from feature/tbr-batch for pruneReinsertNni parameter.
… decision Earlier comment described Stage 1 benchmark showing -14.7 steps improvement, which was misleading — Stage 4 multi-dataset testing (131-206t) found the per-rep overhead was too high (0 replicates at 206t/60s), so pruneReinsertCycles was set to 0. Clarify the rationale and decision.
F-008: Fix constrained drift constraint staleness (T-279)
F-T-245: TBR 4-wide candidate batching (EW flat path)
Stage 4 results analysed (G-001): syab07205/206t starvation at 60s from full-TBR polish per PR cycle (~7s x 5 = 35s overhead). Agent E implemented pruneReinsertNni fix on feature/tbr-batch; Stage 5 scripts uploaded and submitted to Hamilton (SLURM 16622224, ~4-6h).
…f Stage 5 running
…nt-G state updated
…atch deleted after PR #238 merge
…ignores constraints P3)
Hamilton SLURM 16622421 (7h, EPYC 7702). 5 large-tree datasets (131-206t), 20 seeds, 60/120s budgets, EW scoring.
pr_nni: wins 7/10 expected-best conditions. Huge benefit on project3701 (146t, -178 median at 60s). Modest at 173-180t. Slight regression at 206t (+12-34 EB).
pr_tbr: harmful (1/9 wins; total starvation at 206t/60s).
Decision: not enabled in large preset. Available via SearchControl().
Tier guidance: 5 (smoke), 10 (screening), 20 (comparison), 30 (definitive). Calibrated from T-289f Stage 5 empirical significance results. Cross-reference added to AGENTS.md.
Add ClipOrder enum, TBRPassRecord struct, and per-pass diagnostic counters to tbr_search() (guarded behind TBRParams::diagnostics=true). Add ts_tbr_diagnostics() Rcpp bridge returning per-pass data frame. Add order_clips() helper implementing RANDOM/INV_WEIGHT/TIPS_FIRST/BUCKET strategies (Phase 2 infrastructure, disabled by default). Add diag_clip_ordering.R to characterise baseline behaviour.
Diagnostic results (10 seeds × 4 datasets, random Wagner starts):
- Tip-clip enrichment in productive passes: 0.43–0.76×
- Tip clips (~51% of all clips) account for only 22–38% of accepted moves.
- Medium-small clips (2..sqrt(n)) appear most productive.
CONCLUSION (Phase 4): the small/tip-first hypothesis is FALSIFIED. All three proposed variants (INV_WEIGHT, TIPS_FIRST, BUCKET) favour tip clips, which are the LEAST productive clip type. Phase 2–3 skipped. Branch will be closed after coordination notes are updated.
Phase 1 diagnostic completed 2026-03-29. Hypothesis falsified: tip clips are UNDER-represented in TBR acceptances (0.43-0.76x enrichment across 4 datasets). Medium-small clips most productive. All three ordering variants (inv-weight, tips-first, bucket) favour tips — counterproductive. Branch feature/weighted-clip-order closed. See completed-tasks.md entry PA-001 and AGENTS.md item 12.
5 datasets (62-180t), 20 seeds, EW/IW10/IW3. IW hypothesis weak signal (closed). Real finding: XSS benefit scales with tree size. At 180t: TAEB delta -6.8 to -9.8 EW steps (12-19% overhead). At ≤88t: zero TAEB benefit. No preset change needed.
Stage 5 benchmark (SLURM 16622483, EPYC 7702, 5 datasets 131-206t, 10 seeds, 60s+120s) showed pr_nni (NNI full-tree polish) fixes the Stage 4 showstopper (0 reps at 206t/60s) while improving 131-180t:
project3701 (146t): -178 steps at 60s, -128 at 120s
project804 (173t): -9 / -2 steps
mbank_X30754 (180t): -4 / -7 steps
syab07205 (206t): +17.5 at 60s, neutral at 120s
Enable in large preset: pruneReinsertCycles=5L, pruneReinsertNni=TRUE. Update AGENTS.md and completed-tasks.md. Results in dev/benchmarks/t289f_pr_nni_polish.csv.
…_search When params.nni_full is true but a ConstraintData is active, guard falls through to TBR (which enforces constraints). One-line change mirroring the nni_wagner guard in ts_driven.cpp. Only affects users who combine pruneReinsertNni=TRUE with topological constraints; no preset does this. Also: S-COORD round 46 (task queue, PR status), to-do cleanup.
Agents now check remote-jobs.md at /assign time (new step 4) for retrievable results before claiming tasks. Prevents SLURM results from being silently lost across conversation boundaries.
C++ instrumentation of tbr_search() with post-acceptance sector-masked TBR on clip subtree. Hit rate ~35% regardless of scoring mode (no IW-specific benefit), but NET HARMFUL: disrupts global TBR trajectory. mbank_X30754 EW: +17 to +34 steps TAEB at 30-120s. Validates existing pipeline design (XSS as separate post-convergence phase). Closed.
Phase 1 (a159311) added diagnostic instrumentation and the TIPS_FIRST, INV_WEIGHT, BUCKET, ANTI_TIP, LARGE_FIRST ordering variants to ts_tbr.cpp. Phase 2 completes the implementation:
Bug fix: clip_order was only propagated to the initial TBR and final TBR polish (~10% of replicate time). The ratchet and all sectorial TBR calls defaulted to RANDOM, making the ordering variants effectively inert for the dominant phase (ratchet ~76%).
Fix: add clip_order field to RatchetParams and SectorParams, propagate from SearchControl through ts_driven.cpp into every TBR call site in ts_ratchet.cpp and ts_sector.cpp (6 sites + search_sector signature).
Empirical validation (5 seeds, 30s, default config):
- Agnarsson2004 (62t, default preset): TIPS_FIRST -2%, INV_WEIGHT neutral
- Zhu2013 (75t, thorough preset): TIPS_FIRST +13%, INV_WEIGHT +9%
- Dikow2009 (88t, thorough preset): TIPS_FIRST +8%, INV_WEIGHT +3%
Theoretical model (Poisson bucket, corrected): TIPS_FIRST saves ~48% per productive TBR pass at 88t; practical throughput gain is ~8-13% because null passes (ordering-invariant, exhaust all clips) dilute savings.
Benefit is dataset-size dependent:
- < ~65t: tip enrichment is low (Agnarsson2004: 0.43); TIPS_FIRST neutral
- 65-120t (thorough): tip enrichment moderate; TIPS_FIRST +8-13%
No preset defaults changed yet — pending GHA 10-seed validation. bench_clip_ordering.R contains the full benchmark driver.
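One plausible reading of a tips-first ordering is sketched below: shuffle the candidate clip nodes, then move the single-tip clips to the front while keeping the shuffled order within each group. The exact behaviour of TIPS_FIRST is not spelled out above, so `order_clips_sketch()`, its `subtree_size` input and the reduced two-value enum are illustrative assumptions rather than the order_clips() helper in ts_tbr.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Reduced, hypothetical version of the enum: only two strategies shown.
enum class ClipOrder { RANDOM, TIPS_FIRST };

// subtree_size[c] is the number of tips below clip node c (1 = a tip clip).
// Returns the clip nodes in the order a TBR pass would try them.
std::vector<int> order_clips_sketch(const std::vector<int>& subtree_size,
                                    ClipOrder order, std::mt19937& rng) {
  std::vector<int> clips(subtree_size.size());
  for (std::size_t i = 0; i < clips.size(); ++i) {
    clips[i] = static_cast<int>(i);
  }
  // Every strategy starts from a random permutation.
  std::shuffle(clips.begin(), clips.end(), rng);
  if (order == ClipOrder::TIPS_FIRST) {
    // Tip clips move to the front; relative (random) order is preserved.
    std::stable_partition(clips.begin(), clips.end(),
                          [&](int c) { return subtree_size[c] == 1; });
  }
  return clips;
}
```

Passing the chosen strategy as an explicit argument at every call site is what the bug fix above amounts to: a call that omits it silently falls back to RANDOM.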
The SearchControl.Rd usage section was generated from an old installed build (missing clipOrder and many parameters added since). The codoc check correctly flagged the mismatch.
- Added @param clipOrder documentation in R/SearchControl.R
- Regenerated man/SearchControl.Rd with correct \usage and \item{clipOrder}
TBR clip-ordering strategy (SearchControl clipOrder)
Manual testing underway; shiny app in particular has some usability issues.