[proposal] Expose conditional branch upon NZC flags, with *unitary* ptest_ family functions, especially a svcount_t -> bool version #405

@Jerry1144

It is encouraged to write loops as svptest_first(svptrue(), pred = svwhilelt())[1] instead of manually comparing i < count as the loop termination condition, because the predicate-generating svwhilelt already sets the NZCV condition flags that the loop branch needs.

This design unfortunately has two dependencies that are partly broken:

  1. A PTEST exists for every predicate type. There is none for svcount_t, the predicate-as-counter type.
    Thus there is currently no way to pass the result of svwhilelt_c32() to svptest_first() in order to extend this pattern to the streaming SVE/SME side. Forcing the conversion through svreinterpret_b does not help either[4].

  2. The compiler knows to elide ptest(ptrue, flaggen) and take the flags directly from flaggen(). This does not hold for gcc-15[2], nor for clang when the element type of ptrue does not match that of whilelt[3].

Thus I propose a dedicated set of unary (single-argument) predicate -> bool intrinsic functions, svptest_first, svptest_last and svptest_any (or maybe svbif_first, etc.), for use here.

This avoids introducing a hypothetical family of two-argument functions such as bool svptest_first(svcount_t, svcount_t), and overall makes the loop-termination line more readable and less error-prone.
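
To make the intended semantics concrete, here is a minimal scalar sketch in portable C. It is only a model, not ACLE: model_pred, model_whilelt, model_test_first/last/any, model_sum and MODEL_LANES are all hypothetical names, and MODEL_LANES = 4 is an arbitrary stand-in for svcntw(). It assumes the unary tests behave like the existing svptest_* with an all-true governing predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model only. None of these names are ACLE intrinsics;
   MODEL_LANES is an assumed stand-in for svcntw(). */
enum { MODEL_LANES = 4 };

typedef struct {
    bool lane[MODEL_LANES];
} model_pred;

/* Models svwhilelt_b32_u64(i, count): lane k is active iff i + k < count. */
static model_pred model_whilelt(uint64_t i, uint64_t count) {
    model_pred p;
    for (size_t k = 0; k < MODEL_LANES; ++k)
        p.lane[k] = (i < count) && (k < count - i);
    return p;
}

/* Proposed unary tests, modelled as svptest_*(ptrue, p). */
static bool model_test_first(model_pred p) {
    return p.lane[0];
}

static bool model_test_last(model_pred p) {
    return p.lane[MODEL_LANES - 1];
}

static bool model_test_any(model_pred p) {
    for (size_t k = 0; k < MODEL_LANES; ++k)
        if (p.lane[k])
            return true;
    return false;
}

/* The loop shape from [1]: the predicate is generated and tested in the
   loop condition, so there is no separate i < count comparison. */
static uint64_t model_sum(const uint32_t *input, size_t count) {
    assert(input != NULL || count == 0);
    uint64_t sum = 0;
    model_pred active;
    for (size_t i = 0;
         model_test_first(active = model_whilelt(i, count));
         i += MODEL_LANES) {
        for (size_t k = 0; k < MODEL_LANES; ++k)
            if (active.lane[k])
                sum += input[i + k];
    }
    return sum;
}
```

With a unary test the loop condition names only the predicate being tested, so there is no governing svptrue whose element type could mismatch as in [3].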


[1] for-loop

#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

void run(uint32_t *input, size_t count) {
    svbool_t active;

    // Want to write:
    for (size_t i = 0; i < count; i += svcntw()) {
        // Generate predicate
        active = svwhilelt_b32_u64(i, count);
        // Load
        svuint32_t vec = svld1(active, &input[i]);
        // Do with the vector
    }

    // Write this instead:
    for (size_t i = 0;
         // Generate active elements and decide whether to move on in one go
         svptest_first(svptrue_b32(), active = svwhilelt_b32_u64(i, count));
         i += svcntw()) {
        // Load
        svuint32_t vec = svld1(active, &input[i]);
        // Do with the vector
    }
}

which (should) nicely assemble to:

	whilelo	p0.s, xzr, x1
	b.pl	RET                             ; count == 0, we are already done
	mov	x8, #0                          ; i=0
LOOPSTART: 
	ld1w	{ z1.s }, p0/z, [x0, x8, lsl #2]
; // do with the vector
	incw	x8                              ; i += vl
	whilelo	p0.s, x8, x1                    ; compare i < count, sets predicate, sets NZCV flags...
	b.mi	LOOPSTART                       ; for use with this branch
RET:
	ret

[2] gcc

	whilelo	p14.s, xzr, x1
	ptrue	p15.s, all               ; literally svptrue_b32()
	mov	p7.b, p14.b              ; p7 is svbool_t active
	ptest	p15, p14.b               ; literally svptest( , )
	b.nfrst	RET              ; count == 0, early exit

[3] clang, but type mismatch between ptrue and whilelo.

svptest_first(svptrue_b8(),active = svwhilelt_b32_u64(i, count));
                      ^^                      ^^^
	ptrue	p0.b                     ; literally svptrue_b8()
	whilelo	p1.s, xzr, x1
	ptest	p0, p1.b                 ; then literally svptest( , )
	b.pl	RET              ; count == 0, early exit

[4] Forcing svcount_t into svbool_t with svreinterpret_b. This crashes clang; gcc compiles it fine, but the semantics are wrong anyway.

svcount_t active;
for (uint64_t i = 0;
     svptest_first(svreinterpret_b(svptrue_c32()), svreinterpret_b(active = svwhilelt_c32_u64(i, count, 4)));
     i += svcntsw()) {
     //   ^^^^^^^  What's the point of having a dedicated svcntsw() when svcntw() returns streaming VL in streaming mode, anyway?
  // ...
}
