Skip to content

[NFC] Use pascal-style string storage for IString/Name#8662

Merged
kripken merged 25 commits intoWebAssembly:mainfrom
kripken:pascal
May 6, 2026
Merged

[NFC] Use pascal-style string storage for IString/Name#8662
kripken merged 25 commits intoWebAssembly:mainfrom
kripken:pascal

Conversation

@kripken
Copy link
Copy Markdown
Member

@kripken kripken commented Apr 30, 2026

Previously our interned strings were std::string_view, which means a
pointer and a size. Instead, store the size as a header alongside the
string data. As we have many views on the same data, this reduces
memory usage, basically anywhere we use a Name, which is every
Call, GlobalSet, Load, etc.

I see a 5% RAM reduction, and a small improvement in instructions
and branches. Wall time improves by around 1% though that is
close to the noise.

This removes the "reuse" optimization where we could reuse a
string from the input. That was error-prone and a bad idea anyhow.
In practice, it might have helped a little, but this new model is simpler
and saves a lot more. (We can't reuse now since we need to convert
to pascal-style storage anyhow.)

@kripken kripken requested a review from tlively April 30, 2026 16:46
@kripken kripken requested a review from a team as a code owner April 30, 2026 16:46
Comment thread src/support/istring.h Outdated
Comment thread src/support/istring.cpp Outdated
Comment on lines +26 to +28
if (s.data() == nullptr) {
return nullptr;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is not valid for empty string views to have nullptr data pointers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid, though we'd need to be careful with hashing etc. Anyhow, this empty string does not need any storage, so it seems simpler to just skip it (might also be faster but I didn't measure).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if someone tries to intern the string_view {nullptr, 0} and we create a View at nullptr, then trying to turn that View back into a string_view will segfault. That seems bad, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't a std::string_view be null (internal pointer is nullptr)? Where would it segfault?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where it tries to dereference the pointer in the View (actually 4 bytes before it) to get the size.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, View needs to special-case a null to avoid a read. I also see that passing nullptr, 0 is not defined behavior anyhow. Fixed.

Comment thread src/support/istring.cpp
Comment thread src/support/istring.h Outdated
Comment thread src/support/istring.h Outdated
Comment thread src/support/istring.cpp Outdated
Comment thread src/support/istring.cpp
kripken added 3 commits May 4, 2026 16:18
Merge remote-tracking branch 'origin/main' into pascal
Copy link
Copy Markdown
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification!

Comment thread src/support/istring.h Outdated
Comment thread src/support/istring.h
char operator[](int x) const { return str[x]; }

explicit operator bool() const { return str.data() != nullptr; }
explicit operator bool() const { return str.internal != nullptr; }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interned empty string views can have str.internal == nullptr, but now that is interpreted as the absence of a name rather than the presence of an empty name. I don't think this is the intended behavior.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty name is non-null internal pointing to some place, and that place has a size of 0? I think this should be ok. Pushed a unit test now, did I do something wrong there?

Comment thread src/support/istring.h Outdated
Comment thread test/gtest/istring.cpp
Comment on lines +24 to +26
auto null = IString();
auto empty = IString("");
EXPECT_NE(null, empty);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need this:

auto default = IString(std::string_view{});

This should be the same as empty, but I think it will currently be the same as null.

Copy link
Copy Markdown
Member Author

@kripken kripken May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why should that be empty? I believe the default std::string_view is a null. The test I just added confirms that unless I'm confused.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well std::string_view{} == std::string_view("") holds, so the interned versions of those should be the same as well.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, supporting that would add overhead. Do you think it's worth it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any extra overhead except the first time we intern the empty stringref. We just need to remove the early return I commented on here and let the default empty string_view get interned like any other string_view.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe I misunderstood you before then. When you said

Well std::string_view{} == std::string_view("") holds, so the interned versions of those should be the same as well.

Did you mean "IF we interned the null string it should compare equal to the empty string (BUT we don't so things are ok as is?)"

If not, then I am confused. What concrete change to the API are you asking for?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that I added a test early in this thread - is that test wrong in your opinion? What should the test actually check at the API level?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect these test expectations to pass both before and after this refactoring:

ASSERT_EQ(IString(""), IString(std::string_view{})); // !
ASSERT_NE(IString(), IString(""));
ASSERT_NE(IString(), IString(std::string_view{})); // !
ASSERT_FALSE(IString().is());
ASSERT_TRUE(IString("").is());
ASSERT_TRUE(IString(std::string_view{}).is()); // !

My understanding of the current change is that the assertions involving the default string_view will fail. Since this change is supposed to be NFC, this is a bug.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. Ok, I removed that optimization and added tests to verify the behavior you want. I agree it makes sense to keep this 100% NFC, and I don't see an actual speedup from that particular opt.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(change is in e05c2c4 )

@kripken kripken merged commit d2415b6 into WebAssembly:main May 6, 2026
16 checks passed
@kripken kripken deleted the pascal branch May 6, 2026 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants