Master-Hash/ewt-rs

ewt-rs

An Emacs tokenizer that segments CJK words using the WinRT API or ICU, across platforms including Windows, macOS and Linux.

Installation

This crate provides the dynamic module that emt.el consumes. Install emt.el first, then put the module's dynamic library into emt-lib-path (by default located at ~/.emacs.d/modules/libEMT.{dll,so,etc}).
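
As a rough illustration of where the module ends up, the sketch below builds the default path from a home directory. It assumes the file extension is the platform's usual dynamic library extension (`std::env::consts::DLL_EXTENSION`); the helper name `default_module_path` is hypothetical and is not part of this crate or emt.el.

```rust
use std::env::consts::DLL_EXTENSION;
use std::path::{Path, PathBuf};

/// Hypothetical helper: default location of the module under `home`,
/// i.e. `<home>/.emacs.d/modules/libEMT.<ext>`.
/// Assumption: `<ext>` is the platform's dynamic library extension
/// ("dll" on Windows, "so" on Linux, "dylib" on macOS).
fn default_module_path(home: &Path) -> PathBuf {
    home.join(".emacs.d")
        .join("modules")
        .join(format!("libEMT.{}", DLL_EXTENSION))
}
```

If emt-lib-path has been customized in Emacs, that value takes precedence over this default.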

Pre-built

Architecture \ OS | Windows    | GNU / Linux          | macOS
x86_64            | WinRT, ICU | ICU (70, 74, static) | ICU (static)
AArch64           | WinRT, ICU | ICU (70, 74, static) | ICU (static)
RISC-V 64         |            | ICU (70, 74, static) |
Packaging status

Note:

  • For Linux users, check the ICU version on your system first. A quick reference is in the table to the right. If there is no pre-built module for your system, use the static version or build it yourself.
  • Not every feature combination is listed above, but these should cover most users. All builds are available from the CI artifacts.
  • The macOS module with the Foundation backend is available from emt.

Manually build

  • cargo build --release --no-default-features -F icu_segmenter: ICU4X (static)
  • cargo build --release --no-default-features -F rust_icu_ubrk: ICU4C (system / MSYS2 on Windows)
  • cargo build --release --no-default-features -F windows: WinRT
  • cargo build --release --no-default-features -F windows-icu: ICU4C (system)
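
Each command above disables the default features and enables exactly one backend feature. A minimal sketch of how such Cargo features can gate code at compile time is shown below; the function `enabled_backends` is hypothetical and only mirrors the feature names listed above, not the crate's actual source layout.

```rust
/// Hypothetical helper: report which backend features this build was
/// compiled with. `cfg!(feature = "...")` is evaluated at compile time,
/// so the result depends on the `-F` flags passed to cargo.
#[allow(unexpected_cfgs)] // the feature names are only declared in the real crate
fn enabled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "icu_segmenter") {
        backends.push("ICU4X (static)");
    }
    if cfg!(feature = "rust_icu_ubrk") {
        backends.push("ICU4C (system / MSYS2 on Windows)");
    }
    if cfg!(feature = "windows") {
        backends.push("WinRT");
    }
    if cfg!(feature = "windows-icu") {
        backends.push("ICU4C (system)");
    }
    backends
}
```

Compiled with none of these features enabled, the function returns an empty list; in a crate that declares them, the list reflects whichever features were passed with `-F`.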

For build dependencies and environment, you may refer to the CI script.

Hardcoded

The segmenter language used with the WinRT API is hardcoded as zh-CN. To use another language, change zh-CN in the source to your preferred language tag and rebuild.

WinRT API vs ICU

In the comparison below, WinRT gives better results for Simplified Chinese text, and ICU gives better results for Traditional Chinese text.

Testing command:

  • cargo test --no-default-features -F windows --lib -- --nocapture
  • cargo test --no-default-features -F windows-icu --lib -- --nocapture

WinRT API                    ICU
'有|异曲同工|之|妙'          '有异|曲|同工|之|妙'
'有|異|曲|同工|之|妙'        '有|異曲同工|之|妙'
'丧心病狂|的|异想天开'       '丧心病狂|的|异|想|天|开'
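
The strings in the comparison mark each word boundary with `|`. As a small sketch of that notation (not the crate's test code), the hypothetical helper `render_segments` below inserts `|` at given character offsets. The offsets are written by hand here to reproduce the first row; in practice they would come from the WinRT or ICU segmenter.

```rust
/// Hypothetical helper: insert `|` before every character whose index
/// appears in `boundaries`. Offsets count characters (not bytes), which
/// matters for CJK text where each character is several bytes in UTF-8.
fn render_segments(text: &str, boundaries: &[usize]) -> String {
    let mut out = String::with_capacity(text.len() + boundaries.len());
    for (i, ch) in text.chars().enumerate() {
        // Never emit a separator before the first character.
        if i > 0 && boundaries.contains(&i) {
            out.push('|');
        }
        out.push(ch);
    }
    out
}
```

For "有异曲同工之妙", the offsets `[1, 5, 6]` give the WinRT result from the first row, and `[2, 3, 5, 6]` give the ICU result.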

Future Work

  • Try ICU Backend
  • Find out why M-S-{F,B} doesn't select anything
  • Link against system icu
  • Stop linking against libunwind.dll

Credit
