Emacs Tokenizer: tokenizes CJK words using the WinRT API or ICU on all platforms, including Windows, macOS and Linux.
This crate provides the dynamic module that emt.el consumes. Install emt.el first, then put the module's dynamic library into emt-lib-path (by default located at ~/.emacs.d/modules/libEMT.{dll,so,etc}).
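To make the default location concrete, here is a minimal sketch that builds the expected path from a home directory. The function name `default_module_path` is hypothetical and not part of this crate, and the sketch assumes the file extension follows the platform's usual dynamic library convention, which is how the `{dll,so,etc}` shorthand above is read here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of this crate): builds the default
/// location of the module, relative to the given home directory.
fn default_module_path(home: &Path) -> PathBuf {
    // Assumption: the extension matches the platform convention.
    // DLL_EXTENSION is "dll" on Windows, "so" on Linux, "dylib" on macOS.
    let file_name = format!("libEMT.{}", std::env::consts::DLL_EXTENSION);
    home.join(".emacs.d").join("modules").join(file_name)
}
```

If emt-lib-path has been customized in Emacs, that value takes precedence over the path computed here.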
| Architecture \ OS | Windows | GNU / Linux | macOS |
|---|---|---|---|
| x86_64 | WinRT, ICU | ICU (70, 74, static) | ICU (static) |
| AArch64 | WinRT, ICU | ICU (70, 74, static) | ICU (static) |
| RISC-V 64 | | ICU (70, 74, static) | |
Note:
- Linux users: check the ICU version on your system first. A quick reference is in the table to the right. If there is no pre-built module for your system, use the static version or build it yourself.
- Not every feature combination is listed above, but the listed ones should cover most users. All builds are available from the CI artifacts.
- The macOS module with the Foundation backend is available from emt.
- `cargo build --release --no-default-features -F icu_segmenter`: ICU4X (static)
- `cargo build --release --no-default-features -F rust_icu_ubrk`: ICU4C (system / MSYS2 on Windows)
- `cargo build --release --no-default-features -F windows`: WinRT
- `cargo build --release --no-default-features -F windows-icu`: ICU4C (system)
For build dependencies and environment, you may refer to the CI script.
The segmenter language for the WinRT backend is hardcoded to zh-CN. To use another language, change zh-CN in the source to your preferred language tag and rebuild.
WinRT is best for Simplified Chinese users, and ICU is best for Traditional Chinese users.
Testing commands:
- `cargo test --no-default-features -F windows --lib -- --nocapture`
- `cargo test --no-default-features -F windows-icu --lib -- --nocapture`
| WinRT API | ICU |
|---|---|
| '有\|异曲同工\|之\|妙' | '有异\|曲\|同工\|之\|妙' |
| '有\|異\|曲\|同工\|之\|妙' | '有\|異曲同工\|之\|妙' |
| '丧心病狂\|的\|异想天开' | '丧心病狂\|的\|异\|想\|天\|开' |
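The bar notation in the table marks the word boundaries each backend reports. As a minimal sketch of how boundary offsets map to that notation, the hypothetical helper `mark_boundaries` below (not part of this crate) takes a string and a list of byte offsets and inserts a bar at each interior boundary. Representing boundaries as ascending UTF-8 byte offsets is an assumption made for illustration; the backends may expose them differently.

```rust
/// Hypothetical helper (not part of this crate): renders `text` with a
/// '|' at every interior boundary, as in the comparison table.
/// Assumption: `boundaries` holds ascending UTF-8 byte offsets.
fn mark_boundaries(text: &str, boundaries: &[usize]) -> String {
    let mut out = String::new();
    let mut start = 0;
    for &end in boundaries {
        // Ignore offsets at the start or end of the text, offsets that
        // are out of order, and offsets that fall inside a character.
        if end <= start || end >= text.len() || !text.is_char_boundary(end) {
            continue;
        }
        out.push_str(&text[start..end]);
        out.push('|');
        start = end;
    }
    // Append the final segment after the last boundary.
    out.push_str(&text[start..]);
    out
}
```

Each character in these examples occupies 3 bytes in UTF-8, so the first row's WinRT column corresponds to the offsets `[3, 15, 18]` and its ICU column to `[6, 9, 15, 18]`. These offsets were worked out by hand from the table, not produced by running either backend.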
- Try ICU Backend
- Find out why M-S-{F,B} doesn't select anything
- Link against system icu
- Stop linking against libunwind.dll
- emt.el
- ubolonton/emacs-module-rs: I don't use it because of an issue, but it helped me learn how Emacs dynamic modules work, and it provides useful functions.
- Article: Writing an Emacs module in Rust