icu_segmenter

Segment strings by lines, graphemes, words, and sentences.

This module is published as its own crate (icu_segmenter) and as part of the icu crate. See the latter for more details on the ICU4X project.

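This module contains segmenter implementations for the following rules:

- A line segmenter compatible with Unicode Standard Annex #14 (UAX #14), Unicode Line Breaking Algorithm, with options to tailor line-breaking behavior for the CSS line-break and word-break properties.
- Grapheme cluster, word, and sentence segmenters compatible with Unicode Standard Annex #29 (UAX #29), Unicode Text Segmentation.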

Examples

Line Break

Find line break opportunities:

```rust
use icu::segmenter::LineSegmenter;

let segmenter = LineSegmenter::new_auto();

// Breakpoints are UTF-8 byte offsets into the input, including 0 and its length.
let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 6, 13, 17, 23, 29, 36]);
```

See [LineSegmenter] for more examples.
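Break opportunities are only candidates; a layout engine still has to choose which ones to use. As a hedged illustration, the sketch below implements simple greedy wrapping over a precomputed list of breakpoints, such as the one returned in the example above. It uses only the standard library. The `wrap` function and its `max_chars` limit are hypothetical, and measuring width by `char` count is a simplifying assumption (real layout measures shaped glyph widths).

```rust
/// Greedy line wrapping over precomputed line-break opportunities.
///
/// Hypothetical helper (not part of icu_segmenter): `breaks` are UTF-8 byte
/// offsets as produced by a line segmenter, including 0 and `text.len()`.
/// Width is approximated by counting `char`s, a simplification of real layout.
fn wrap<'a>(text: &'a str, breaks: &[usize], max_chars: usize) -> Vec<&'a str> {
    let mut lines = Vec::new();
    let mut start = 0; // byte offset where the current line begins
    let mut last_fit: Option<usize> = None; // most recent candidate break on this line

    for &b in breaks {
        if b == 0 {
            continue; // offset 0 is the start of the text, not a break
        }
        // Width of the candidate line, ignoring trailing whitespace at the break.
        let width = text[start..b].trim_end().chars().count();
        if width > max_chars {
            if let Some(end) = last_fit {
                // Wrap at the previous opportunity.
                lines.push(text[start..end].trim_end());
                start = end;
            }
            // If no earlier opportunity exists, the oversized segment stays
            // on the current line and overflows.
        }
        last_fit = Some(b);
    }
    if start < text.len() {
        lines.push(text[start..].trim_end());
    }
    lines
}
```

With the breakpoints from the example above and a 12-character limit, this yields the lines "Hello World.", "Xin chào thế", and "giới!". Greedy wrapping is the simplest strategy; layout engines may use more elaborate ones.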

Grapheme Cluster Break

Find all grapheme cluster boundaries:

```rust
use icu::segmenter::GraphemeClusterSegmenter;

let segmenter = GraphemeClusterSegmenter::new();

let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
// Multi-byte characters leave gaps: "à" spans bytes 19..21, "ế" 25..28, "ớ" 31..34.
assert_eq!(
    &breakpoints,
    &[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
        19, 21, 22, 23, 24, 25, 28, 29, 30, 31, 34, 35, 36
    ]
);
```

See [GraphemeClusterSegmenter] for more examples.
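The gaps in the grapheme breakpoints above (for example, from 19 to 21) appear because the offsets are UTF-8 byte indices, and characters such as "à" and "ế" occupy two and three bytes. The standard-library sketch below makes this visible by listing `char` boundaries with `str::char_indices`. For this particular string, which contains only precomposed characters and no combining marks or emoji sequences, every `char` is its own grapheme cluster, so the result matches the segmenter output. In general the two differ, which is why a grapheme segmenter is needed. The `char_boundaries` name is hypothetical.

```rust
/// Hypothetical helper: UTF-8 byte offsets of every `char` boundary in `s`,
/// including 0 and `s.len()`. This is NOT grapheme segmentation; it only
/// coincides with it when each `char` forms its own grapheme cluster.
fn char_boundaries(s: &str) -> Vec<usize> {
    s.char_indices()
        .map(|(i, _)| i) // byte offset where each char starts
        .chain(std::iter::once(s.len())) // plus the end of the string
        .collect()
}
```

For "e\u{301}" ("e" followed by U+0301 COMBINING ACUTE ACCENT), this helper reports [0, 1, 3], whereas grapheme segmentation treats the pair as a single cluster with boundaries only at 0 and 3.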

Word Break

Find all word boundaries:

```rust
use icu::segmenter::WordSegmenter;

let segmenter = WordSegmenter::new_auto();

let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(
    &breakpoints,
    &[0, 5, 6, 11, 12, 13, 16, 17, 22, 23, 28, 29, 35, 36]
);
```

See [WordSegmenter] for more examples.
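Word boundaries also separate spaces and punctuation, so the breakpoints above delimit segments such as " " and "." as well as words. A minimal sketch for extracting just the words slices the text between consecutive breakpoints and keeps segments that contain at least one alphanumeric character. The `words` helper and the alphanumeric test are assumptions for illustration: a simple heuristic, not the segmenter's own classification of segments.

```rust
/// Hypothetical helper: turn word-break offsets into word-like segments.
/// `breaks` are UTF-8 byte offsets including 0 and `text.len()`.
fn words<'a>(text: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    breaks
        .windows(2) // each adjacent pair [start, end] delimits one segment
        .map(|w| &text[w[0]..w[1]])
        // Heuristic (assumption): keep segments with a letter or digit,
        // dropping whitespace- and punctuation-only segments.
        .filter(|seg| seg.chars().any(char::is_alphanumeric))
        .collect()
}
```

Applied to the example string and breakpoints, this keeps "Hello", "World", "Xin", "chào", "thế", and "giới".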

Sentence Break

Segment the string into sentences:

```rust
use icu::segmenter::SentenceSegmenter;

let segmenter = SentenceSegmenter::new();

let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 13, 36]);
```

See [SentenceSegmenter] for more examples.
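In the sentence example above, the boundary at offset 13 falls after the space that follows the period, so the first segment is "Hello World. " including its trailing space. A minimal standard-library sketch that converts the breakpoints into sentence strings, trimming that trailing whitespace, could look like this. The `sentences` helper and the choice to trim are illustrative assumptions.

```rust
/// Hypothetical helper: slice `text` at sentence-break offsets (UTF-8 byte
/// offsets including 0 and `text.len()`), trimming trailing whitespace and
/// skipping segments that are empty after trimming.
fn sentences<'a>(text: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    breaks
        .windows(2) // each adjacent pair [start, end] delimits one sentence
        .map(|w| text[w[0]..w[1]].trim_end())
        .filter(|s| !s.is_empty())
        .collect()
}
```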

More Information

For more information on development, authorship, contributing, etc., please visit the ICU4X home page.