NAME
    Unicode::Towctrans - Generate small case mapping tables

SYNOPSIS
        gen_wctrans
        gen_wctrans --safec
        gen_wctrans --musl
        gen_wctrans -v 15
        gen_wctrans -v 15 --ud UnicodeData.txt.15 --out towctrans-15.h
        gen_wctrans --lower16
        gen_wctrans --fn __towcase
        gen_wctrans --min-excl 10000
        gen_wctrans --bits 18:14:10
        gen_wctrans --bsearch
        gen_wctrans --bsearch-both
        gen_wctrans --table

DESCRIPTION
    gen_wctrans generates a towctrans.h header file with small and
    efficient case mapping tables. "musl" and "safeclib" use it to build
    the libc towupper() and towlower() functions and their secure variants
    towupper_s() and towlower_s().

    If the code may run on a system with a Turkish or Azeri locale, you
    need to define "-DHAVE_LOCALE_TR" so that the Turkish locale and its
    special dotted and dotless i mappings are checked at run-time.

    If you know that your iswalpha() works correctly (only with musl), use
    "--with_iswalpha" to get a slightly faster function, e.g. for
    benchmarking.

    With "--lower16" it creates more and larger "casemaps" tables and
    fewer long "casemapl" tables. It therefore finds those ranges earlier,
    at the cost of more cache misses. Currently "--lower16" is the best
    combination of performance and size.

    For "--bits" the fastest combinations are 18:14:10 and 12:12:8; the
    smallest is the default 16:8:8.

    With "--bsearch" the tolower check is done with a binary search, while
    the toupper check does a linear search without early exit. It needs
    more space, and its performance is not as good as with "--lower16".

    With "--bsearch-both" it is faster, but the size is even bigger,
    because the upper maps and pairs must also be stored in sorted order
    so that they can be binary searched.

    With "--table", the musl-new style, the size is much bigger, as
    mappings must be stored for all blocks. The lookup is much faster,
    though.

    Support for generating the multi-character folding tables of
    safeclib's wcsfc_s() is also planned.
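    The "--bsearch" option does the tolower check with a binary search
    over ranges. The layout of the generated towctrans.h is not shown in
    this document, so the following is only a generic sketch of that
    technique: the case_range struct, its step field for alternating
    upper/lower pairs, the sketch_towlower() name and the tiny hand-picked
    table excerpt are all hypothetical, not the generator's output.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical range entry (not the real towctrans.h layout):
   code points in [lo, hi] map to cp + delta. With step == 2 the range
   holds alternating upper/lower pairs, and only the even offsets
   (the uppercase members) are mapped. */
struct case_range {
    uint32_t lo, hi;
    int32_t delta;
    uint8_t step; /* 1 = contiguous range, 2 = alternating pairs */
};

/* Tiny illustrative excerpt, sorted by lo. Real tables would be
   generated from UnicodeData.txt. */
static const struct case_range lower_ranges[] = {
    { 0x0041, 0x005A, 32, 1 }, /* A-Z -> a-z */
    { 0x00C0, 0x00D6, 32, 1 }, /* Latin-1 capitals (U+00D7 is excluded) */
    { 0x00D8, 0x00DE, 32, 1 },
    { 0x0100, 0x012F,  1, 2 }, /* Latin Extended-A pairs: U+0100 -> U+0101 */
    { 0x0391, 0x03A1, 32, 1 }, /* Greek capitals Alpha..Rho */
    { 0x24B6, 0x24CF, 26, 1 }, /* circled capitals -> circled small */
};

/* Binary search for the range containing wc; unmapped code points are
   returned unchanged. */
static wint_t sketch_towlower(wint_t wc)
{
    size_t lo = 0, hi = sizeof lower_ranges / sizeof lower_ranges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct case_range *r = &lower_ranges[mid];
        if (wc < r->lo) {
            hi = mid;
        } else if (wc > r->hi) {
            lo = mid + 1;
        } else {
            /* Odd offset in a pair range: already the lowercase member. */
            if (r->step == 2 && ((wc - r->lo) & 1))
                return wc;
            return (wint_t)((int32_t)wc + r->delta);
        }
    }
    return wc;
}
```

    A matching towupper() cannot simply reuse this table, because the
    lowercase targets are not necessarily in sorted order; keeping a
    second, separately sorted table is the kind of extra storage the text
    attributes to "--bsearch-both".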
    Single-character towupper() and towlower() conversions cannot represent
    the many Unicode mappings that expand to several characters (those
    with status F, full case folding, in CaseFolding.txt). Use a full
    string case mapping function for those instead, such as safeclib's
    "wcsfc_s", ICU's "u_strToUpper" or libunistring's "u32_toupper".

PERFORMANCE
    It is currently still somewhat unoptimized, but small and fast enough
    compared to the other implementations, and notably more correct than
    glibc, which ignores characters from other locales.

    The benchmark uses Unicode 10.0 data ("-v 10"), so that our tables
    match the Unicode version compiled into musl-old.

    Benchmark errors fall into four categories, none of which are bugs in
    our code:

    Circled letters 0x24B6-0x24E9 (affects musl-old, 52 diffs)
        Our code maps these correctly per UnicodeData.txt (e.g.
        "towupper(0x24D0)=0x24B6"). musl-old does not map them at all.

    Georgian Mtavruli 0x1C90-0x1CBF (affects musl-new, 96 diffs)
        These uppercase Georgian letters were added in Unicode 11.0.
        musl-new includes them, but our Unicode 10.0 bench tables do not,
        so musl-new reports differences for every Mtavruli codepoint.

    Post-Unicode-10.0 additions (affects musl-new, 16+ diffs)
        Additional cased characters introduced after Unicode 10.0 are
        present in musl-new but absent from our Unicode 10.0 tables.

    glibc errors
        glibc ignores cased characters from non-Latin locales entirely.
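    In the benchmark output that follows, the percentage column is
    consistent with the baseline time divided by each measured time, for
    example 552 / 257 = 2.1479, i.e. 214.79 % for "my_table". The small
    sketch below recomputes a few rows under that assumption;
    relative_speed() and print_rows() are hypothetical helpers, not part
    of the bench programs.

```c
#include <stdio.h>

/* Relative speed as it appears to be reported by ./bench and
   ./bench-bits.sh: baseline time / measured time, in percent.
   Above 100 % is faster than the baseline, below 100 % is slower. */
static double relative_speed(double baseline_us, double time_us)
{
    return 100.0 * baseline_us / time_us;
}

/* Recompute some rows of the first benchmark, where "my" is the
   100 % baseline. */
static void print_rows(void)
{
    static const struct { const char *name; double us; } rows[] = {
        { "my",        552.0 },
        { "my_table",  257.0 },
        { "musl-new",  209.0 },
        { "musl-old", 1406.0 },
        { "glibc",     149.0 },
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-10s %5.0f [us] %6.2f %%\n", rows[i].name, rows[i].us,
               relative_speed(rows[0].us, rows[i].us));
}
```

    The same formula with 251 [us] as the baseline reproduces the
    percentages of the "--bits" table, e.g. 251 / 118 gives 212.7 % for
    18:14:10.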
        make -C examples
        ./bench
        my:           552 [us] 100.00 %
        my_excl:      595 [us]  92.77 %
        my_low16:     594 [us]  92.93 %
        my_bits:      571 [us]  96.67 %
        my_bsearch:   477 [us] 115.72 %
        my_bsearchb:  556 [us]  99.28 %
        my_table:     257 [us] 214.79 %
        musl-new:     209 [us] 264.11 %   9 errors
        musl-old:    1406 [us]  39.26 %   3 errors
        glibc:        149 [us] 370.47 %  15 errors

        wc -c towctrans-*.o
         3528 towctrans-my.o
         3608 towctrans-myexcl.o
         3632 towctrans-mylow16.o
         3920 towctrans-mybits.o
         3968 towctrans-mybsearch.o
         4864 towctrans-mybsearch-both.o
         6816 towctrans-mytable.o
         6848 towctrans-musl-new.o
         3464 towctrans-musl-old.o
        97440 towctrans-glibc.o

    Results for more "--bits" combinations follow. The combinations that
    report 5 errors still need some logical fixups. "--bits 16:10:8" and
    "--bits 12:12:8" are the most promising; the best ones are about twice
    as fast as the default.

        ./bench-bits.sh
        16:8:8:    251 [us] 100.0 %  67 21 142 0 6
        16:16:8:   125 [us] 200.8 %  76 12 142 0 6
        16:10:8:   119 [us] 210.9 %  67 21 142 0 6
        18:14:10:  118 [us] 212.7 %  85  3 142 0 6  5 errors
        18:14:8:   138 [us] 181.9 %  85  3 142 0 6  5 errors
        18:12:10:  120 [us] 209.2 %  81  7 142 0 6  5 errors
        18:12:8:   180 [us] 139.4 %  81  7 142 0 6  5 errors
        16:12:6:   193 [us] 130.1 %  67 21 142 0 6  5 errors
        16:10:6:   133 [us] 188.7 %  67 21 142 0 6  5 errors
        14:10:8:   127 [us] 197.6 %  58 30 142 0 6  5 errors
        14:12:6:   135 [us] 185.9 %  54 34 142 0 6  5 errors
        12:12:8:   119 [us] 210.9 %  34 54 142 0 6  5 errors

        4880 towctrans-bmy.o (16:8:8)
        5024 towctrans-bmylow16.o (16:16:8)
        5232 towctrans-bmybits.o (16:10:8)
        5408 bits-12_12_8.o
        5312 bits-14_12_6.o
        5312 bits-14_10_8.o
        5256 bits-16_10_6.o
        5256 bits-16_12_6.o
        5176 bits-18_12_8.o
        5208 bits-18_12_10.o
        5208 bits-18_14_8.o
        5240 bits-18_14_10.o

INSTALLATION
    Perl 5.12 or later is required.

    This module does not need to be installed; running gen_wctrans is
    enough.
    However, for full testing and global installation run:

        perl Makefile.PL
        make
        make test
        make test-all
        sudo make install

DEPENDENCIES
    This module requires a UnicodeData.txt file from the Unicode Character
    Database, which is automatically downloaded if missing.

AUTHOR
    Reini Urban

    Copyright (C) 2026 Reini Urban. All rights reserved.

COPYRIGHT AND LICENSE
    This module is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

    The generated files are MIT licensed. See the headers of the generated
    files.

SEE ALSO