NAME
    WordList - Specification and base class for WordList::*, modules that
    contain word list

VERSION
    This document describes version 0.7.11 of WordList (from Perl
    distribution WordList), released on 2021-09-26.

SYNOPSIS
    Use one of the "WordList::*" modules.

DESCRIPTION
    "WordList::*" modules are modules that contain, well, list of words.
    This module, "WordList", serves as a base class and establishes
    convention for such modules.

    "WordList" is an alternative for Games::Word::Wordlist and
    "Games::Word::Wordlist::*". Its main difference is: "WordList::*"
    wordlists are read-only/immutable and the modules are designed to have
    low startup overhead. This makes them more suitable for use in CLI
    scripts which often only want to pick a word from one or several lists.
    See "DIFFERENCES WITH GAMES::WORD::WORDLIST" for more details.

    Unless you are defining a dynamic wordlist (see below), words (or
    phrases) must be put in "__DATA__" section, one per line. Putting the
    wordlist in the "__DATA__" section relieves perl from having to parse
    the list during the loading of the module. To search for words or
    picking some random words from the list, the module also need not slurp
    the whole list into memory (and will not do so unless explicitly
    instructed).

    You must sort your words ascibetically (or by Unicode code point).
    Sorting makes it more convenient to diff different versions of the
    module, as well as performing binary search. If you have a different
    sort order other than ascibetical, you must set package variable $SORT
    with some true value (say, "frequency").

    There must not be any duplicate entry in the word list.

    Dynamic and non-deterministic wordlist. A dynamic wordlist must set
    package variable $DYNAMIC to either 1 (deterministic) or 2
    (non-deterministic). A dynamic wordlist does not put the wordlist in the
    DATA section; instead, user relies on "first_word()" + "next_word()", or
    "each_word()", or "all_words()" to get the list. A deterministic
    wordlist returns the same list everytime "each_word()" or "all_words()"
    is called. A non-deterministic list can return a different list for a
    different "each_word()" or "all_words()" call. See
    WordListRole::FirstNextResetFromEach,
    WordListRole::EachFromFirstNextReset, WordListRole::FromArray if you
    want to write a dynamic wordlist module. It is possible for a dynamic
    list to return unordered or duplicate entries, but it is not encouraged.

    Parameterized wordlist. When instantiating a wordlist class instance,
    user can pass a list of key-value pairs as parameters. Normally only a
    dynamic wordlist would accept parameters. Parameters are defined in the
    %PARAMS package variable. It is a hash of parameter names as keys and
    parameter specification as values. Parameter specification follows
    function argument metadata specified in Rinci::function.

    Examples. Examples can be specified in @EXAMPLES package variable. The
    structure is similar to Rinci function's "examples" property. For
    example:

     # in lib/WordList/Test/Dynamic/RandomWord/1000.pm

     @EXAMPLES = (
         {
             summary => '1000 random words, each 10 to 15 characters long',
             args => {min_len=>10, max_len=>15},
         }
     );

DIFFERENCES WITH GAMES::WORD::WORDLIST
    Since this is a non-compatible interface from Games::Word::Wordlist, I
    also make some other changes:

    *   Namespace is put outside "Games::"

        Because obviously word lists are not only useful for games.

    *   Namespace is more language-neutral and not English-centric

        English wordlists are put under "WordList::EN::*". Other languages
        have their own subnamespaces, e.g. "WordList::FR::*" or
        "WordList::ID::*". Aside from language subnamespaces, there are also
        other subnamespaces: "WordList::Phrase::$LANG::*",
        "WordList::Password::*", "WordList::Domain::*", "WordList::HTTP::*",
        etc.

    *   Interface is simpler

        This is partly due to the list being read-only. The methods provided
        are just:

        - "pick" (pick one or several random entries, without duplicates or
        with)

        - "word_exists" (check whether a word is in the list)

        - "each_word" (run code for each entry)

        - "all_words" (return all the words in a list)

        A couple of other functions might be added, with careful
        consideration.

    *   More extensions

        Some roles, subclasses, or alternate implementations are provided.
        For example, since most wordlist are alphabetically sorted, a binary
        search can be performed in "word_exists()". There is a role,
        WordListRole::BinarySearch, that does that and can be mixed in. An
        even faster version of "word_exists()" using bloom filter is offered
        by WordListRole::Bloom. A faster version of pick() that does random
        seeking is offered by WordListRole::RandomSeekPick.

SUBCLASSING OR CREATING ROLES
    If you want to get the word list from another filehandle source, e.g. a
    gzipped file, you just need to override "reset_iterator()". Your
    "reset_iterator()" needs to set the 'fh' attribute to the filehandle.
    The default "first_word()" calls "reset_iterator()" and reads a line
    from the filehandle. The default "next_word()" just reads another line
    from the filehandle. "each_word()" is implemented in terms of
    "first_word()" and "next_word()", and "word_exists()", "pick()", and
    "all_words()" are implemented in terms of "each_word()".

METHODS
  new
    Usage:

     $wl = WordList::Module->new([ %params ]);

    Constructor.

  each_word
    Usage:

     $wl->each_word($code)

    Call $code for each word in the list. The code will receive the word as
    its first argument.

    If code return -2 will exit early.

  first_word
    Another way to iterate the word list is by calling "first_word" to get
    the first word, then "next_word" repeatedly until you get "undef".

  next_word
    Get the next word. See "first_word" for more details.

  reset_iterator
    Reset iterator. Basically "first_word" is equivalent to "reset_iterator"
    + "next_word".

  pick
    Usage:

     @words = $wl->pick([ $num=1 [ , $allow_duplicates=0 ] ])

    Examples:

     ($word) = $wl->pick;
     @words  = $wl->pick(3);

    Pick $n (default: 1) random word(s) from the list, without duplicates
    (unless $allow_duplicates is set to true). If there are less then $n
    words in the list and duplicates are not allowed, only that many will be
    returned.

    The algorithm used is from perlfaq ("perldoc -q "random line""), which
    scans the whole list once (a.k.a. each_word() once). The algorithm is
    for returning a single entry and is modified to support returning
    multiple entries.

  word_exists
    Usage:

     $wl->word_exists($word) => bool

    Check whether $word is in the list.

    Algorithm in this implementation is linear scan (O(n)). Check out
    WordListRole::BinarySearch for an O(log n) implementation, or
    WordListRole::Bloom for O(1) implementation.

  all_words
    Usage:

     $wl->all_words() => list

    Return all the words in a list, in order. Note that if wordlist is very
    large you might want to use "each_word" instead to avoid slurping all
    words into memory.

FAQ
  Why does pick() return "1"?
    You probably write this:

     $word = $wl->pick;

    instead of this:

     ($word) = $wl->pick;

    "pick()" returns a list and in scalar context it returns the number of
    elements in the list which is 1. This is a common context trap in Perl.

HOMEPAGE
    Please visit the project's homepage at
    <https://metacpan.org/release/WordList>.

SOURCE
    Source repository is at <https://github.com/perlancar/perl-WordList>.

SEE ALSO
    Related projects: ArrayData, HashData, TableData are newer projects
    inspired by WordList. I plan to publish newer wordlists as
    "ArrayData::*" modules. But WordList will still exist and stabilize its
    API.

    "WordListRole::*" modules.

    "WordList::*" modules.

    CLI's are provided in App::wordlist (wordlist), App::WordListUtils (e.g.
    list-wordlist-modules, etc).

AUTHOR
    perlancar <perlancar@cpan.org>

CONTRIBUTING
    To contribute, you can send patches by email/via RT, or send pull
    requests on GitHub.

    Most of the time, you don't need to build the distribution yourself. You
    can simply modify the code, then test via:

     % prove -l

    If you want to build the distribution (e.g. to try to install it locally
    on your system), you can install Dist::Zilla,
    Dist::Zilla::PluginBundle::Author::PERLANCAR, and sometimes one or two
    other Dist::Zilla plugin and/or Pod::Weaver::Plugin. Any additional
    steps required beyond that are considered a bug and can be reported to
    me.

COPYRIGHT AND LICENSE
    This software is copyright (c) 2021, 2020, 2018, 2017, 2016 by perlancar
    <perlancar@cpan.org>.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.

BUGS
    Please report any bugs or feature requests on the bugtracker website
    <https://rt.cpan.org/Public/Dist/Display.html?Name=WordList>

    When submitting a bug or request, please include a test-file or a patch
    to an existing test-file that illustrates the bug or desired feature.