English: Zipf's law plot (frequency as a function of frequency rank) for the words in two
English books. All words were mapped to lowercase. The texts and the word frequency files are:
- Nicholas Culpeper's herbal medicine handbook The English Physitian (1652), excluding numerals, Latin insertions, marginal notes, verses, titles, etc. Sample: courteous reader aristotle in his metaphysicks writing of the nature of [...] so will a paper also if it do but touch the water your best way then. File engl/cul/tot.1/gud.wfr (original 122229 words, truncated/filtered to 35027 words, N = 3544 distinct).
- H. G. Wells's novel The War of the Worlds (1898), excluding numerals. Sample: no one would have believed in the last years of the nineteenth century [...] there were already a couple of score of passengers aboard some of. File engl/wow/tot.1/gud.wfr (original 60293 words, truncated/filtered to 35027 words, N = 4869 distinct).
The word frequency files */*/*/gud.wfr are available at the
UNICAMP website. The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src. The truncated/filtered texts (one word per line, without punctuation) are in */*/*/gud.tlw.
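As a rough illustration of how such a plot can be produced, the sketch below counts lowercased words from a one-word-per-line list (the layout described for gud.tlw), sorts the counts in decreasing order, and pairs each with its rank. The function names and the UTF-8 encoding used by read_tlw are assumptions for illustration only. The format of the .wfr files is not described here, so this sketch does not attempt to read them.

```python
from collections import Counter


def read_tlw(path):
    """Read a one-word-per-line file, skipping blank lines.

    Hypothetical helper: assumes the gud.tlw layout (one word per line)
    and UTF-8 encoding; the actual encoding of the files may differ.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def rank_frequency(words):
    """Return (rank, frequency) pairs, rank 1 being the most frequent word.

    Words are lowercased first, mirroring the preprocessing described above.
    Ties in frequency get consecutive ranks in arbitrary order.
    """
    counts = Counter(w.lower() for w in words if w)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def summary(words):
    """Return (total word count, number of distinct lowercased words N)."""
    counts = Counter(w.lower() for w in words if w)
    return sum(counts.values()), len(counts)


if __name__ == "__main__":
    sample = "The cat and the dog and the bird".split()
    for rank, freq in rank_frequency(sample):
        print(rank, freq)
    print(summary(sample))
```

Plotting the resulting pairs on logarithmic axes for both rank and frequency gives the kind of curve shown; under Zipf's law it is roughly a straight line.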