File:Zipf-code-1 English plain, book-coded, Vigenere coded.svg

Original file (SVG file, nominally 512 × 504 pixels, file size: 2.54 MB)


Summary

Description
English: Zipf's law plot (frequency as a function of frequency rank) for three versions of the same English text in different encodings.

The original text is H. G. Wells's novel The War of the Worlds (1898), excluding numbers, mapped to lowercase.

The three versions and the respective word frequency files are:

  • Plain (unencoded) text. Sample: no one would have believed in the last years of the nineteenth century [...] there were already a couple of score of passengers aboard some of. File engl/wow/tot.1/gud.wfr (original 60293 words, truncated/filtered to 35027 words, N = 4869 distinct).


  • The same text encoded with a 'book code'; specifically, with each distinct word replaced by a different Roman numeral, assigned in order of decreasing frequency. For example, 'that' ⟶ 'xiii', 'his' ⟶ 'lxiv'. The letter 'p' is used as a Roman 'digit' with value 5000. Sample: ccv lii clxix cxxix mdcxxvi xxiv xx dccxii mcmxlix i xx mmmdccclxxxiii [...] mdccclxiii mmmciv cccxxii i. File enrc/wow/tot.1/gud.wfr (original 60293 words, truncated/filtered to 35027 words, N = 4869 distinct).
  • The same text encrypted with a Vigenère cypher with a 27-character alphabet (letters plus apostrophe), preserving spaces, with key 'ferrocyanide'. For example, 'no one would have believed ...' ⟶ 'ss eds yluyl ke'i svzkbvrl ...'. Sample: ss eds yluyl ke'i svzkbvrl lr ylv bouq yriuw tj jys pfnrahisxy tspqudf [...] tumui aihv onoenla e hskfzg lf ekrvj sw foupe'ohvx eseota sauh sk. File envg/wow/tot.1/gud.wfr (original 60293 words, truncated/filtered to 35027 words, N = 12911 distinct).
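
The two encodings in the list above can be sketched in Python. This is a minimal illustration, not the author's actual tooling. The function names are hypothetical. The alphabet order (a to z followed by the apostrophe as the 27th symbol) and the rule that the key advances only on alphabet characters are assumptions, chosen because they reproduce the sample ciphertext quoted above. Writing 4000 as 'mp' (subtractive form) is also an assumption, since the description only states that 'p' has value 5000.

```python
# Assumed alphabet order: a..z, then the apostrophe as the 27th symbol.
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"

def vigenere_encrypt(text, key):
    """Vigenere encryption over ALPHABET.

    Characters outside ALPHABET (such as spaces) are copied unchanged
    and do not consume a key letter (assumption, see lead-in).
    """
    out = []
    k = 0  # index of the next key letter to use
    for ch in text:
        if ch in ALPHABET:
            shift = ALPHABET.index(key[k % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)])
            k += 1
        else:
            out.append(ch)
    return "".join(out)

# Lowercase Roman numerals extended with 'p' = 5000.
# The subtractive form 'mp' = 4000 is an assumption.
ROMAN = [
    (5000, "p"), (4000, "mp"),
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]

def to_roman(n):
    """Convert an integer in 1..8999 to an extended lowercase Roman numeral."""
    if not 1 <= n <= 8999:
        raise ValueError("value out of range for this numeral scheme")
    parts = []
    for value, symbol in ROMAN:
        while n >= value:
            parts.append(symbol)
            n -= value
    return "".join(parts)
```

Under these assumptions, `vigenere_encrypt("no one would have", "ferrocyanide")` returns `ss eds yluyl ke'i`, matching the example above, and `to_roman(13)` and `to_roman(64)` return `xiii` and `lxiv`, the numerals given for 'that' and 'his'. How numerals were assigned to individual words is not reproduced here.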
The word frequency files '*/*/*/gud.wfr' are available at the UNICAMP website. The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src. The truncated/filtered texts -- one word per line, without punctuation -- are in */*/*/gud.tlw.
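
As a rough sketch of how the rank-frequency data behind such a plot could be derived from a one-word-per-line file like the gud.tlw files described above: the layout of the gud.wfr files is not documented here, so this example counts words directly instead of reading them. The function names `read_words` and `rank_frequency` are hypothetical.

```python
from collections import Counter

def read_words(lines):
    """Collect non-empty words from an iterable of lines, one word per line."""
    return [line.strip() for line in lines if line.strip()]

def rank_frequency(words):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    counts = Counter(words)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))
```

For example, `rank_frequency(read_words(open(path)))` would give the points to plot for one version of the text, where `path` is a placeholder for a local copy of one of the files. Zipf plots are conventionally drawn with both axes on a logarithmic scale.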
Date
Source Own work
Author Jorge Stolfi

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

Captions

Zipf plot for three versions of an English text: plain, book-coded, and Vigenère-coded


15 May 2023

image/svg+xml

File history


Date/Time | Dimensions | User | Comment
17:58, 15 May 2023 (current) | 512 × 504 (2.54 MB) | Jorge Stolfi | Uploaded own work with UploadWizard
