UTF-8 SAMPLER

  ¥ · £ · € · $ · ¢ · ₡ · ₢ · ₣ · ₤ · ₥ · ₦ · ₧ · ₨ · ₩ · ₪ · ₫ · ₭ · ₮ · ₯ · ₹

Frank da Cruz
The Kermit Project
New York City
fdc@kermitproject.org

Last update: Thu Sep 15 14:00:00 2016



UTF-8 is an ASCII-preserving encoding method for Unicode (ISO 10646), the Universal Character Set (UCS). The UCS encodes most of the world's writing systems in a single character set, allowing you to mix languages and scripts within a document without needing any tricks for switching character sets. This web page is encoded directly in UTF-8.
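The ASCII-preserving property can be checked directly. What follows is a minimal Python sketch, not part of the original page: the helper name utf8_hex and the choice of sample characters are assumptions made for illustration. It shows that an ASCII character keeps its single-byte value, while other characters become sequences of two or more bytes, each with the high bit set.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of one character as space-separated hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

# One ASCII letter, two currency signs, and one rune from the poem below.
samples = ["A", "£", "€", "ᚠ"]
for ch in samples:
    print(f"U+{ord(ch):04X} {ch} -> {utf8_hex(ch)}")

# Pure ASCII text is byte-for-byte identical in ASCII and in UTF-8.
assert "Kermit".encode("utf-8") == "Kermit".encode("ascii")

# Every byte of a non-ASCII character has the high bit set, so it can
# never be mistaken for an ASCII byte.
assert all(b >= 0x80 for b in "€".encode("utf-8"))
```

Because no byte of a multi-byte sequence falls in the ASCII range, software that only looks for ASCII delimiters (spaces, slashes, newlines) can pass UTF-8 text through unchanged.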

As shown HERE, Columbia University's Kermit 95 terminal emulation software can display UTF-8 plain text in Windows 95, 98, ME, NT, XP, Vista, or Windows 7/8/10 when using a monospace Unicode font like Andale Mono WT J or Everson Mono Terminal, or the less fully populated Courier New, Lucida Console, or Andale Mono. C-Kermit can handle it too, if you have a Unicode display. As many languages as your font can represent can be seen on the screen at the same time.

This, however, is a Web page. It started out as a kind of stress test for UTF-8 support in Web browsers, which was spotty when this page was first created in the 1990s but has become standard in all modern browsers. The problem now is mainly the fonts, and the browser's (or font's) support for less widely covered blocks such as Braille and for the nonzero Unicode planes (as in, e.g., the Gothic example below). To some extent it is also the rendition of combining sequences, right-to-left rendition (Arabic, Hebrew), and so on. CLICK HERE for a survey of Unicode fonts for Windows.
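To make the terms "plane" and "combining sequence" concrete, here is a small Python sketch. It is an illustration under stated assumptions rather than anything from the page itself: the helper name plane and the sample characters are chosen here. The plane is the code point divided by 65536, so plane 0 holds U+0000 to U+FFFF and Gothic letters, which start at U+10330, fall in plane 1.

```python
import unicodedata

def plane(ch: str) -> int:
    """Return the Unicode plane number (0 to 16) of a single character."""
    return ord(ch) >> 16

# U+10330 is GOTHIC LETTER AHSA, the first letter of the Gothic block.
for ch in ["A", "€", "\U00010330"]:
    size = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: plane {plane(ch)}, {size} UTF-8 byte(s)")

# A combining sequence: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
# NFC normalization composes it into the single character U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```

Characters outside plane 0 always take four bytes in UTF-8. A renderer has to draw the two-code-point sequence and the precomposed character identically, which is one of the things that can go wrong in a browser or font.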

The subtitle above shows currency symbols of many lands. If they don't appear as blobs, we're off to a good start! (The one on the end is the new Indian Rupee sign, which won't show up in fonts for a while.)
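If the symbols do appear as blobs, the characters can still be identified without a font. The following Python sketch is an assumption-laden illustration, not the page's own tooling; it uses the standard unicodedata module to print the code point and official name of each symbol in the subtitle. The variable names are chosen here.

```python
import unicodedata

# The currency symbols from the subtitle, in the same order.
subtitle = "¥£€$¢₡₢₣₤₥₦₧₨₩₪₫₭₮₯₹"

for ch in subtitle:
    # unicodedata.name() returns the official Unicode character name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Every one of them has the general category Sc (Symbol, currency).
categories = {unicodedata.category(ch) for ch in subtitle}
print(categories)
```

The last symbol is U+20B9, INDIAN RUPEE SIGN. The names printed depend on the Unicode database bundled with the Python version in use.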

Poetry

From the Anglo-Saxon Rune Poem (Rune version):

ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬

From Laȝamon's Brut (The Chronicles of England, Middle English, West Midlands):

An preost wes on leoden, Laȝamon was ihoten
He wes Leovenaðes sone -- liðe him be Drihten.
He wonede at Ernleȝe at æðelen are chirechen,
Uppen Sevarne staþe, sel þar him þuhte,
Onfest Radestone, þer he bock radde.

(The third letter in the author's name is Yogh, missing from many fonts; CLICK HERE for another Middle English sample with some explanation of letters and encoding).

From the Tagelied of Wolfram von Eschenbach (Middle High German):

Sîne klâwen durh die wolken sint geslagen,
er stîget ûf mit grôzer kraft,
ich sih in grâwen tägelîch als er wil tagen,
den tac, der im geselleschaft
erwenden wil, dem werden man,
den ich mit sorgen în verliez.
ich bringe in hinnen, ob ich kan.
sîn vil manegiu tugent michz leisten hiez.

Some lines of Odysseus Elytis (Greek):

Monotonic:

Τη γλώσσα μου έδωσαν ελληνική
το σπίτι φτωχικό στις αμμουδιές του Ομήρου.
Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου.

από το Άξιον Εστί
του Οδυσσέα Ελύτη

Polytonic:

Τὴ γλῶσσα μοῦ ἔδωσαν ἑλληνικὴ
τὸ σπίτι φτωχικὸ στὶς ἀμμουδιὲς τοῦ Ὁμήρου.
Μονάχη ἔγνοια ἡ γλῶσσα μου στὶς ἀμμουδιὲς τοῦ Ὁμήρου.

ἀπὸ τὸ Ἄξιον ἐστί
τοῦ Ὀδυσσέα Ἐλύτη

The first stanza of Pushkin's Bronze Horseman (Russian):

На берегу пустынных волн
Стоял он, дум великих полн,
И вдаль глядел. Пред ним широко
Река неслася; бедный чёлн
По ней стремился одиноко.
По мшистым, топким берегам
Чернели избы здесь и там,
Приют убогого чухонца;
И лес, неведомый лучам
В тумане спрятанного солнца,
Кругом шумел.

Šota Rustaveli's Veṗxis Ṭq̇aosani, The Knight in the Tiger's Skin (Georgian):

ვეპხის ტყაოსანი შოთა რუსთაველი

ღმერთსი შემვედრე, ნუთუ კვლა დამხსნას სოფლისა შრომასა, ცეცხლს, წყალსა და მიწასა, ჰაერთა თანა მრომასა; მომცნეს ფრთენი და აღვფრინდე, მივჰხვდე მას ჩემსა ნდომასა, დღისით და ღამით ვჰხედვიდე მზისა ელვათა კრთომაასა.

Tamil poetry of Subramaniya Bharathiyar: சுப்ரமணிய பாரதியார் (1882-1921):

யாமறிந்த மொழிகளிலே தமிழ்மொழி போல் இனிதாவது எங்கும் காணோம்,
பாமரராய் விலங்குகளாய், உலகனைத்தும் இகழ்ச்சிசொலப் பான்மை கெட்டு,
நாமமது தமிழரெனக் கொண்டு இங்கு வாழ்ந்திடுதல் நன்றோ? சொல்லீர்!
தேமதுரத் தமிழோசை உலகமெலாம் பரவும்வகை செய்தல் வேண்டும்.

Kannada poetry by Kuvempu — ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು

ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು ಇಂದೆನ್ನ ಹೃದಯದಲಿ
ನಿತ್ಯವೂ ಅವತರಿಪ ಸತ್ಯಾವತಾರ

ಮಣ್ಣಾಗಿ ಮರವಾಗಿ ಮಿಗವಾಗಿ ಕಗವಾಗೀ...
ಮಣ್ಣಾಗಿ ಮರವಾಗಿ ಮಿಗವಾಗಿ ಕಗವಾಗಿ
ಭವ ಭವದಿ ಭತಿಸಿಹೇ ಭವತಿ ದೂರ
ನಿತ್ಯವೂ ಅವತರಿಪ ಸತ್ಯಾವತಾರ || ಬಾ ಇಲ್ಲಿ ||

I Can Eat Glass

And from the sublime to the ridiculous, here is a certain phrase¹ in an assortment of languages:

  1. Sanskrit: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥
  2. Sanskrit (standard transcription): kācaṃ śaknomyattum; nopahinasti mām.
  3. Classical Greek: ὕαλον ϕαγεῖν δύναμαι· τοῦτο οὔ με βλάπτει.
  4. Greek (monotonic): Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
  5. Greek (polytonic): Μπορῶ νὰ φάω σπασμένα γυαλιὰ χωρὶς νὰ πάθω τίποτα.
    Etruscan: (NEEDED)
  6. Latin: Vitrum edere possum; mihi non nocet.
  7. Old French: Je puis mangier del voirre. Ne me nuit.
  8. French: Je peux manger du verre, ça ne me fait pas mal.
  9. Provençal / Occitan: Pòdi manjar de veire, me nafrariá pas.
  10. Québécois: J'peux manger d'la vitre, ça m'fa pas mal.
  11. Walloon: Dji pou magnî do vêre, çoula m' freut nén må.
    Champenois: (NEEDED)
    Lorrain: (NEEDED)
  12. Picard: Ch'peux mingi du verre, cha m'foé mie n'ma.
    Corsican/Corsu: (NEEDED)
    Jèrriais: (NEEDED)
  13. Kreyòl Ayisyen (Haitï): Mwen kap manje vè, li pa blese'm.
  14. Basque: Kristala jan dezaket, ez dit minik ematen.
  15. Catalan / Català: Puc menjar vidre, que no em fa mal.
  16. Spanish: Puedo comer vidrio, no me hace daño.
  17. Aragonés: Puedo minchar beire, no me'n fa mal.
    Aranés: (NEEDED)
    Mallorquín: (NEEDED)
  18. Galician: Eu podo xantar cristais e non cortarme.
  19. European Portuguese: Posso comer vidro, não me faz mal.
  20. Brazilian Portuguese (8): Posso comer vidro, não me machuca.
  21. Caboverdiano/Kabuverdianu (Cape Verde): M' podê cumê vidru, ca ta maguâ-m'.
  22. Papiamentu: Ami por kome glas anto e no ta hasimi daño.
  23. Italian: Posso mangiare il vetro e non mi fa male.
  24. Milanese: Sôn bôn de magnà el véder, el me fa minga mal.
  25. Roman: Me posso magna' er vetro, e nun me fa male.
  26. Napoletano: M' pozz magna' o'vetr, e nun m' fa mal.
  27. Venetian: Mi posso magnare el vetro, no'l me fa mae.
  28. Zeneise (Genovese): Pòsso mangiâ o veddro e o no me fà mâ.
  29. Sicilian: Puotsu mangiari u vitru, nun mi fa mali.
    Campinadese (Sardinia): (NEEDED)
    Lugudorese (Sardinia): (NEEDED)
  30. Romansch (Grischun): Jau sai mangiar vaider, senza che quai fa donn a mai.
    Romany / Tsigane: (NEEDED)
  31. Romanian: Pot să mănânc sticlă și ea nu mă rănește.
  32. Esperanto: Mi povas manĝi vitron, ĝi ne damaĝas min.
    Pictish: (NEEDED)
    Breton: (NEEDED)
  33. Cornish: Mý a yl dybry gwéder hag éf ny wra ow ankenya.
  34. Welsh: Dw i'n gallu bwyta gwydr, 'dyw e ddim yn gwneud dolur i mi.
  35. Manx Gaelic: Foddym gee glonney agh cha jean eh gortaghey mee.
  36. Old Irish (Ogham): ᚛᚛ᚉᚑᚅᚔᚉᚉᚔᚋ ᚔᚈᚔ ᚍᚂᚐᚅᚑ ᚅᚔᚋᚌᚓᚅᚐ᚜
  37. Old Irish (Latin): Con·iccim ithi nglano. Ním·géna.
  38. Irish: Is féidir liom gloinne a ithe. Ní dhéanann sí dochar ar bith dom.
  39. Ulster Gaelic: Ithim-sa gloine agus ní miste damh é.
  40. Scottish Gaelic: S urrainn dhomh gloinne ithe; cha ghoirtich i mi.
  41. Anglo-Saxon (Runes): ᛁᚳ᛫ᛗᚨᚷ᛫ᚷᛚᚨᛋ᛫ᛖᚩᛏᚪᚾ᛫ᚩᚾᛞ᛫ᚻᛁᛏ᛫ᚾᛖ᛫ᚻᛖᚪᚱᛗᛁᚪᚧ᛫ᛗᛖ᛬
  42. Anglo-Saxon (Latin): Ic mæg glæs eotan ond hit ne hearmiað me.
  43. Middle English: Ich canne glas eten and hit hirtiþ me nouȝt.
  44. English: I can eat glass and it doesn't hurt me.
  45. English (IPA): [aɪ kæn iːt glɑːs ænd ɪt dɐz nɒt hɜːt miː] (Received Pronunciation)
  46. English (Braille): ⠊⠀⠉⠁⠝⠀⠑⠁⠞⠀⠛⠇⠁⠎⠎⠀⠁⠝⠙⠀⠊⠞⠀⠙⠕⠑⠎⠝⠞⠀⠓⠥⠗⠞⠀⠍⠑
  47. Jamaican: Mi kian niam glas han i neba hot mi.
  48. Lalland Scots / Doric: Ah can eat gless, it disnae hurt us.
    Glaswegian: (NEEDED)
  49. Gothic (4):