This is the raw data for Wordnet Bahasa, a wordnet for the Malay
languages (currently Malaysian and Indonesian).

For more details see the project page at:

The data is released under the MIT license.

File format:

synset\tlang\tgoodness\tlemma

synset is the offset-pos from Princeton WordNet 3.0

lang is one of:
  B (Bahasa = msa)
  I (Indonesian = ind)
  M (Malay = zsm)

goodness is one of:
  Y = hand checked and good
  O = automatic, high quality (good)
  M = automatic, medium quality (ok)
  L = automatic, probably bad (low)
  X = hand checked and bad

The normal release has only Y and O.

e.g.
  00015388-n B X fauna
  00015388-n M Y haiwan
  00015388-n I Y hewan

Note: msa is the supertype (ISO 639-3 macrolanguage) of ind and zsm
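
As a rough illustration of the format above, here is a minimal Python
sketch that parses one record and keeps only release-quality entries.
The names Entry, parse_line and is_release_quality are made up for this
example and are not part of the project; the sketch assumes fields are
separated by single tabs, as the format line states.

```python
from typing import NamedTuple, Optional

# Codes copied from the format description; the mapping to ISO 639-3
# codes follows the "lang" list in this README.
LANGS = {"B": "msa", "I": "ind", "M": "zsm"}
GOODNESS = {"Y", "O", "M", "L", "X"}


class Entry(NamedTuple):  # hypothetical record type for this sketch
    synset: str    # offset-pos, e.g. "00015388-n"
    lang: str      # ISO 639-3 code
    goodness: str  # one of Y, O, M, L, X
    lemma: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one tab-separated record; return None if it does not fit."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None
    synset, lang, goodness, lemma = fields
    if lang not in LANGS or goodness not in GOODNESS:
        return None
    return Entry(synset, LANGS[lang], goodness, lemma)


def is_release_quality(entry: Entry) -> bool:
    """The normal release has only Y and O."""
    return entry.goodness in {"Y", "O"}


sample = [
    "00015388-n\tB\tX\tfauna",
    "00015388-n\tM\tY\thaiwan",
    "00015388-n\tI\tY\thewan",
]
rows = [parse_line(line) for line in sample]
released = [r.lemma for r in rows if r is not None and is_release_quality(r)]
print(released)  # ['haiwan', 'hewan']
```

Lines that do not have exactly four fields (for example definition
lines) come back as None, so a caller can skip or route them elsewhere.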

========================================================================

Apostrophes should be (’) U+2019 RIGHT SINGLE QUOTATION MARK, as in: Côte d’Ivoire.
Technically, a glottal stop should be (ʼ) U+02BC MODIFIER LETTER APOSTROPHE.
We need to make the lookup more forgiving of this.
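
One possible way to make lookup more forgiving, sketched in Python: fold
the apostrophe-like characters to a single form before comparing keys.
This is only an assumption about how a lookup could work, not the
project's implementation; lookup_key, build_index and the set of
characters in APOSTROPHE_LIKE are choices made for this example.

```python
# Characters treated as equivalent for lookup purposes (an assumption):
# ASCII apostrophe, U+2019, U+02BC, U+2018 and the grave accent.
APOSTROPHE_LIKE = "'\u2019\u02bc\u2018`"
_FOLD = str.maketrans({ch: "'" for ch in APOSTROPHE_LIKE})


def lookup_key(lemma: str) -> str:
    """Return a comparison key with all apostrophe variants folded to '."""
    return lemma.translate(_FOLD)


def build_index(lemmas):
    """Map each folded key to the original lemmas that share it."""
    index = {}
    for lemma in lemmas:
        index.setdefault(lookup_key(lemma), []).append(lemma)
    return index


index = build_index(["Côte d\u2019Ivoire"])
# A query typed with a plain ASCII apostrophe still finds the entry.
print(index.get(lookup_key("Côte d'Ivoire")))  # ['Côte d’Ivoire']
```

The stored lemma keeps its original character; only the key used for
matching is folded.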

There are some abbreviations in use:
  yg = yang
  sso =

========================================================================

Def:

06822958-n DEF tanda koma di bawah konsonan c tanda bunyi 's'
06823760-n DEF dua titik di atas huruf vokal
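
A small Python sketch for reading definition lines like the two above.
It assumes the layout is synset, the literal marker DEF, then the
definition text, separated by whitespace (tabs or spaces); the README
does not spell this out, and parse_definition is a name invented for
this example.

```python
def parse_definition(line):
    """Return (synset, definition) for a DEF line, or None otherwise."""
    # Split on the first two runs of whitespace only, so the definition
    # text keeps its internal spaces.
    parts = line.rstrip("\n").split(None, 2)
    if len(parts) != 3 or parts[1] != "DEF":
        return None
    return parts[0], parts[2]


print(parse_definition("06823760-n DEF dua titik di atas huruf vokal"))
# ('06823760-n', 'dua titik di atas huruf vokal')
```

Ordinary lemma records return None here, so the same file can be
scanned once and each line handed to whichever parser accepts it.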