This directory contains redistributable wordnets in further sub-directories.
Structure is:
  proj/wn-data-lang.tab      synset-lemma pairs (see below)
  proj/LICENCE               original license file (or equivalent)
  proj/README                any notes about the conversion
  proj/lang2tab.py           Python script to extract the data
                             (may rely on wordnet version mappings)
  proj/wn-data-lang.tab.log  any notes from the conversion
  proj/citation.bib          the canonical citation reference(s)
Note that a single directory may have wordnets for multiple languages
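
As a rough illustration of the layout above, the following Python sketch
lists every wn-data-LANG.tab file found one level below a root directory
and reads the language code from the file name. The function name
find_wordnets and its root argument are hypothetical; they are not part
of this distribution.

```python
from pathlib import Path


def find_wordnets(root):
    """Return sorted (project, lang, path) triples, one per wn-data-LANG.tab file."""
    found = []
    # Matches proj/wn-data-lang.tab but not proj/wn-data-lang.tab.log.
    for path in Path(root).glob("*/wn-data-*.tab"):
        # The stem of wn-data-tha.tab is wn-data-tha; the part after the
        # "wn-data-" prefix is taken to be the language code.
        lang = path.stem[len("wn-data-"):]
        found.append((path.parent.name, lang, path))
    return sorted(found)
```

Because one project directory may hold several languages, the result is
keyed by both the directory name and the language code.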
wn-data is formatted as follows:
# name<tab>lang<tab>url<tab>license
offset-pos<tab>type<tab>lemma
offset-pos<tab>type<tab>lemma
...
name is the name of the project
lang is the ISO 639-3 three-letter code for the language
url is the url of the project
license is a short name for the license
offset is the 8-digit Princeton WordNet 3.0 synset offset
pos is one of [a,s,v,n,r]
lemma is the lemma (word separator normalized to ' ')
type is the language:relationship (e.g. eng:lemma)
Example:
# Thai tha http://th.asianwordnet.org/ wordnet
13567960-n tha:lemma กระบวนการทรานแอมมิแนชัน
00155298-n tha:lemma การปฏิเสธ
14369530-n tha:lemma ภาวะการหายใจเร็วของทารกแรกเกิด
10850469-n tha:lemma เบธัน
11268326-n tha:lemma เรินต์เกน
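
A minimal sketch of reading this format in Python follows. It assumes
the fields are separated by single tab characters and that the first
comment line with four fields is the header; the function name
parse_wn_data is hypothetical and is not the code NLTK itself uses.

```python
def parse_wn_data(lines):
    """Parse wn-data lines into (header, entries).

    header is a dict with keys name, lang, url, license (None if no
    header line is found); entries is a list of
    (offset, pos, type, lemma) tuples.
    """
    header = None
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            fields = line[1:].strip().split("\t")
            if header is None and len(fields) == 4:
                header = dict(zip(("name", "lang", "url", "license"), fields))
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip lines that do not have exactly three fields
        synset, rel_type, lemma = fields
        # A synset key such as 00155298-n splits into offset and pos.
        offset, _, pos = synset.partition("-")
        entries.append((offset, pos, rel_type, lemma))
    return header, entries


sample = [
    "# Thai\ttha\thttp://th.asianwordnet.org/\twordnet\n",
    "00155298-n\ttha:lemma\tการปฏิเสธ\n",
    "10850469-n\ttha:lemma\tเบธัน\n",
]
header, entries = parse_wn_data(sample)
```

To read a real file, pass an open file object instead of the sample
list, for example open(path, encoding="utf-8"); the UTF-8 encoding is
also an assumption here.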
This data is formatted by the Open Multilingual Wordnet Project
to be used by NLTK.
Please cite us if you find the aggregation useful (see citation.bib)
and email us if you have any suggestions.
Francis Bond ([email protected])
https://omwn.org/
2021-12-05
31 languages covered (and we assume you have English):
wn-data-als.tab
wn-data-arb.tab
wn-data-bul.tab
wn-data-cmn.tab
wn-data-dan.tab
wn-data-ell.tab
wn-data-fin.tab
wn-data-fra.tab
wn-data-heb.tab
wn-data-hrv.tab
wn-data-isl.tab
wn-data-ita.tab
wn-data-ita.tab
wn-data-jpn.tab
wn-data-cat.tab
wn-data-eus.tab
wn-data-glg.tab
wn-data-spa.tab
wn-data-ind.tab
wn-data-zsm.tab
wn-data-nld.tab
wn-data-nno.tab
wn-data-nob.tab
wn-data-pol.tab
wn-data-por.tab
wn-data-ron.tab
wn-data-lit.tab
wn-data-slk.tab
wn-data-slv.tab
wn-data-swe.tab
wn-data-tha.tab