fastText provides pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia using the fastText library.
fastText distributes pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia. The models were trained using CBOW with position weights, in dimension 300, with character n-grams of length 5, a window of size 5, and 10 negative samples. They can be downloaded from the command line or from Python. The library also includes a dimension reducer that shrinks the pre-trained 300-dimensional vectors to a smaller size, such as 100 dimensions. The vectors are available in both binary and text formats; the binary models support nearest-neighbor queries and can produce vectors for out-of-vocabulary words from their character n-grams. Tokenization depends on the language: the Stanford word segmenter for Chinese, MeCab for Japanese, UETsegmenter for Vietnamese, the Europarl preprocessing tokenizer for languages written in Latin, Cyrillic, Hebrew, or Greek scripts, and the ICU tokenizer for the remaining languages.
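The workflow described above (download, reduce, query) might look roughly like the following sketch. It assumes the `fasttext` Python package is installed and that the English model (language code `en`) is wanted; the helper names `load_reduced_model` and `char_ngrams` are hypothetical and exist only for illustration. The `char_ngrams` function is a simplified picture of why out-of-vocabulary words still get vectors: the library itself additionally hashes the n-grams into buckets, which is omitted here.

```python
def char_ngrams(word, n=5):
    """Return the character n-grams of a word, fastText style.

    The word is wrapped in '<' and '>' boundary markers first.
    Simplified illustration: the real library also hashes each
    n-gram into a fixed number of buckets.
    """
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]


def load_reduced_model(lang_id="en", target_dim=100):
    """Download a pre-trained binary model and shrink its vectors.

    Hypothetical helper. The import is kept inside the function so
    that this file can be loaded without fasttext installed.
    Note: the binary models are several gigabytes each.
    """
    import fasttext
    import fasttext.util

    # Downloads cc.<lang_id>.300.bin into the working directory
    # unless it is already there, and returns the file name.
    file_name = fasttext.util.download_model(lang_id, if_exists="ignore")
    model = fasttext.load_model(file_name)

    # Reduce the 300-dimensional vectors in place.
    fasttext.util.reduce_model(model, target_dim)
    return model
```

With a model loaded via `model = load_reduced_model("en", 100)`, `model.get_dimension()` reports the reduced size, `model.get_nearest_neighbors("hello")` returns scored neighbors, and `model.get_word_vector(...)` also works for a word missing from the vocabulary, because the vector is assembled from n-grams like those `char_ngrams` lists.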
Integrations: Python
Platforms: Web, Python