Gensim is an open-source library for unsupervised topic modeling, document indexing, retrieval by similarity, and other natural language processing tasks, using modern statistical machine learning.
Gensim is designed to handle large text collections using data streaming and incremental online algorithms, which differentiates it from most other machine learning software packages that target only in-memory processing.
Gensim includes streamed parallelized implementations of fastText,[2] word2vec and doc2vec algorithms,[3] as well as latent semantic analysis (LSA, LSI, SVD), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), tf-idf and random projections.[5]
Gensim has been used and cited in over 1400 commercial and academic applications as of 2018,[6] in a diverse array of disciplines from medicine to insurance claim analysis to patent search.[8][9][10]
The open-source code is developed and hosted on GitHub,[11] and public support forums are maintained on Google Groups[12] and Gitter.
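
The streaming and incremental approach described above can be illustrated with a minimal plain-Python sketch. It does not use Gensim's own API. The names `StreamedCorpus`, `OnlineDocFreq`, `tfidf_stream` and `sample_lines` are hypothetical and chosen only for this illustration, and the weighting shown (raw term count times a base-2 logarithm of inverse document frequency) is one common tf-idf variant, assumed here rather than taken from the library. The point is that documents are read one at a time and statistics are updated incrementally, so the full collection never has to fit in memory.

```python
import math
from collections import Counter


class StreamedCorpus:
    """Re-iterable corpus that yields one tokenized document at a time."""

    def __init__(self, lines_factory):
        # lines_factory is a callable returning a fresh iterator of raw
        # text lines (for example, one that opens and reads a large file).
        self.lines_factory = lines_factory

    def __iter__(self):
        for line in self.lines_factory():
            yield line.lower().split()


class OnlineDocFreq:
    """Document-frequency statistics updated one document at a time."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def update(self, tokens):
        self.num_docs += 1
        # Count each distinct token once per document.
        self.doc_freq.update(set(tokens))

    def idf(self, token):
        # Assumes the token was seen during the update pass.
        return math.log2(self.num_docs / self.doc_freq[token])


def tfidf_stream(corpus, stats):
    """Lazily yield one sparse tf-idf vector (a dict) per document."""
    for tokens in corpus:
        counts = Counter(tokens)
        yield {tok: n * stats.idf(tok) for tok, n in counts.items()}


def sample_lines():
    # Stand-in for a large on-disk collection (illustrative data only).
    yield "human machine interface"
    yield "machine learning survey"
    yield "graph minors survey"


corpus = StreamedCorpus(sample_lines)

# First pass: update statistics incrementally, one document at a time.
stats = OnlineDocFreq()
for tokens in corpus:
    stats.update(tokens)

# Second pass: stream the weighted vectors. They are collected into a
# list here only because the sample is tiny.
vectors = list(tfidf_stream(corpus, stats))
```

Because `StreamedCorpus` creates a fresh iterator on every pass, the same collection can be traversed several times while memory use stays bounded by the size of a single document plus the vocabulary statistics.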