In this notebook, we walk through an example that shows how optimization can be applied to assess text dissimilarity. Potential applications include plagiarism detection, information retrieval, clustering, text categorization, topic detection, question answering, machine translation, and text summarization.

The Word Mover’s Distance (WMD) is a popular measure of text similarity that quantifies the semantic distance between two documents: the smaller the distance, the closer the documents are in meaning. In this notebook, we pursue two goals:

  • Given two text passages, model WMD as an optimization problem and compute it
  • Examine a plagiarized passage from a book, then find the original passage in that book that has the closest semantic meaning to the given passage

This modeling tutorial is at the introductory level; we assume that you know Python and that you have a background in a discipline that uses quantitative methods.

You may find it helpful to refer to the documentation of the Gurobi Python API.

Access the Jupyter Notebook Modeling Example

Click on the button below to access the example in Google Colab, which is a free, online Jupyter Notebook environment that allows you to write and execute Python code through your browser. 

How to Run the Jupyter Notebook Modeling Example

  • To run the example the first time, choose “Runtime” and then click “Run all”.
  • All the cells in the Jupyter Notebook will be executed.
  • The example will install the gurobipy package, which includes a limited Gurobi license that allows you to solve small models.
  • You can also modify and re-run individual cells.
  • For subsequent runs, choose “Runtime” and click “Restart and run all”.
  • The Gurobi Optimizer will find the optimal solution of the modeling example.

Check out the Colab Getting Started Guide for full details on how to use Colab notebooks and how to create your own.
