At a high level, bandit problems model the explore-exploit tradeoff: do you exploit the knowledge you have now to gain reward, or explore another option that might turn out worse, but might turn out better?
As the story goes, the term bandit comes from slot machines. A single slot machine is a “one-armed bandit” since pulling the arm tends to rob you of your money. The multi-armed bandit arises from a gambler facing a set of slot machines, wondering which arm to pull next.
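To make the gambler's dilemma concrete, here is a minimal ε-greedy sketch in Python. The payout probabilities, the value of ε, and the horizon are all made up for illustration; this is not any particular paper's algorithm, just the simplest way I know to show exploration and exploitation in code.

```python
import random

# Purely illustrative: three "slot machines" (Bernoulli arms) with
# payout probabilities hidden from the gambler. All numbers are made up.
true_payouts = [0.2, 0.5, 0.7]
epsilon = 0.1                         # fraction of pulls spent exploring
pulls = [0] * len(true_payouts)       # times each arm has been pulled
rewards = [0.0] * len(true_payouts)   # total reward earned from each arm

total = 0.0
for t in range(10_000):
    if t < len(true_payouts):
        arm = t                                         # pull each arm once to start
    elif random.random() < epsilon:
        arm = random.randrange(len(true_payouts))       # explore: pick a random arm
    else:
        arm = max(range(len(true_payouts)),
                  key=lambda a: rewards[a] / pulls[a])  # exploit: best empirical mean
    reward = 1.0 if random.random() < true_payouts[arm] else 0.0
    pulls[arm] += 1
    rewards[arm] += reward
    total += reward

print("pulls per arm:", pulls)
print("average reward:", total / 10_000)
```

Run it a few times and you will see most pulls concentrate on the best arm, while the ε fraction of exploratory pulls keeps checking whether the gambler's current favorite is really the best.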
There is now a huge number of formulations of the bandit problem, including the multi-armed bandit, the contextual bandit, restless bandits, kernel bandits, etc. And a tremendous amount of work has gone into each, way more than I dare to summarize here. For intro textbooks, see Lattimore or Slivkins.
