You can think of a pretrained transformer architecture (TA) model as a sort of English-language expert. But the TA expert doesn't know anything about movies, so you provide additional training to ...