Section: New Results
Early machine learning approaches to coreference resolution relied on local, discriminative pairwise classifiers  ,  ,  . These systems made considerable progress toward robust coreference resolution, but their performance still left much room for improvement. This stems from two main deficiencies:
Decision locality. Each pairwise decision is made independently of the others; a separate clustering step then forms chains from the pairwise classifications. Yet coreference should clearly be conditioned on properties of an entity as a whole.
Knowledge bottlenecks. Coreference involves many different factors, e.g., morphosyntax, discourse structure and reasoning. Yet most systems rely on small sets of shallow features. Accurately predicting such information and using it to constrain coreference is difficult, so its potential benefits often go unrealized due to error propagation.
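The decision-locality problem can be made concrete with a minimal sketch of the standard pipeline (the mention strings and link decisions below are invented for illustration): a pairwise classifier accepts or rejects each link in isolation, and a separate transitive-closure step merges the accepted links into chains, with no opportunity to veto a link based on properties of the resulting entity.

```python
# Hypothetical output of a local pairwise classifier: each pair was
# judged independently of all the others.
links = [("Clinton", "she"), ("she", "the senator"), ("Obama", "he")]

def chains(links):
    """Form coreference chains by transitive closure of pairwise links.

    Nothing here can reconsider a link once the classifier accepts it,
    which is exactly the decision-locality deficiency described above.
    """
    clusters = []
    for a, b in links:
        hits = [c for c in clusters if a in c or b in c]
        merged = {a, b}.union(*hits)
        clusters = [c for c in clusters if c not in hits] + [merged]
    return clusters

entities = chains(links)
```

Here the closure step happily produces whatever entities the independent decisions imply, even when the merged cluster would be incoherent as a whole.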
More recent work has sought to address these limitations. For example, to address decision locality, McCallum and Wellner  use conditional random fields with model structures in which pairwise decisions influence one another. Denis  and Klenner  use integer linear programming (ilp) to perform global inference via transitivity constraints between different coreference decisions. Denis and Baldridge  use a ranker to compare all candidate antecedents for an anaphor simultaneously rather than in the standard pairwise manner. To address the knowledge bottleneck, Denis and Baldridge  use ilp for joint inference with a pairwise coreference model and a model for determining the anaphoricity of mentions. Also, Denis and Baldridge  and Bengtson and Roth  use models and features, respectively, that attend to particular types of mentions (e.g., full noun phrases versus pronouns). Furthermore, Bengtson and Roth  use a wider range of features than is normally considered, and in particular feed predicted features to later classifiers, to considerably boost performance.
In  , we use ilp to extend the joint formulation of Denis and Baldridge  with named entity classification and combine it with transitivity constraints  ,  . Intuitively, we should only identify antecedents for mentions that are likely to have one  , and we should only make a set of mentions coreferent if they are all instances of the same entity type (e.g., person or location). ilp allows such constraints to be declared between the outputs of independent classifiers to ensure coherent assignments are made. It also leads to global inference, since both the named entity type constraints and the transitivity constraints relate multiple pairwise decisions.
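The joint inference can be illustrated with the following sketch, in which the mentions, pairwise scores, and entity types are all invented, and exhaustive enumeration stands in for a real ilp solver: it maximizes the summed pairwise classifier scores subject to transitivity and same-entity-type constraints.

```python
from itertools import combinations, product

# Hypothetical mentions, pairwise coreference scores (e.g., classifier
# log-odds; positive favours linking), and predicted entity types.
mentions = ["Obama", "he", "Washington", "the president"]
pair_score = {(0, 1): 2.0, (0, 3): 1.5, (1, 3): 1.0,
              (0, 2): -1.0, (1, 2): -2.0, (2, 3): -1.5}
ent_type = ["PER", "PER", "LOC", "PER"]
pairs = sorted(pair_score)

def feasible(link):
    # Transitivity: if two links of a triangle hold, the third must too.
    for i, j, k in combinations(range(len(mentions)), 3):
        if link[(i, j)] + link[(i, k)] + link[(j, k)] == 2:
            return False
    # Type coherence: coreferent mentions must share an entity type.
    return all(ent_type[i] == ent_type[j]
               for (i, j), on in link.items() if on)

def objective(link):
    return sum(pair_score[p] * on for p, on in link.items())

# Exhaustive search over the 2^6 link assignments replaces the ilp
# solver for this toy instance; the constraints are the same.
candidates = (dict(zip(pairs, bits))
              for bits in product((0, 1), repeat=len(pairs)))
best = max((c for c in candidates if feasible(c)), key=objective)
```

On this instance the optimum links "Obama", "he", and "the president" into one entity while keeping "Washington" out: the type constraint blocks the location, and transitivity forces the full triangle among the three person mentions rather than any incoherent subset.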
We show that this strategy leads to improvements across the three main metrics proposed for coreference: the muc metric  , the b3 metric  , and the ceaf metric  . In addition, we contextualize the performance of our system with respect to cascades of multiple models and oracle systems that assume perfect information (e.g., about entity types). We furthermore demonstrate the inadequacy of using only the muc metric and argue that results should always be reported for all three.
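To make the metric discussion concrete, here is a sketch of the link-based muc score of Vilain et al. (1995), computed on invented chains; b3 and ceaf would score the same response differently, which is precisely why reporting all three matters.

```python
def muc(key, response):
    """MUC precision, recall, and F over sets of coreference chains.

    Recall counts, for each key chain S, the |S| - |p(S)| links that
    survive when S is partitioned by the response chains; precision is
    the same computation with the roles of key and response swapped.
    """
    def links(gold, pred):
        num = den = 0
        for chain in gold:
            # Partition of `chain` induced by `pred`; a mention absent
            # from every pred chain forms its own singleton part.
            parts = {next((id(c) for c in pred if m in c),
                          ("singleton", m))
                     for m in chain}
            num += len(chain) - len(parts)
            den += len(chain) - 1
        return num / den if den else 0.0

    r, p = links(key, response), links(response, key)
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)

# Invented example: gold chains vs. a system response.
key = [{"a", "b", "c"}, {"d"}]
response = [{"a", "b"}, {"c", "d"}]
p, r, f = muc(key, response)
```

Note that muc ignores singleton key chains entirely (they contribute no links), one of the known weaknesses that b3 and ceaf were designed to remedy.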