Section: New Results
Towards a FrameNet for French
Participant: Guillaume Pitel.
Following our collaboration with the Berkeley FrameNet team, we have completed several important tasks toward defining a large-scale effort to build a FrameNet for French. Like Claire Gardent's work on the LADL tables, the work reported here is another component of LED's effort to create lexical semantic resources for French.
An analysis of the semantic cohesion of Frame Element contents has been conducted ("semantic" here refers to co-occurrence information, as in Latent Semantic Analysis). The results, available at http://guillaume.pitel.free.fr/Frames.en.3/index.html , show that a large part of the annotated Frame Element contents falls within limited semantic areas, which in theory allows a bilingual projection of Frame Elements (that is, determining the Frame Element of a sentence segment in a language that lacks manual FrameNet annotation).
A French annotation of 1076 sentences extracted from the Europarl corpus (corresponding to a subcorpus already annotated for English and German by Sebastian Pado and Katrin Erk).
An estimate of the time needed for the assisted manual production of a list of lexical units for French, using translations from the Semantic Atlas Project (Sabine Ploux and Hyung-suk Ji) and from the online WordReference tool.
A set of monolingual and bilingual LSA spaces built with the Infomap tool from corpora of the Europarl set as well as the BNC, Frantext, and several texts from the Gutenberg project.
A clustering tool that can be integrated into the Infomap system, used to build more precise subsets of closely related terms within the annotated contents of Frames and Frame Elements.
A set of tools for automatic Frame and Frame Element attribution, based on the tools above and the data they produce. Results seem good enough for the system to be used as an annotation assistant. At Level 1 (the first proposal is correct), attribution accuracy is 54.5% for Frames and 63% for Frame Elements. At Level 4 (the correct answer is among the first 4 proposals), it is 72% and 92% respectively. The baseline for Frame annotation is close to zero in our resource-free model, since more than 500 frames can be attributed; the baseline for Frame Elements is much higher (41%), since a Frame has 7.65 Frame Elements on average.
A chapter in a FrameNet anniversary book edited by Hans Boas is currently being written.
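The Level 1 and Level 4 figures reported above for automatic attribution count an item as correct when the gold label appears among the first k ranked proposals. The following is a minimal sketch of that measure, not the evaluation code actually used in this work: the function name level_k_accuracy, the data layout, and the toy proposals are all assumptions made for illustration.

```python
from typing import Sequence


def level_k_accuracy(
    ranked_proposals: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int,
) -> float:
    """Fraction of items whose gold label is among the first k proposals."""
    if len(ranked_proposals) != len(gold):
        raise ValueError("ranked_proposals and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(
        1
        for proposals, label in zip(ranked_proposals, gold)
        if label in proposals[:k]
    )
    return hits / len(gold)


# Toy data (hypothetical): ranked Frame proposals for four annotated targets.
proposals = [
    ["Statement", "Telling", "Request", "Questioning"],
    ["Motion", "Arriving", "Departing", "Travel"],
    ["Giving", "Commerce_buy", "Getting", "Sending"],
    ["Awareness", "Opinion", "Certainty", "Cogitation"],
]
gold = ["Statement", "Arriving", "Sending", "Judgment"]

level1 = level_k_accuracy(proposals, gold, 1)  # only the first proposal counts
level4 = level_k_accuracy(proposals, gold, 4)  # any of the first four counts
print(f"Level 1: {level1:.2f}, Level 4: {level4:.2f}")
```

On this toy data only the first target is correct at Level 1, while three of the four targets are correct at Level 4. This gap is why a system with moderate Level 1 accuracy can still be useful as an annotation assistant: the annotator picks from a short ranked list.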