Section: Software
Sanskrit Site
Participant: Gérard Huet [correspondent].
Gérard Huet's Sanskrit Site (http://sanskrit.inria.fr) provides a unique range of interactive resources concerning Sanskrit philology [40], [39]. These resources are built upon, among other ingredients, the Zen Toolkit (see section 5.1). The site registers thousands of visitors daily.
-
The declension engine gives the declension tables for Sanskrit substantives.
-
The conjugation engine conjugates verbs for the various tenses and moods.
-
The lemmatizer tags inflected words.
-
A dictionary lists inflected forms of Sanskrit words. Full lists of inflected forms, in XML format (given with a specific DTD), are released as free linguistic resources available for research purposes. This database, developed in collaboration with Prof. Peter Scharf, from the Classics Department at Brown University, has been used for research experiments by the team of Prof. Stuart Shieber, at Harvard University.
-
The Sanskrit Reader segments simple sentences, where the (optional) finite verb form occurs in final position. This reader enhances the hand-tagged Sanskrit reader developed by Peter Scharf, which lets students read simple texts in successive stages: first in Devanagari script, then word by word, then in a word-for-word translation, then in a sentence-by-sentence translation.
-
The Sanskrit Parser eliminates many irrelevant pseudo-solutions (segmentations) listed by the Sanskrit Reader.
-
The Sanskrit Semantic Analyzer, based on the notion of kāraka of Pāṇini, controls overgeneration using a pertinence principle [41].
-
The Sanskrit Tagger is an assistant for the tagging of a Sanskrit corpus. Given a sentence, the user chooses among the possible interpretations listed by the morpho-syntactic tools and may save the corresponding unambiguously tagged sentence on disk as a hypertext document indexed in the Sanskrit Heritage Dictionary (our structured lexical database). This service has no equivalent worldwide.
-
The morphological data for Sanskrit have been released by Gérard Huet under LGPLLR (http://sanskrit.inria.fr/DATA/XML/). The precise lexer used by the shallow parser is specified as a modular transducer whose top-level states are the lexical categories corresponding to the banks of inflected forms, and whose arcs correspond to (the inversion of) euphony (sandhi) rules.
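The idea of segmenting by inverting sandhi rules can be illustrated with a small sketch. It is not the site's actual transducer: the tiny lexicon, the handful of external vowel sandhi rules, and the function name `segment` are all assumptions chosen for illustration, and a junction with no sound change is simply treated as plain concatenation.

```python
from functools import lru_cache

# Hypothetical sample lexicon of inflected forms (IAST transliteration).
LEXICON = frozenset({"ca", "api", "tathā", "eva", "na", "iha", "asti"})

# A few external vowel sandhi rules, written as
# (end of left word, start of right word, surface form at the junction).
SANDHI_RULES = (
    ("a", "a", "ā"),
    ("ā", "a", "ā"),
    ("a", "i", "e"),
    ("ā", "i", "e"),
    ("a", "e", "ai"),
    ("ā", "e", "ai"),
)


def segment(text):
    """Return all splits of `text` into lexicon words, undoing sandhi at junctions."""

    @lru_cache(maxsize=None)
    def solve(rest):
        if rest == "":
            return [()]  # one solution: nothing left to segment
        solutions = []
        for word in sorted(LEXICON):
            # Case 1: the word is visible unchanged (no sandhi at this junction).
            if rest.startswith(word):
                for tail in solve(rest[len(word):]):
                    solutions.append((word,) + tail)
            # Case 2: the word's ending was fused with the next word's start.
            # Invert the rule: read the surface form, restore both sides.
            for left_end, right_start, surface in SANDHI_RULES:
                if not word.endswith(left_end):
                    continue
                stem = word[: len(word) - len(left_end)]
                fused = stem + surface
                if rest.startswith(fused):
                    restored = right_start + rest[len(fused):]
                    for tail in solve(restored):
                        solutions.append((word,) + tail)
        return solutions

    return sorted(set(solve(text)))


if __name__ == "__main__":
    for sample in ("cāpi", "tathaiva", "neha"):
        print(sample, segment(sample))
```

Each recursive call plays the role of a transducer state: it consumes one inflected form from the lexicon, and the rule inversion in case 2 corresponds to following an arc that undoes a sandhi change. A realistic system would also need consonant sandhi, per-category lexicons, and a filter for the spurious segmentations that such inversion can produce.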
An ongoing project is the construction of a treebank of Sanskrit examples, in collaboration with Prof. Brendan Gillon, from McGill University in Montreal.