Multiword expressions are lexical units that consist of two or more words (tokens) but exhibit special syntactic, semantic, pragmatic or statistical features. From an NLP point of view, their treatment is not straightforward. On the one hand, the system should recognize that they count as one lexical unit (and not as two or more connected words), so it is advisable to store them as one unit in the lexicon. On the other hand, special rules for their treatment should also be included in the system.
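As a rough illustration of storing multiword expressions as single lexicon units, the sketch below merges contiguous token sequences that appear in a lexicon into one token. The lexicon entries and the function name `merge_mwes` are hypothetical and chosen only for this example; this is a toy lookup and not the method used in the publications listed here.

```python
# Hypothetical toy lexicon: each multiword expression is stored as one unit,
# represented as a tuple of lowercased tokens.
MWE_LEXICON = {
    ("take", "a", "decision"),
    ("give", "up"),
    ("new", "york"),
}


def merge_mwes(tokens, lexicon=MWE_LEXICON):
    """Greedily merge the longest lexicon match starting at each position."""
    max_len = max((len(entry) for entry in lexicon), default=1)
    merged = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate first, down to length 2.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lexicon:
                match_len = n
                break
        if match_len:
            # Join the matched tokens so later steps see one lexical unit.
            merged.append("_".join(tokens[i:i + match_len]))
            i += match_len
        else:
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_mwes("They give up in New York".split()))
```

A plain lookup like this only covers contiguous occurrences. Discontinuous uses (for example, a verb separated from its particle) are exactly where the additional treatment rules mentioned above become necessary.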
Related publications
- Nagy T., István; Vincze, Veronika 2014: VPCTagger: Detecting Verb-Particle Constructions With Syntax-Based Methods. In: Proceedings of the 10th Workshop on Multiword Expressions (MWE), ACL, Gothenburg, Sweden, pp. 17-25.
- Vincze, Veronika; Nagy T., István; Farkas, Richárd 2013: Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach. In: Proceedings of ACL 2013 (Volume 2: Short Papers), pp. 255-261.
- Vincze, Veronika; Nagy T., István; Zsibrita, János 2013: Learning to Detect English and Hungarian Light Verb Constructions. ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use. Part 1, 10(2), Article 6.
- Vincze, Veronika; Zsibrita, János; Nagy T., István 2013: Dependency Parsing for Identifying Hungarian Light Verb Constructions. In: Proceedings of IJCNLP 2013, pp. 207-215.
- Vincze, Veronika 2011: Semi-Compositional Noun + Verb Constructions: Theoretical Questions and Computational Linguistic Analyses. PhD thesis, University of Szeged, August 2011.
- Vincze, Veronika; Nagy T., István; Berend, Gábor 2011: Detecting noun compounds and light verb constructions: a contrastive study. In: ACL Workshop on Multiword Expressions: from Parsing and Generation to the Real World. Portland, Oregon, USA, pp. 116-121.
- Nagy T., István; Berend, Gábor; Vincze, Veronika 2011: Noun Compound and Named Entity Recognition and their Usability in Keyphrase Extraction. In: Proceedings of RANLP 2011. Hissar, Bulgaria, pp. 162-169.
- Nagy T., István; Vincze, Veronika; Berend, Gábor 2011: Domain-dependent identification of multiword expressions. In: Proceedings of RANLP 2011. Hissar, Bulgaria, pp. 622-627.
- Vincze, Veronika 2009: On the Machine Translatability of Semi-Compositional Constructions. In: Váradi, Tamás (ed.): Válogatás az I. Alkalmazott Nyelvészeti Doktorandusz Konferencia előadásaiból - Selected Papers from the 1st Applied Linguistics PhD Conference, Budapest, MTA Nyelvtudományi Intézet, pp. 166-178.
In information extraction and retrieval, it is highly important to distinguish uncertain and/or negated propositions from factual information. In most cases, the user needs factual information, so uncertain or negated propositions should be treated in a special way. Depending on the exact task, the system should either ignore such texts or separate them from factual information (the user can later decide whether they are needed).
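The separation step described above can be sketched with a simple cue-word filter. The cue lists and the function names `classify` and `separate` are assumptions made for this illustration; a real system would also need to determine the scope of each cue, which this toy version does not attempt.

```python
import re

# Hypothetical, deliberately small cue lists for illustration only.
UNCERTAINTY_CUES = {"may", "might", "possibly", "probably", "perhaps", "suggests"}
NEGATION_CUES = {"not", "no", "never", "without"}


def classify(sentence):
    """Label a sentence as 'uncertain', 'negated' or 'factual' by cue lookup."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    if words & UNCERTAINTY_CUES:
        return "uncertain"
    if words & NEGATION_CUES:
        return "negated"
    return "factual"


def separate(sentences):
    """Keep factual sentences apart from uncertain and negated ones."""
    buckets = {"factual": [], "uncertain": [], "negated": []}
    for sentence in sentences:
        buckets[classify(sentence)].append(sentence)
    return buckets


result = separate([
    "The patient has a fever.",
    "The patient has no fever.",
    "The drug may reduce symptoms.",
])
print(result)
```

Keeping the uncertain and negated sentences in their own buckets, rather than discarding them, matches the second option above: the user can still inspect them later.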
Related publications
- Vincze, Veronika 2014: Uncertainty Detection in Hungarian Texts. In: Proceedings of COLING 2014, Dublin, pp. 1844-1853.
- Vincze, Veronika; Simkó, Katalin Ilona; Varga, Viktor 2014: Annotating Uncertainty in Hungarian Webtext. In: Proceedings of LAW VIII, Dublin, pp. 64-69.
- Vincze, Veronika 2013: Weasels, Hedges and Peacocks: Discourse-level Uncertainty in Wikipedia Articles. In: Proceedings of IJCNLP 2013, pp. 383-391.
- Szarvas, György; Vincze, Veronika; Farkas, Richárd; Móra, György; Gurevych, Iryna 2012: Cross-Genre and Cross-Domain Detection of Semantic Uncertainty. Computational Linguistics - Special Issue on Modality and Negation, 38(2):335-367.
- Farkas, Richárd; Vincze, Veronika; Móra, György; Csirik, János; Szarvas, György 2010: The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010): Shared Task, Uppsala, Sweden, pp. 1-12.
- Vincze, Veronika 2010: Speculation and negation annotation in natural language texts: what the case of BioScope might (not) reveal. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), Uppsala, Sweden, pp. 28-31.
- Vincze, Veronika; Szarvas, György; Móra, György; Ohta, Tomoko; Farkas, Richárd 2011: Linguistic scope-based and biological event-based speculation and negation annotations in the BioScope and Genia Event corpora. Journal of Biomedical Semantics 2(Suppl 5):S8 doi:10.1186/2041-1480-2-S5-S8.
- Vincze, Veronika; Szarvas, György; Farkas, Richárd; Móra, György; Csirik, János 2008: The BioScope Corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics 9 (Suppl 11):S9 doi:10.1186/1471-2105-9-S11-S9.
For higher-level language technology research and development in Hungarian, it is essential to have a basic language resource kit for segmentation, morphological and syntactic parsing, and POS tagging of texts. In order to unify the available tools, we harmonized the MSD and KR coding systems and integrated the morphological parser based on this new coding system into our toolchain called magyarlanc, which has also been extended with a dependency parser.
Related publications
- Zsibrita, János; Vincze, Veronika; Farkas, Richárd 2013: magyarlanc: A Toolkit for Morphological and Dependency Parsing of Hungarian. In: Proceedings of RANLP 2013, pp. 763-771.
- Farkas, Richárd; Vincze, Veronika; Schmid, Helmut 2012: Dependency Parsing of Hungarian: Baseline Results and Challenges. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pp. 55-65.
- Farkas, Richárd; Szeredi, Dániel; Varga, Dániel; Vincze, Veronika 2010: MSD-KR harmonizáció a Szeged Treebank 2.5-ben. In: Tanács, Attila; Vincze, Veronika (eds.): VII. Magyar Számítógépes Nyelvészeti Konferencia. Szeged, Szegedi Tudományegyetem, pp. 349-353.
In order to develop algorithms for NLP problems, there is an immense need for domain- or task-specific annotated corpora (databases). Thus, building corpora is an essential part of creating NLP applications.
Some corpora whose construction I participated in:
Related publications
- Vincze, Veronika; Csirik, János 2010: Hungarian Corpus of Light Verb Constructions. In: Proceedings of COLING 2010, Beijing, China, pp. 1110-1118.
- Vincze, Veronika; Szauter, Dóra; Almási, Attila; Móra, György; Alexin, Zoltán; Csirik, János 2010: Hungarian Dependency Treebank. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta.
- Vincze, Veronika 2012: Light Verb Constructions in the SzegedParalellFX English-Hungarian Parallel Corpus. In: Proceedings of the Eighth Conference on International Language Resources and Evaluation (LREC 2012). Istanbul, Turkey, pp. 2381-2388.
- Vincze, Veronika; Szarvas, György; Almási, Attila; Szauter, Dóra; Ormándi, Róbert; Farkas, Richárd; Hatvani, Csaba; Csirik, János 2008: Hungarian Word-sense Disambiguated Corpus. In: Proceedings of 6th International Conference on Language Resources and Evaluation LREC 2008, Marrakech, Morocco.
- Vincze, Veronika; Nagy T., István; Berend, Gábor 2011: Multiword expressions and Named Entities in the Wiki50 Corpus. In: Proceedings of RANLP 2011. Hissar, Bulgaria, pp. 289-295.
- Vincze, Veronika; Zsibrita, János; Durst, Péter; Szabó, Martina Katalin 2014: Automatic Error Detection concerning the Definite and Indefinite Conjugation in the HunLearner Corpus. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), ELRA, Reykjavik, Iceland, pp. 3958-3962.
- Szabó, Martina Katalin; Vincze, Veronika; Nagy T., István 2012: HunOr: A Hungarian-Russian Parallel Corpus. In: Proceedings of the Eighth Conference on International Language Resources and Evaluation (LREC 2012). Istanbul, Turkey, pp. 2453-2458.
Ontologies are typically large hierarchical datasets in which words and their relations are stored. They can contribute to the performance of several NLP applications; for instance, hypernymy and hyponymy relations can be usefully exploited in information extraction and retrieval.
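One way hypernymy and hyponymy relations can help retrieval is query expansion: a search for a general term also matches documents that mention its more specific terms. The sketch below assumes a tiny hand-made hierarchy; the data and the function names `hyponyms` and `expand_query` are hypothetical and are not taken from any of the wordnets listed here.

```python
# Hypothetical toy hierarchy: each word maps to its direct hypernym (parent).
HYPERNYM_OF = {
    "poodle": "dog",
    "beagle": "dog",
    "dog": "animal",
    "cat": "animal",
}


def hyponyms(term, hypernym_of=HYPERNYM_OF):
    """Collect all direct and indirect hyponyms of a term."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in hypernym_of.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def expand_query(term, hypernym_of=HYPERNYM_OF):
    """Return the query term together with all of its hyponyms."""
    return {term} | hyponyms(term, hypernym_of)


print(sorted(expand_query("dog")))
```

Expanding downwards (towards hyponyms) is the cautious direction here: a document about poodles is relevant to a query about dogs, whereas the reverse does not necessarily hold.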
Some ontologies I took part in building:
- Hungarian WordNet
- Hungarian financial domain ontology
- Hungarian customs law wordnet (TaXWN)
Related publications
- Vincze, Veronika; Almási, Attila 2014: Non-Lexicalized Concepts in Wordnets: A Case Study of English and Hungarian. In: Proceedings of the 7th International Global WordNet Conference, pp. 118-126.
- Vincze, Veronika; Almási, Attila; Csirik, János 2012: Multiword Verbs in WordNets. In: Proceedings of the 6th International Global WordNet Conference, pp. 377-381.
- Alexin, Zoltán; Csirik, János; Almási, Attila; Vincze, Veronika 2010: Domain Specific Wordnet on Customs Law. In: Proceedings of the Fifth Global WordNet Conference, GWC2010, January 31-February 4 2010, Mumbai, India, pp. 234-239.
- Vincze, Veronika; Almási, Attila; Szauter, Dóra 2008: Comparing WordNet Relations to Lexical Functions. In: Tanács, Attila; Csendes, Dóra; Vincze, Veronika; Fellbaum, Christiane; Vossen, Piek (eds.): Proceedings of the Fourth Global WordNet Conference. GWC 2008. Szeged, University of Szeged, Department of Informatics, pp. 462-473.
- Vincze, Veronika; Szarvas, György; Csirik, János 2008: Why are wordnets important? In: Cepisca, Costin; Kouzaev, Guennadi A.; Mastorakis, Nikos M. (eds.): New Aspects on Computing Research. Proceedings of the 2nd European Computing Conference (ECC'08), WSEAS Press, pp. 316-322.