The resources identified belong to three categories: monolingual corpora, multilingual comparable corpora and multilingual parallel corpora. Before describing the corpora that currently exist in Spain, it is useful to look at who is developing them. Identifying the most important sources of corpora allows the reader to check periodically for new advances in the field.
It contains million words, both fiction (verse and prose) and non-fiction texts (scientific, social, press and advertising, religious, historical and legal documents). It is a vast bibliographic and documentary repository that can be freely accessed via the Internet. The Linguistic Tools section provides a collection of tools specifically designed for analysing and exploiting digital texts.
It has an advanced text search engine that allows words to be searched for within texts, and a concordance tool, which makes it possible to search for words in context. The result is a wide-ranging sample of the Spanish language from the Poema del Cid to selections from the contemporary press, from pieces of Latin American literature to a comment heard on the bus, selected by our greatest grammarian.
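As an illustration of what a concordance tool does, the following is a minimal keyword-in-context (KWIC) sketch. It is not the implementation used by any of the resources described here; the function name `kwic`, the `width` parameter and the sample sentence are assumptions chosen for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    lines = []
    # \b restricts matches to whole words; the search is case-insensitive.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text (opening of Don Quixote), used only for illustration.
sample = ("En un lugar de la Mancha, de cuyo nombre no quiero acordarme, "
          "no ha mucho tiempo que vivía un hidalgo.")

for line in kwic(sample, "no", width=20):
    print(line)
```

Each output line shows the search word in brackets with a fixed amount of text on either side, which is the basic display a translator reads when checking how a word is used in context.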
It can be accessed by grammatical categories, author, work, word or expression. While complex to begin with, it is extremely interesting once its mechanism has been mastered. The objective of these projects is to use corpora for extracting terminology and creating ontologies. The second publisher cited, VOX-Bibliograf, has compiled its own corpus for the development of dictionaries.
It currently participates in the EuroWordNet project. The EuroWordNet project aims to develop a multilingual database with basic semantic relationships between words for several European languages (Dutch, Italian and Spanish). We have tried to access their corpus, but without success. The publishing house Diccionarios SM has also compiled a corpus of literary, journalistic, scientific and technical texts of 60, words for the development of lexicographic work.

Corpora Compiled by Linguistic Engineering and Computational Linguistics Groups

Another important source of corpora is the departments of linguistic engineering and computational linguistics devoted to developing computer systems capable of recognising, understanding, interpreting and generating human language in all its forms.
Research in this field includes the development of linguistic resources (morphological grammars, formal and computational grammars, electronic lexicons with information in conventional formats such as EAGLES), Computer Assisted Translation and Automatic Translation programs, person-machine interfaces, and tools for analysing and using corpora. The vast majority of language processing systems, if not all of them, operate with monolingual or bilingual corpora of texts to which various linguistic processors are applied at the phonological, phonetic, textual, morphological, lexical, syntactical, logical, semantic and pragmatic levels.
In these contexts, the purpose of creating very extensive corpora is to provide an adequate basis for Example Based Machine Translation (EBMT) systems or for machine-generated text, both monolingual and multilingual (automatic production of multilingual technical documentation). Projects of this kind have generated numerous multilingual corpora and are expected to generate many more in the near future.
Researchers believe that this could mean moving from CAT to AT in fields where significant bilingual corpora exist. In many cases these are multilingual projects that combine a specific economic activity (for example, tourism) with the development of linguistic technologies (AT, text generation, identification of text types or of information on the Internet).
The Council has recently approved the e-Contentplus programme. EUROMAP ("Facilitating the path to market for language and speech technologies in Europe") aims to provide awareness, bridge-building and market-enabling services to accelerate technology transfer and market take-up of the results of European HLT RTD projects.
ELSNET ("The European Network of Excellence in Human Language Technologies") aims to bring together the key players in language and speech technology, both in industry and in academia, and to encourage interdisciplinary co-operation through a variety of events and services. It has established conventions for the encoding of corpora and harmonised specifications for computational lexicons, building on and contributing to the preliminary recommendations of the relevant international and European standardisation initiatives. Clic has also compiled a reference corpus of 1,, words with manually validated morphological and syntactical annotation, which is the result of merging two important lexical resources: , words from the corpus Lexesp and , words from the electronic corpus of the newspaper La Vanguardia.
Besides these corpora, Clic has produced bilingual lexicons connected to the lexico-semantic web EuroWordNet: bilingual lexicon English-Spanish more than Universidad de Deusto. Main Researcher: Joseba Abaitua. The research group is developing tools for the automatic drafting and translation of administrative documents based on this bilingual corpus.
The number of aligned works and language pairs available on this website grows regularly, since CLUVI is an ongoing academic research project.
The texts are tagged with part-of-speech and morphological annotation. They are currently compiling a multilingual corpus (Catalan, Castilian, English, French and German) of texts belonging to the areas of economy, law, environment, medicine and information technology. Since September , staff and students at ISK have been designing and implementing Internet-based grammar tools for education and research using corpora. The Europarl-es corpus contains 29 million words of parliamentary debates and can be searched without a password.
The growing interest in developing multilingual and bilingual corpora at the Faculties of Translation and Linguistics is reflected in the shift in orientation of research projects and doctoral theses. Many CTS (Corpora Translation Studies) projects are under way, many doctoral theses have incorporated corpora to perform various kinds of experiments, and many teachers are organising the wealth of texts of all types that they can gather with the help of their students.
The corpus contains million words of text: 20 million from the ss, 40 million from the ss, 40 million from the ss. As Davies points out on his website, in addition to being very fast, the search engine allows a wider range of searches than almost any other large corpus in existence. The database can be queried very quickly, usually in just a few seconds even for the most complicated queries.
We recommend that readers take a look at this powerful resource.

The interdisciplinary research group Oncoterm (terminologists, translators and physicians from the University of Granada) has developed an information system for the medical subdomain of oncology. This information system is intended for health practitioners, relatives and friends of patients with cancer, translators and journalists. A vast corpus of documents on cancer in English and Spanish has been compiled together with a terminological database.
A research group based at the University of Malaga, whose main researcher is Gloria Corpas, aims to build a multilingual corpus of tourism contracts (German, Spanish, English, Italian) for automatic text generation and legal translation. According to the information given on their website, a multilingual corpus (both parallel and comparable) will be compiled from tourism and law websites on the Internet.
A protocol will be laid out for searching the WWW, and for retrieving, encoding and storing hypertexts.

Moreover, the information it offers for each legal genre helps the translator respect the conventions of genre, which are so important in this field of knowledge (for example, when translating a will, the translator should be familiar with the formal aspects of this genre in the target language). In fact, the conservatism of law is reflected in the repetitive and fossilised character of its textual structures, its phraseology and its specialised lexicon, in such a way that the legal text constitutes a paradigm of stereotyped text susceptible to generic description.
It would therefore seem advisable to have systems of classifying documents for each area of specialisation and, particularly in the case of legal translation, it would be useful to have a taxonomy of texts in the source and the target language that would enable translators to compare terminology, usage and the practical application of the law. The translator should always try to fit the text he or she is going to translate into a conventional textual category that speakers of a particular language will recognise. Legal texts constitute instruments that have a certain form and function in each culture and sometimes there is no equivalence between languages due to the lack of uniformity between different legal systems.
Is it a will or a fragment of a textbook on law? Is it a section of an Act, a writ of summons or a judgment?

One of the most direct applications of the corpus developed is its potential for teaching purposes. In fact, from the early days of the project, it has been used in legal translation classes at Universitat Jaume I, where the researchers of the group are engaged as teachers. The teaching experience has shown us that our description of the legal genres is a didactic tool of undoubted value.
The corpus is complemented by a system developed by Steve Jennings for managing relational databases, which enables multiple searches to be made in an extremely functional and efficient way. The user interface is fast and easy to use, and allows the documents to be retrieved in text format. The documents can be downloaded in full text for research purposes after applying for a password through the same web address.
Moreover, this classification is complemented by a system of crossed searches that combines, amongst other data, the original language, the status of the text (original or translation), the date of creation and the source.

Conclusions

The empiricist trend is rapidly gaining ground in Translation Studies as new methods and research resources become available.
Corpora, which until now have mainly been used for analysing aspects of cross-cultural linguistics, are unquestionably useful for Translation Studies. Translation researchers can obtain very useful information from raw, untagged monolingual corpora, but multilingual corpora have proved to be an invaluable tool: they contain text files in several languages that are segmented, aligned, parsed and classified, allowing aligned multilingual texts to be stored and retrieved against various search conditions.
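To make the idea of retrieving aligned texts against search conditions more concrete, here is a minimal sketch of a search over an aligned bilingual corpus. The `AlignedSegment` structure, the `search_aligned` function, the `genre` field and the sample sentence pairs are all hypothetical and are not taken from any of the corpora described in this chapter.

```python
from dataclasses import dataclass

@dataclass
class AlignedSegment:
    source: str   # segment in the source language
    target: str   # aligned segment in the target language
    genre: str    # hypothetical classification label

def search_aligned(segments, term, genre=None):
    """Return aligned segments whose source text contains `term`,
    optionally restricted to one genre."""
    term = term.lower()
    return [
        seg for seg in segments
        if term in seg.source.lower() and (genre is None or seg.genre == genre)
    ]

# Invented English-Spanish sample pairs, for illustration only.
corpus = [
    AlignedSegment("The testator appoints an executor.",
                   "El testador nombra un albacea.", "will"),
    AlignedSegment("The tenant shall pay the rent monthly.",
                   "El arrendatario pagará la renta mensualmente.", "contract"),
    AlignedSegment("The executor shall distribute the estate.",
                   "El albacea distribuirá la herencia.", "will"),
]

for seg in search_aligned(corpus, "executor", genre="will"):
    print(seg.source, "|", seg.target)
```

Because each hit carries its aligned target segment, the translator sees a candidate equivalent in context rather than an isolated dictionary entry.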
On the other hand, the process of professionalisation and specialisation that translation has undergone since the midth century has resulted in something akin to an identity crisis in the translator, which has become more acute in recent decades with the rapid development of information technologies. The new technologies have facilitated the appearance of expert systems for organising knowledge that render obsolete the more traditional modus operandi of the scientific and professional community. The high degree of specialisation required by certain types of translation today (such as legal or medical translation) makes it necessary to find new systems for obtaining and retrieving knowledge and data, on a scale that even the best and most encyclopaedic human mind cannot match.
Today a mixed system of knowledge management is needed that enables translators to integrate their skills with electronic information management and retrieval systems. Corpus resources are the basis of any such system and will very soon replace traditional dictionaries and encyclopaedias. Nevertheless, since there is no single translator profile and no single approach in Translation Studies, it is impossible to speak of one ideal corpus design for translators. There are only specific designs for specific translation or research purposes.
Translators can exploit and apply corpora in many ways: performing terminological or statistical searches, studying collocations, looking for functional equivalences, observing the texture of genres, investigating phraseology, and so on. These interests are very varied, which shows that a number of different working possibilities are opening up for translation researchers and practitioners. The corpus resources available in Spanish identified in this contribution are only a small sample of what we can expect to find in the near future.
Unfortunately for translators, most of these resources are available only through web-based search tools, which permit terminological and collocation searches but do not generally allow full texts to be read, owing to copyright restrictions. Corpora based on genre, by contrast, provide the end user with full texts showing genre conventions and structure.
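One of the uses mentioned above, looking at collocations, can be sketched in a few lines. The following is a simple window-based co-occurrence count, not a statistical association measure and not the method of any particular tool; the function name `collocates`, the `window` parameter and the sample text are assumptions made for the example.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count the words occurring within `window` tokens of each occurrence of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            # Tokens before and after the node word, excluding the node itself.
            context = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(context)
    return counts

# Invented sample text, for illustration only.
sample_text = ("The court issued a writ of summons. The writ of summons was served. "
               "A writ of execution followed.")

for word, freq in collocates(sample_text, "writ").most_common(3):
    print(word, freq)
```

On a real corpus the raw counts would normally be filtered for function words or weighted by an association score, but even this simple count surfaces recurring phraseology around a legal term.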
References

Abaitua, J. Llisterri (eds) Tratamiento del lenguaje natural, pp.
Corpus design criteria. Literary and Linguistic Computing Baker, M. International Journal of Corpus Linguistics Ball, C. On-line Tutorial. Cambridge: Cambridge University Press. Borja Albi, A. Barcelona: Ariel. Granada: Atrio, Granada. Bern, Peter Lang. In Enrique Alcaraz ed.
Barcelona, Ariel. Botley, S, McEnery, A. Amsterdam: Rodopi. Bowker, L. Universidad Ricardo Palma, Lima, Peru. Hanks, P. Contextual Dependency and Lexical Sets.