Languages change and evolve in many respects, but one of the aspects most obvious to casual observation is how languages change their word-stock (Trask). We do not need to wait for several generations to go by to realize that new words are being incorporated into a language, whether they come from foreign languages (loanwords) or are newly coined in the language.

Loanwords have crossed the boundaries of linguistics into the realm of public opinion, politics, and policy. This process resulted in a total of almost billion tokens. A token is an individual unit in the natural language analysis of texts. Words are tokens, as are acronyms, proper names, or temporal expressions. This project intends to address four main questions about loanwords in Spanish.

The structure of this article is as follows. In section 2, we briefly discuss the history of Spanish as it was formed and came into contact with other languages. In section 3, we provide a background of what loanwords are and of their importance in different fields of study. In section 4, we describe our study in detail. In section 5, we provide a discussion of our results.

Spanish is a member of the Italic branch of the Indo-European family of languages and is currently spoken by over million speakers as a mother tongue and over 90 million as a second language (L2) in several countries in Europe, America, and Africa (Lewis, Gary and Fennig). With the expansion of the Roman Republic and Empire, between BC and AC, Vulgar Latin evolved into a continuum of overlapping varieties, some of which were mutually unintelligible, thus giving rise to many European languages, one of which was Spanish (Penny). Latinization of the Iberian Peninsula began in BC and lasted roughly two centuries.

The geographical distribution of the several languages that existed in the Iberian Peninsula in pre-Roman times was complex, but many of the inhabitants of the Peninsula are believed to have been competent bilinguals in Latin and in their own pre-existing languages (Adams; Penny). The pre-Roman languages are believed to have left a substrate influence, albeit minimal, on the subsequent Latin used in the Peninsula.

Whereas the later Visigoth conquest of Spain AD had little effect on the Latin spoken in the Peninsula, the Islamic conquest of the Peninsula in AD, whose outcome was a continuous presence of cultural and linguistic influence that lasted for about 5 centuries, had an enormous impact. In this study, we count the pre-Roman, Latin, Arabic, and Visigoth influences as the ones that shaped the development of Spanish before the establishment of the Crown of Castile, a political process carried out through the union of the previously independent kingdoms of Castile and Leon.

We will count all lexical items that derived directly or indirectly from these languages as baseline Spanish or core Spanish. During its birth and growth, Spanish was not the only language spoken in the Iberian Peninsula. Other Latin-based dialects (Astur-Leonese, Catalan, Galician-Portuguese, and Navarro-Aragonese) were also becoming independent, full-fledged languages during the 10th and 11th centuries (Lapesa). These languages, together with Basque, a language isolate spoken in the northern part of the Peninsula, occupied and sometimes shared the geographical space of the Peninsula.

In the 15th and 16th centuries, Spanish expanded to many sites overseas, such as the Canary Islands, the Americas, and the Philippines, as a result of the work of settlers, soldiers, and missionaries. In linguistic terms, the colonial expeditions of the Spanish to the Americas made Spanish more widely spoken and gave rise to more contact scenarios, in this case with various Amerindian languages.

Direct and prolonged contact with all of these languages produced several additions to the Spanish vocabulary (Dworkin; Penny). We have briefly mentioned that Spanish has been very prolific in borrowing all sorts of linguistic components from the languages it has had contact with, and in this respect it is by no means an exception.

Most world languages have borrowed from the languages they have maintained contact with, even when bilingualism among the speakers of the donor and recipient languages has been infrequent (Durkin; Kaufman and Thomason, 47; Sayahi). This study focuses on the borrowing of lexicon and its associated meaning, which results in loanwords (Durkin). There is a plethora of reasons why languages may borrow lexicon from other languages. Borrowing is also a patterned process. Nouns are borrowed more than any other part of speech, such as adjectives or adverbs (Matras). It is thought that both referential transparency and morphosyntactic freedom are factors that ease the borrowing of nouns (Matras). As described above, languages borrow all sorts of linguistic components from other languages.


A possible way in which languages borrow is proposed by Backus (in Zenner and Christiansen) and Croft. Loanwords and all types of innovations very often die soon after they are born and never become part of language A.

The more this innovation or foreign incorporation is used and encountered, the more entrenched it becomes. If an innovation becomes entrenched enough, it may end up being conventionalized as part of language A. If the loanword from language B conveys a meaning that did not exist in language A, it is typically accepted without much resistance.

It is possible, however, that the loanword from language B overlaps in a position that was already occupied in language A.

When this is the case, both words are in competition and more than one outcome is possible. In the first place, one of the two may become obsolete, as is the case of the Old English word firen, which was replaced by the French word crime (Ringe and Taylor). It is also possible that both words stay in language A with highly similar meanings, such as English kind and the French loanword type. The study of loanwords provides valuable data for studying language change (Backus, in Zenner and Christiansen). Layers of loanwords in a language, such as the bulk of loanwords from Old Norse and French in English, tell us about the past contacts between speakers of the donor and recipient languages, and the kinds of words that were borrowed inform us about the nature of the contact (Bynon). Our present study focuses specifically on written text.

We focus on written language for several reasons. First of all, written language has been shown to display a higher ratio of lexical items to total running words than spoken language, a measure known as lexical density (Halliday). A high lexical density is desirable in that it gives us access to a greater number of lexical items than spoken language would.
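The ratio just described can be made concrete with a small sketch. It assumes that tokens have already been POS tagged and that nouns, verbs, adjectives, and adverbs count as lexical items; the tag names and the sample sentence are illustrative assumptions, not taken from the study.

```python
# Open-class POS tags treated as "lexical items" (an assumption for illustration).
LEXICAL_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def lexical_density(tagged_tokens):
    """Return the ratio of lexical items to total running words."""
    total = len(tagged_tokens)
    if total == 0:
        return 0.0
    lexical = sum(1 for _, pos in tagged_tokens if pos in LEXICAL_POS)
    return lexical / total


# Hypothetical tagged sentence: one determiner and four lexical items.
sample = [
    ("el", "DET"),
    ("perro", "NOUN"),
    ("come", "VERB"),
    ("muy", "ADV"),
    ("rápido", "ADV"),
]
density = lexical_density(sample)  # 4 lexical items out of 5 running words
```

A text with more function words per content word would score lower on this measure.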

Secondly, written language does not display as much inherent variability of forms as spoken language does (Poplack and Dion), and it leaves out some aspects that are difficult to analyze.

In addition, studies that analyze speech samples can only look back over some 80 years of history, whereas the study of written samples allows for a much longer retrospective outlook. Equally importantly, writers tend to use the standard language unless they want to create a particular effect.

Therefore, it is reasonable to assume that when words appear in a book, they have been present in speech for a relatively long time. The last reason, but no less important, is practicality. Written texts are already transcribed and, in this case, digitized, which makes them easier to process computationally. There are some corpora for spoken Spanish. CREA, in fact, has both an oral and a written Spanish corpus, the latter being much larger.

But these corpora have something in common, in addition to being very well made and well selected. Because of their size, manual efforts such as transcription have made sense. However, this study seeks to face a new challenge. Unlike other sociolinguistic and language contact studies, our investigation does not necessarily focus on the language production of a bilingual territory. In most cases, authors were monolingual and lived in stable monolingual zones.

Therefore, the fact that we see words borrowed from other languages is not necessarily the result of personal or societal bilingualism but of established borrowing processes. In order to collect a dataset as exhaustive and extensive as possible that would allow us to address our aforementioned questions, we resorted to two corpora made publicly available, the HathiTrust Digital Library (henceforth: HT) with its corpus of texts (Capitanu et al.).

Each text in HT is represented as a compressed JSON file that includes metadata about the volume in which it was originally published, such as the year of publication or the author (if known), and a list of pages. Each of these pages contains a frequency map of the individual tokens found by an automated part-of-speech (POS) tagging of the text on the page. Order and structure are lost, and the original text cannot be reconstructed.
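As a rough illustration of how such a file can be reduced to word counts, the sketch below aggregates per-page frequency maps into one counter per volume. The key names (`metadata`, `pages`, `tokens`), the bz2 compression, and the sample volume are simplifying assumptions made for this example; they are not the actual HT schema.

```python
import bz2
import json
from collections import Counter


def count_tokens(raw_bytes):
    """Aggregate token counts over all pages of one compressed volume."""
    volume = json.loads(bz2.decompress(raw_bytes).decode("utf-8"))
    year = volume["metadata"].get("year")
    counts = Counter()
    for page in volume["pages"]:
        # Each page maps token -> {POS tag -> count}; word order is not stored.
        for token, pos_counts in page["tokens"].items():
            counts[token.lower()] += sum(pos_counts.values())
    return year, counts


# Synthetic two-page volume (hypothetical data, built in memory for the example).
example = {
    "metadata": {"year": 1885, "author": None},
    "pages": [
        {"tokens": {"casa": {"NOUN": 2}, "club": {"NOUN": 1}}},
        {"tokens": {"Casa": {"NOUN": 1}, "come": {"VERB": 3}}},
    ],
}
raw = bz2.compress(json.dumps(example).encode("utf-8"))
year, counts = count_tokens(raw)
```

Because only counts per page are stored, the per-volume totals are obtained by simple summation, with no need to reconstruct running text.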

Although this might represent a problem for other studies, it is not problematic for ours because word counts were the only requirement to address our questions. Despite containing several times more tokens (,, 1-grams, and 46, unique lemmas), the NGram dataset was smaller. Processing NGram was easier than processing the HT format, since the NGram dataset only contains isolated information about words (some POS annotated), counts, and years and volumes of appearance.
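A minimal sketch of reading such records follows. It assumes tab-separated rows of the form word, year, match count, volume count, with an optional `_POS` suffix on the word; this layout and the sample rows are assumptions for illustration and may not match the exact files used in the study.

```python
from collections import defaultdict


def parse_ngram_lines(lines):
    """Sum match counts per (word, POS, year) from tab-separated 1-gram rows."""
    totals = defaultdict(int)
    for line in lines:
        ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        # Split off an upper-case POS suffix such as "_NOUN" when present.
        word, sep, pos = ngram.rpartition("_")
        if not sep or not pos.isupper():
            word, pos = ngram, None
        totals[(word.lower(), pos, int(year))] += int(match_count)
    return dict(totals)


# Hypothetical rows: word, year, match count, volume count.
rows = [
    "casa_NOUN\t1900\t12\t5",
    "Casa_NOUN\t1900\t3\t2",
    "club\t1950\t7\t4",
]
totals = parse_ngram_lines(rows)
```

Case variants are merged here by lowercasing, which is one possible normalization choice among several.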

An additional problem that the HT database posed was that it only allowed us to use texts up to the copyright start date, that is, volumes that were published in the US before or outside the US before. Hence, as we will see in the results sections, our only source for words in the last century is the NGram dataset.

Once the first word counts were calculated, we lemmatized them and ignored proper nouns and other constructions such as initials, numbers, or words with numbers.
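The filtering and lemmatization step could look roughly like the sketch below. The lemma table, the `PROPN` tag for proper nouns, and the regular expressions are hypothetical stand-ins; the study's actual lemmatizer and rules are not specified here.

```python
import re
from collections import Counter

# Hypothetical lemma lookup; a real pipeline would use a full lemmatizer.
LEMMAS = {"casas": "casa", "clubes": "club", "comieron": "comer"}

INITIALS = re.compile(r"^(?:[^\W\d_]\.)+$")  # e.g. "J." or "J.M."
HAS_DIGIT = re.compile(r"\d")


def keep(token, pos):
    """Reject proper nouns, initials, numbers, and words containing digits."""
    if pos == "PROPN":
        return False
    if HAS_DIGIT.search(token):
        return False
    if INITIALS.match(token):
        return False
    return True


def lemma_counts(tagged_counts):
    """Collapse (token, POS) counts into counts per lemma."""
    out = Counter()
    for (token, pos), n in tagged_counts.items():
        if keep(token, pos):
            word = token.lower()
            out[LEMMAS.get(word, word)] += n
    return out


# Hypothetical input counts keyed by (token, POS).
data = {
    ("casas", "NOUN"): 4,
    ("casa", "NOUN"): 2,
    ("Madrid", "PROPN"): 9,
    ("J.", "NOUN"): 1,
    ("2nd", "ADJ"): 3,
    ("1885", "NUM"): 5,
}
result = lemma_counts(data)
```

Only the two forms of "casa" survive the filters in this toy input, and their counts are merged under one lemma.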

It is worth noting that the authors of the NGram dataset reported only 83,, words in Spanish.

Identifying which lemmas were loanwords was possible thanks to the etymological information contained in the electronic edition of the DECH. We built a grammar to recognise the tree structure of the etymology statements. However, given the complexity of the language used to express these etymologies, in some cases we had to automatically traverse the resulting tree after parsing the grammar to fix some mistakes.

We classified the language tags that the DECH uses to define etymological origin into 10 categories in order to capture, in each group, a set of languages that share a geographic and temporal relation with regard to the Spanish language.

The groups, with some of their more representative languages in terms of their contribution to the Spanish lexicon, are shown below. Note that we are not including all the source languages that the DECH specifies, but only those that served as donor languages directly to Spanish.

While, as mentioned earlier, Latin and Arabic are part of what we have considered the baseline in this study, Spanish has borrowed words from both of these languages more recently. Examples such as these two, by all means, should be counted as loanwords, and not as core Spanish.

To be conservative, we counted these items as belonging to the baseline. Therefore, we acknowledge that the number of loanwords from Latin is higher in reality but, unfortunately, this is a limitation of this otherwise highly specific and thorough dictionary.

English was treated as its own group because of its unique relationship to Spanish. This language has influenced Spanish at different stages from different parts of the world. If English had been grouped together with other languages, the difference in borrowing from British English and American English into Spanish might have been masked. Out of the 65, lemmas and definitions included in the DECH dictionary, 33, had etymology information that we were able to extract.

And although 19, were found to be loanwords, only 6, appeared in our corpora and came from languages other than Latin and Arabic, which again, were counted as baseline Spanish.
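The counting logic behind these figures can be sketched as follows. The language tags, group names, and word lists are hypothetical placeholders chosen for illustration; they are not the DECH's own abbreviations, the study's 10 categories, or its data.

```python
from collections import Counter

# Hypothetical mapping from etymology tags to donor groups.
# Latin and Arabic are assigned to the baseline, as described in the text.
GROUP_OF = {
    "LAT": "baseline",
    "ARA": "baseline",
    "GRC": "Greek",
    "FRA": "French",
    "ENG": "English",
}


def loanwords_by_group(etymologies, corpus_lemmas):
    """Count lemmas attested in the corpus per donor group, skipping the baseline."""
    tally = Counter()
    for lemma, tag in etymologies.items():
        group = GROUP_OF.get(tag)
        if group is None or group == "baseline":
            continue  # unknown origin or core Spanish
        if lemma in corpus_lemmas:
            tally[group] += 1
    return tally


# Toy dictionary entries and corpus vocabulary (illustrative only).
etymologies = {
    "mesa": "LAT",
    "aceite": "ARA",
    "teléfono": "GRC",
    "jardín": "FRA",
    "club": "ENG",
    "fútbol": "ENG",
    "xyz": "UNK",
}
corpus_lemmas = {"mesa", "teléfono", "jardín", "club"}
tally = loanwords_by_group(etymologies, corpus_lemmas)
```

In this toy input, "fútbol" is a loanword in the dictionary but is not counted because it does not appear in the corpus vocabulary, which mirrors the gap between loanwords found in the dictionary and loanwords attested in the corpora.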

When looking at the donor languages for these lemmas, shown in Figure 1, we found that Greek and French were the most prolific languages in donating lexical items to Spanish. The majority of the most prolific donor languages in Figure 1 are not surprising. After all, Spanish in all Spanish-speaking nations shares a geographic relation or a historical or political link with most of these languages.

One of the notable exceptions is Greek, which appears as the top donor to Spanish. The great number of Greek loanwords may be unexpected at first.

Spain and Greece have not established direct relations through colonial expansion, wars, or trading in the past centuries other than the one described in Footnote 2. Upon careful observation of the Greek loanwords, we realize that the borrowing process for this language is unlike the one for the other languages.

Besides the great number of words that originated in Greek and made their way into Spanish via the Vulgar Latin and the Arabic spoken in the Iberian Peninsula before the 20th century (which, again, have not been counted as Greek loanwords in this study), we find a large number of Ancient Greek loanwords in Spanish that were borrowed, unmediated, after the 18th century (Fernandez Galiano). In the 19th and 20th centuries, Ancient Greek, as the first internationally prestigious language in history (Bergua Cavero), was often resorted to because of the need to keep up with the rapidly evolving fields of science and technology.