This post is part of a draft on South Siberian language homelands and Sprachbünde.
The following text contains a description of the Pre- and Proto-Samoyedic stages and their dialectal diversification. Contacts with Indo-Iranian, Yeniseian, Tocharian, Yukaghir, and Turkic, as well as onomastics and palaeolinguistics, are taken into account to pinpoint the successive homelands and expansion territories. The archaeological-archaeogenetic discussion is focused on the Middle Bronze Age Cherkaskul materials of the Andronovo period, on the Late Bronze Age Karasuk culture, and on the evolution and expansion of the Iron Age Tagar culture within the framework of “Scytho-Siberian” groups.
- Pre- & Proto-Samoyed
- Areal linguistics
- External contacts
- Proto-Indo-Iranian & Iranian
- Yeniseian & Tocharian
- Yukaghir
- Palaeo-Siberian & Palaeo-Arctic
- Turkic, Mongolic & Tungusic
- Hydrotoponymy
- Palaeolinguistics
- Archaeology & Population Genomics
PLEASE NOTE. Many of the Y-SNP calls from ancient samples referred to below have been analyzed by the FamilyTreeDNA Haplotree team formed by phylogeneticist Michael Sager and Göran Runfeldt from the R&D team. Those ancient samples with validated haplogroup inferences are marked by a hyperlink to the FTDNA Haplotree. Occasionally, though, such hyperlinks are also used in the text when discussing Y-SNP branches in general, without referring to specific ancient samples. For a quick reference of ancient samples, you can check out the Ancient DNA Dataset, also visually in an Online Web Map, in SNP Tracker, or in AncientDNA.info. TMRCA and formation dates have been checked from YFull.
1. Pre- & Proto-Samoyed
Proto-Samoyed shows a limited number of reconstructible lexemes, probably around 1000 words based on the works by Janhunen (1977) & Aikio (2002, 2004), i.e. substantially fewer than for all other known intermediate Uralic dialects. This is related to the lack of sufficient sources of dialectal Samoyedic lexicon, but possibly also to the high age of this proto-language within the family (Saarikivi 2020: 48-49).
Samoyed languages show more variation than Finnic, Samic, or Permic languages, believed to have diverged in the Iron Age or later, which sets a relative terminus ante quem for the diversification of Proto-Samoyed. On the other hand, given the certainty in the reconstruction of a complex system of distinguishable Proto-Samoyed inflectional suffixes – unlike those for Ugric, assumed to have separated ca. 2000 BC, if not earlier – it is likely that Proto-Samoyed reflects a language that diverged early in the first millennium BC (J. Pystynen, p.c. in Piispanen 2018:359).
1.1. Areal linguistics
While Nganasan shows more archaic features, most areal variation is found within southern groups, i.e. Selkup and Sayan Samoyed. The extinct but well-described Mator language (Helimski 1997) in particular is notably different from other attested Samoyed languages, although it is unclear to what extent this is due to its strong Yeniseic substrate (see below). All this suggests that the older variation has disappeared due to language shift of ancient Samoyedic speakers to Turkic and Russian (Saarikivi 2020: 51).
NOTE. Juho Pystynen recently announced that Tamás Janurik has been uploading papers to his Academia.edu account, among them two multilingual ‘documental-comparative’ dictionaries of Kamassian (Kamassz szótár) and Koibal (Kojbál szótár). Further research on extinct Samoyed languages might bring about changes to the PU reconstruction.
The Nenets have an oral tradition concerning Sikhirtja, the population preceding them in their present area, which has been considered an account potentially referring to pre-Uralic groups inhabiting the Arctic coast (Stipa 1990: 66-67).
Reindeer herding likely caused the spread of Tundra Nenets and replacement and assimilation of earlier smaller groups, similar to how Evenki, Yakut, and Northern Saami spread recently to large sparsely populated regions, evidenced also by the relatively recent extensive borrowing of related terminology (Piispanen 2016).
The eastern part of the Nenets area is known to be of later (medieval) spread (Saarikivi 2020: 51).
1.2. External Contacts
1.2.1. Proto-Indo-Iranian & Iranian
In contrast to all other Uralic dialects, few inherited Indo-Iranian items are found in the Samoyed branch – or at least few survived the subsequent Siberian substratal and superstratal influences. In fact, they seem to fall on the earlier stage of PU ~ PIIr. contacts, bearing witness to a physical separation of the Samoyed branch at roughly the same time as the adoption of these words (J. Häkkinen 2009: 20-25). Compare (Holopainen 2019, passim):
- Pre-PSmy. *waksa-/*wakša- ← PIE/Pre-PIIr. *wok(ʲ)s-éje- ‘grow’;
- Pre-PSmy. *täši ‘tent roof made of birch bark’ ← PIIr. *ta(ć)šya- ‘to be formed (out of wood etc.)’, root *taćš-, cf. OInd. takṣ ‘carpenter, hew, hammer, harden’;
- PU (Fi., Smy.) *śaδa- ‘to rain’ ← Pre-PIIr. *kʲat- or PIIr. *ćad- ‘fall’;
- PU (Saa., Fi., Smy.) *tora- ‘fight’ ← PIE/Pre-PIIr. *dʰor- / PIIr. *dʰār-.
Later Iranian borrowings found exclusively in Samoyed reflect different contact layers (Holopainen 2019, passim), attesting to its isolation from the developments of other Uralic branches, but still close to the Central Asian steppes:
- PSmy. *wǝ̑rkǝ̑ (←? Pre-PSmy. *wurka) ‘bear’ ← P(I)Ir. *wr̥ka- ‘wolf’ (cf. OInd. vŕ̥ka-).
- PSmy. *jäə̑ ‘flour’ (←? Pre-PSmy. *jäwi) ← (P)Ir. *jawa- or Alanic *yæw- ‘grain’.
- PSmy. *täjkå ‘knife; sword; spear; hook;’ ← unattested Old Iranian during the first millennium BC, cf. PIr. *tajga- (cf. Av. taēγa-) ‘sharp; sharpness’.
- PSmy. *pulǝ̑/*pilǝ̑ ‘bridge’ ← MIr. (cf. MPers. puhl ‘bridge’), probably a younger borrowing than the other three.
NOTE. PSmy. *jäə̑ is impossible to derive directly from PFU *jewä; it probably reflects an old PSmy. *ä-i-stem (Aikio 2002: 49) despite the Iranian *a-stem (Holopainen 2019: 105). The semantic shift from ‘grain’ to ‘flour’ also suggests that Proto-Samoyeds did not practice agriculture, whereas the adoption of the word supports a continuous cultural contact with Indo-Iranian-speaking steppe agropastoralists.
1.2.2. Yeniseian & Tocharian
Four vowels from the eleven reconstructed for Proto-Samoyedic seem to have arisen secondarily, which leaves Pre-Proto-Samoyed with a seven-vowel system identical to (or eight-vowel system closely following) the Yeniseian and Pre-Proto-Tocharian ones. In fact, Tocharian shows an ancient strong typological convergence to South Siberian languages, which can be best explained by assuming an adaptation of this early splitting Late PIE branch to a Yeniseian-speaking population, or to a strongly Yeniseian-influenced Pre-Proto-Samoyedic-speaking one (see Peyrot 2019; cf. also Ivanov 1985, Kallio 2001, 2002, Bednarczuk 2015).
Main changes from PIE to PToch. include a merger of the three stop series (similar to Uralic); development of a vowel system similar to Uralic, Yeniseian, or Yukaghir; and agglutinative case marking (general South Siberian feature), with dative and allative differentiated (as in Yukaghir and Yeniseian), and genitive for the indirect object of ‘give’ (as in Uralic). Other good matches for substrate influence include object marking of the verb (as in Uralic or Turkic), and the use of converbs, a feature widespread in the area. One reliable loanword with self-evident cultural relevance is PSmy. *wäsa ‘iron; metal; money’ → PToch. *wesa ‘gold’.
The scarcity of lexical influence, despite clear language interference in phonetics, phonology, and syntax, also points to substrate influence or interference induced by language shift (cf. Thomason & Kaufman 1988: 129–146), which seems to be the case first for Pre-Proto-Yeniseic → Pre-Proto-Samoyedic, and then for Pre-Proto-Samoyedic → Pre-Proto-Tocharian. On the other hand, there are traces of later lexical influence of Yeniseic on Samoyed dialects; cf. Proto-Khanty *kānəŋ ‘bank (of a river); edge (of a forest, shawl etc.)’ ← Proto-Selkup *k͔anək ‘bank (of a river)’ (Alatalo 2004: 289) ← Yeniseian, cf. Pumpokol kónnon ‘mountain’, related to either Kott hanaŋ ‘shore’ or Ket qaŋńeŋ ‘mountain (wooded)’ (Zhivlov 2017); or the striking borrowing in Enets of the 2nd and 3rd person singular from Ket (Georg 2008).
Pre-Classic Old Chinese (ca. 10th-6th c. BC) is the likely earliest layer of (few) Chinese loanwords in Tocharian, with increasing influence shown by pre-Han Chinese loanwords (before 200 BC), and the rest corresponding to the spread of Han and Tang Dynasties into Central Asia (cf. Blažek & Schwarz 2017: 21-74). The Tocharian borrowings in Chinese are considerably older, suggesting a meaningful cultural shift in Tocharian-Chinese relationships close to the Altai-Tian Shan area.
Despite a few likely early (Pre-?)Proto-Indo-Iranian loans, most of the attested Iranian influence on Tocharian seems to be much later, starting with Old Bactrian (see e.g. Gerd Carling’s 2019 post on Iranian & Tocharian), which – together with South Siberian and Chinese influence – locates Pre-Proto-Tocharian as an Indo-European branch long isolated from Indo-Iranian developments, and likely spoken somewhere between the Altai-Sayan region and the Hexi Corridor, probably closest to the Eastern Tian Shan after the demise of the Chemurchek culture.
Tocharian loans in Samoyedic seem to be scarce; cf. PSmy. *sejt³wə ‘seven’ ← PToch. *s’əptə ‘id.’, PSmy. *we̮n ‘dog’ ← PToch. obl. *kwenə ‘id.’, PSmy. *menüjə̑ ‘full moon’ ← PToch. *ḿeńe ‘moon’. There are also Samoyedic-Yeniseian lexical parallels, such as “nursery words” and words referring to the spiritual or divine world (Kallio 2004, Peyrot 2019). However, none of these is uncontested. This lack of strong lexical influence on proto-languages compared to the substratal influence on pre-proto-languages contradicts the expected mutual substratal-morphophonosyntactical ↔ adstratal/superstratal-lexical relationships, which suggests that the core areas of Proto-Yeniseian (closer to the Ob’-Middle Yenisei-Angara region) and Proto-Tocharian (between the Altai and the Tian Shan) were not in immediate contact with the core Proto-Samoyed area.
NOTE. The earlier split of Tocharian with Afanasievo from the Late PIE core area of Yamnaya compared to its late, Cherkaskul-related Pre-Proto-Samoyedic substratal influence contrasts thus with the Late Proto-Uralic substratal influence on the chronologically and geographically closer Eastern European Pre-Proto-Indo-Iranian and Pre-Proto-Balto-Slavic. Both strong and mutual Uralic ↔ Indo-European language interferences are different, but similar enough to reinforce each other (read more on the Proto-Uralic Homeland). The Uralic influence on Germanic was – like that of Tocharian – geographically independent, chronologically later, and influenced by local (Scandinavian) languages. Unlike Pre-PSmy. ↔ Pre-PToch. contacts, the long-lasting mutual Palaeo-Germanic ↔ Pre-Balto-Finnic interferences suggest that their core proto-language areas remained in close contact.
1.2.3. Yukaghir
Yukaghir shows some notable lexical similarities with Proto-Uralic (cf. Nikolaeva 2006). Even though many cognates proposed earlier are erroneous, there still are ca. 30 reliable lexical parallels; cf. PU *käliw ‘brother- or sister-in-law’ ~ PYuk. *keľ- ‘brother-in-law’, PU *nimi ~ PYuk. *ńim / *nim ‘name’, PU *wanča(w) ~ PYuk. *wonč- ‘root’, PU *wixi- ‘take, transport’ ~ PYuk. *weɣ- ‘lead, carry’ (Aikio 2020: 52).
Lacking any hard evidence of regular sound correspondences or shared morphology with Proto-Uralic – at least none beyond those attributable to Indo-Uralic, or further to Eurasian languages – most of these parallels are today considered obvious borrowings from (Pre-)PSmy. to Yukaghir (Rédei 1999, Häkkinen 2012, Aikio 2012, 2014, 2020). Further, ideas of genetic relationship have never been part of mainstream Uralistics (Saarikivi 2020: 50).
NOTE. Disregarding the traditional typological comparisons, which are better explained through language contact, the main argument in favour of a genetic connection is the basic nature of part of the shared vocabulary. For relatively recent opinions favorable to potential Uralic-Yukaghir(-Altaic) connections, see Piispanen (2013), De Smit (2019), and Nikolaeva (2020) (reviewed in Piispanen 2019). On the other hand, some of these basic items are also shared with archaic PIE~PU cognates. For an ultimate connection with Indo-Uralic, see e.g. Hyllested (2009) for PIE-PU + PYuk. or Kortlandt (2010) for PIE + PU-PYuk. (Read a summary of potential Indo-Uralic reconstruction).
Since Yukaghir survives today only in the Kolyma River Basin and the Russian Far East, ca. 3000 km away from the likely Samoyed homeland, their contacts are unlikely to be recent. Even taking into account its known former spread up to the Lena River in the west, before the smallpox epidemic of the 18th century, there seems to be little ground for recent Samoyedic ~ Yukaghir contacts.
What is more, the imbalances of the Yukaghir vowel system and vowel harmony seem to reflect the adaptation of an original system with front rounded *ü and *ö to a system very similar to that seen in Yeniseian, Pre-Proto-Samoyedic and Pre-Proto-Tocharian (Peyrot 2019), which would locate it close to their Sprachbünde around the Upper Yenisei.
Fitting those contacts, the core territory of Proto-Yukaghir hydronymy surviving until recently is found along the Angara & Tunguskas river basins. Many river names display Yukaghir etymologies or show the endings -mba/-mbu/-mbe, which are notably absent from bisyllabic and polysyllabic hydronyms in Yeniseic and Tungusic. Further, polysyllabic words with such non-root endings occur in Yukaghir and Selkup, but are absent elsewhere in Samoyed (Nemirovskij 2019).
1.2.4. Palaeo-Siberian & Palaeo-Arctic
The most striking feature of PSmy. is the replacement of inherited PU words by loans of unknown origin, probably stemming from a Taiga Substrate around the Minusinsk Basin. This archaic layer even affected Uralic numerals and basic words; cf. PU *kulma vs. PSmy. *nakur ‘three’, PU *neljä vs. PSmy. *tättə ‘four’ (although see Turkic below), PU *käte vs. PSmy. *utå ‘hand’ (Saarikivi 2020: 50).
Northern Samoyed adopted even more substrate features when spreading to the Tundra. For example, Nenets words such as lʲidʲaŋk ‘beaver’, lymbød ‘swamp’, nʲenʲaŋk ‘mosquito’, xasrʲo ‘lake growing moss’, tʲ’iwtʲei ‘walrus’ all belong to semantic groups typical for substrate vocabulary, and also show phonematic features that are not reconstructible in Proto-Samoyed, such as word-initial l, clusters srʲ- or wt’-, etc. (Saarikivi 2020: 50).
NOTE. In that sense, Samoyed is similar to Ob-Ugric and Saami, showing a two-staged pattern of substratal relexification: first from contacts with Taiga populations, and later from contacts with Arctic peoples, supporting a south-to-north cline of Uralic language replacement events.
This pattern of traditional bilingualism or trilingualism, including widespread intermarriage customs, is common among traditionally allied Northern Siberian groups – related to shared subsistence economy rather than language family – and frequent language change is also well known from the historical period, including Samoyedic, Tungusic, and Mongolic groups (Khanina & Meyerhoff 2018, Khanina & Koryakov 2018).
1.2.5. Turkic, Mongolic & Tungusic
There is a growing corpus of appealing Proto-Turkic etymologies of Proto-Samoyed words, i.e. loanwords showing regular features of inner Samoyed development (Piispanen 2018). The approximately 30 reliable borrowings are thus comparable to the earlier Pre-PSmy.-PYuk. contacts, but with a reversed direction of influence, suggesting that Samoyeds had already become the ‘local’ culture under pressure from incoming nomadic elites:
- Horse-riding nomads:
  - PSmy. *juntз ‘horse’ ← PTk. *junt ‘horse, mare’;
  - PSmy. *ki̮r ‘gray hair (of animals), light, white’ ← OTk. qïr (CTk. *Kï̄r) ‘grey, grey-haired, color of horse’s coat’;
  - PSmy. *kåŋ ‘lord’ ← PTk. *kān ‘lord’;
  - PSmy. *kil’ ‘sable’ ← OTk. kil ‘sable’;
- Subsistence economy:
  - PSmy. *jür ‘fat’ ← PTk. *ṻř ‘id’;
  - PSmy. *kåptə̂- ‘to castrate’ ← OTk. qaptï ‘to grasp with teeth or hands’;
  - PSmy. *pə̑jkз ‘dried fish’ ← PTk. *bālik ‘fish’;
  - PSmy. *wekänä~*wekзrз ‘sturgeon’ ← PTk. *bEkre ‘kind of sturgeon’;
- Trade:
  - PSmy. *yam ‘to wander with a tent caravan’ ← OTk. yam ‘a posting station’;
  - PSmy. *päjmå ‘boots’ ← OTk. poyma ‘felt boots’ ~ CTk. *baλmak ‘kind of shoes’;
  - PSmy. *jemńə̂- ‘to patch, to mend’ ← PTk. *jama- ‘to patch’;
  - PSmy. *jikå- ‘saw, sharpen’ ← CTk. *(h)ẹ̄jke- ‘saw, sharpen’;
- Probably also from contacts related to commerce:
  - PSmy. *jür ‘100’ ← PTk. *jṻř ‘id’;
  - PSmy. *tettə̑ ‘four’ ← PTk. *dört ‘four’;
  - PSmy. *ker- ‘to enter’ ← OTk. kir- ‘to enter’;
  - PSmy. *jokə̑-~*jok- ‘to become lost’ ← PTk. *jōk-a-l- ‘to be lost, to disappear’;
- Kinship:
  - PSmy. *inä ‘elder brother’ ← OTk. ini ‘younger brother’;
  - PSmy. *jekə̑ ‘twin’ ← PTk. *(h)ẹjkiř ‘twins’;
- Nature:
  - PSmy. *kil’ ‘winter’ ← OTk. qïl ‘winter’;
  - PSmy. *pə̑t- ‘sink’ ← PTk. *bat- ‘sink, drown, set (about sun)’;
  - PSmy. *ke̮pu ‘wasp’ ← CTk. *Kapuŋ ‘bumblebee’;
  - PSmy. *puro ‘gray, wolf-gray, wolf-gray dog’ ← OTk. boro ‘gray’;
  - PSmy. *ta(ə)j ‘branch, bough’ ← CTk. *dal ‘branch, willow’.
NOTE. The loan of horse-related vocabulary from Turkic is common in forest-steppe and southern taiga groups like Ob-Ugric, including Yeniseian, cf. Ket qo:n ‘horse’ ← PTk. *qulun ‘foal’.
Based on the many known cases in language contact-induced change of correlative [language A] superstrate/adstrate (lexical) ↔ [language B] substrate (morphophonosyntactical) influences, it would be conceivable that Turkic showed a strong correlative Samoyedic substrate. Interestingly, some of the often-cited typological similarities of Uralic and Micro-Altaic, like their agglutinative nature or vowel harmony, might have arisen late and spread through intense areal contacts from west to east, with an ultimate Uralic connection finding further support in population genomics of Altaic-speaking populations (see below Proto-Turkic Homeland).
There is a later layer of Mongolic borrowings into Samoyed dialects, reflecting a common fauna in the Altai-Sayan region (Piispanen 2019):
- PMng. *kerije ‘crow, raven’ → Written Mongolian kerije(n) → Selkup kerja ‘raven’;
- PMng. *kürene ‘ferret, weasel’ → Written Mongolian kürene → Selkup kury ‘ermine’;
- PMng. *sïnkor ‘falcon’ → Written Mongolian singqur → Selkup seŋkjata ‘hawk’.
Similarly, some Proto-Tungusic borrowings have been proposed, although their irregularities make them unconvincing (Janhunen p.c. in Saarikivi 2020: 49). Dubious examples of late dialectal borrowings include (Piispanen 2019):
- PTng. *ābu- ‘a kind of duck’ → Nenets ńabu ‘duck’, Yurats njawétjä ‘duck’, phonologically difficult to accept.
- PTng. *kukti ‘cuckoo’ (→ pre-Evenki) → Proto-North Samoyed *kukti ‘cuckoo’ → Nenets xutij; Nganasan kotï. The other direction of borrowing is possible.
- PTng. *kūku (~ *xūku) ‘swan’→ Nenets xoxorej, Yurats kugórre ‘swan’. No exact correspondence with Samoyedic forms, and a PTk. *Kugu ‘swan’ also exists.
- PTng. *pige ~ *piage → Pre-Evenki *piɣen → Kamass phigije, possibly more attractive as inherited from PU *päke ‘a kind of bird of prey’.
- PTng. *kāŋgu → Nganasan kaŋgü’’o, difficult to identify a donor language.
Later Samoyed varieties, most notably Sayan Samoyed, show evidence of intense contact with modern Turkic, Mongolic, and Yeniseian (Ket, cf. Joki 1952), with some Southern Samoyed groups, like the Koibal, Mator, Karagass, Soyots, the Taigi-Samoyeds, and most recently Khakass, having been linguistically Turkicized in recent times (Helimski 1996).
1.3. Hydronyms & ethnonyms
Samoyed toponymy is not well studied (except perhaps for Selkup areas), but there are more than 50 traditionally described non-Turkic names of water bodies considered of “Ugric Samoyedic” origin in Khakassia and neighbouring regions of Tuva, Kemerovo, and the Altai (Kyzlasov 1959: 73). Despite the general criticism by Dul’zon (1950, 1959, 1964), who deemed it best to try to etymologize non-Turkic names as Yeniseian first, there remains a considerable number of them with Samoyedic formants in -ba (-be), and with “Ugric” formants in -as, pointing to this South Siberian area in particular as an ancient East Uralic hotspot (cf. Kaksin 2018).
The Iranian-speaking area in contact with both Samoyed and Turkic is probably to be identified at least with formants in -lap, -lep, -rap, -rep, -rop (or in -man, -dan/-djan in the Upper Ob; cf. Kaksin 2019), among many others shared across Siberia which are difficult to etymologize as from a single language group (Maloletko 2005 – IV). Formants in -buj appear mainly in the Angara region (together with those in -chaga) and in the Circum-Arctic (mainly Upper Ob – Upper Yenisei) area, suggesting a connection with the Sikhirtja that Samoyeds replaced during their expansion north (Maloletko 2005 – V).
The Proto-Samoyedic reconstruct for the Yenisei River, *jentəsi(-), was likely borrowed into Tungusic *jense(-gii), whereas PTk. *kem and PYen. *quk are arguably later. Given the lack of proper etymology of the word (or its division and meaning of ending -si-, if interpreted as a compound), its origin probably lies in the non-Samoyedic substratal language(s) of the Minusinsk basin (Janhunen 2012).
In a similar manner, a notable Samoyed toponymic layer lies in the Sayan area, including beneath names of Turkic origin, although this is mostly a preliminary impression based on Helimski’s (1999) listing of toponymic details from Mueller’s manuscript on his 1739-1740 travels from Krasnoyarsk to the Steppes.