All Ancient DNA Dataset
Tagged: ancient, ancient DNA, database, dataset, DNA, haplogroup, mtDNA, samples, SNP, spreadsheet, subclade, Y-chromosome, Y-DNA
November 24, 2020 at 4:29 pm #34321 Carlos Quiles, Keymaster
Updated to version 2.04.53, including new data from (and updates to previously reported) early farmers of Anatolia, South-East and Central Europe, from Marchi et al. bioRxiv (2020).
December 3, 2020 at 11:33 am #34495 Carlos Quiles, Keymaster
Updated to version 2.04.68, including the changes reported for Uyelgi samples, as well as other FTDNA Haplotree (provisional) assessments, like:
- Kostenki14, which splits hg. C1b-B66 with another FTDNA customer.
- SG41, which splits hg. D1a-BY12975 with another FTDNA customer from Kazakhstan.
- Samples from Yu et al. Cell (2020), such as UKY001 and probably KAG001, GLZ002, which will probably split hg. C2a-BY728.
January 16, 2021 at 3:19 pm #34989 Carlos Quiles, Keymaster
Release of version 2.05:
I have been updating the Ancient DNA Dataset with some global additions, clearly enough to warrant a change of version number. In particular, these are the columns added (or those I consider likely or possible to be added):
FTDNA-Y-Haplotree: for FTDNA Y-Haplotree Y-names. I hope that sorting the file following their SNP order will help clarify the actual position of each ancient sample in their respective haplogroup branches.
As you might have noticed, I am also shifting the “main” original column, YTree, to an FTDNA-friendly naming system. Naming consistency was becoming an issue, since many samples now have a depth that cannot be followed with either ISOGG or YFull.
NOTE. For the moment, though, I am wary of changing the subclade naming for certain haplogroups. For example, haplogroup J – for some reason – appears to have an important user base in YFull which encourages the addition of ancient samples to their YTree. Anyway, it looks as though in the near future, when all ancient samples get fully analyzed and published by FTDNA, the whole haplogroup naming ecosystem will possibly be dominated by FTDNA.
Y-SNP: I am now selecting only SNPs approved by FTDNA, so as to avoid the many dubious SNPs described by other companies and individuals but not fully accepted by others. Nevertheless, a proper terminal SNP (with negative and dubious ones) needs a manual check, and (unless you are Michael Sager) this is an impossible task for one person. Also, I am not well-versed in most subclades, and a certain experience with ancient and modern samples is needed when it comes to assessing which derived and ancestral downstream calls are more likely to be correct. I will be posting links to the files, including pathPhynder’s estimation, and adding as many alternative Responsible-SNP sources as possible, to strengthen the reliability of each call.
Isotopes: Basically, whether the sample is considered local or non-local, not necessarily the specific isotopic values, which might increase the file unnecessarily.
Skeletal-Element: Will NOT be included, for the moment. I am not convinced that a column with bone type (or other sample origin) is useful for this ancient haplogroup compilation, except maybe for statistical analyses. For the moment, I prefer not to increase the file size.
Data-Type: Ditto. Furthermore, by following the current Reich Lab’s naming standard (adding .SG or .DG) I think this information is mostly included in the Object_ID of the samples relevant for genome-wide analyses.
Qualitative Assessment/Confidence of archaeological and chronological contextualization for the genetic data of an individual: Very useful new columns added currently and for the past (2?) years by the Reich Lab. Since most samples offer reliable results, only some raise doubts, and a few have alerts, I am not sure whether only doubts and alerts should be added to the final column, reserved for “site”, which seems like the most economical choice. Until the next release of the Reich Lab curated Dataset, I don’t think I will make a decision on this.
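To illustrate the Data-Type point above, here is a minimal sketch of how the data type could be read off an identifier instead of being stored in its own column. The sample identifiers and the `data_type_from_id` helper are hypothetical. The reading of `.SG` as shotgun data and `.DG` as diploid genotype calls is an assumption about the Reich Lab convention, and identifiers without a suffix are simply labeled "unspecified" here.

```python
# Hypothetical helper: infer a data-type label from a Reich Lab-style identifier.
# Assumption: ".SG" marks shotgun data and ".DG" marks diploid genotype calls.
SUFFIX_TO_TYPE = {
    ".SG": "shotgun",
    ".DG": "diploid genotypes",
}


def data_type_from_id(object_id: str) -> str:
    """Return a data-type label based on the identifier's suffix."""
    normalized = object_id.strip().upper()
    for suffix, label in SUFFIX_TO_TYPE.items():
        if normalized.endswith(suffix):
            return label
    # No recognized suffix: this sketch makes no claim about the data type.
    return "unspecified"


# Made-up identifiers, for illustration only.
examples = ["SampleA", "SampleB.SG", "SampleC.DG"]
labels = {oid: data_type_from_id(oid) for oid in examples}
```

A lookup table keeps the mapping in one place, so a change in the naming standard would only require editing `SUFFIX_TO_TYPE`.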
In general, I will try to keep up with the Reich Lab’s Dataset naming changes, to make both compatible and easy to combine when performing formal stats, even though their slow pace of corrections (and radical naming changes from the first to the second version released) suggests that those conventions might not be valid for long.
January 16, 2021 at 8:01 pm #34994 Carlos Quiles, Keymaster
Version 2.05.07 includes recent samples from:
- Kılınç et al. Science Advances (2021), whose Y-SNP calls I reported here, and which have been confirmed by YFull by including them in their respective trees (although it remains unclear which ones will split branches, until another related sample gets published or a modern relative is found).
- Samples from the recent Egfjord et al. PLoS One (2021), pending a proper analysis of the BAM files.
- Samples from the soon to be published paper on BA cultures of the Aegean Sea, including Steppe-related admixture in Helladic MBA.
Other papers like Moussa et al. (2021) and others with few samples – to see the whole list of new samples since your last downloaded version, order the spreadsheet by date (second-to-last column).
January 23, 2021 at 8:47 pm #35401 Carlos Quiles, Keymaster
Updated to version 2.05.21, including:
- Updated Y-DNA from Egfjord et al. PLoS One (2021), with analysis of BAM files.
- Updated Y-SNP calls from Saag et al. (2021), with analysis of BAM files.
- New links to files from Olalde et al. (2018).
Minor changes, like the update of I6561, the Alexandria sample of hg. R1a-Y3, dated supposedly ca. 4000 BC, but now corrected in the AADR based on genetic data (as I suggested to the authors here):
Context: Layer date based on 6 20-28 cM IBD individuals with Srubnaya/Alakul/Kazakhstan_MLBA individuals from 3900-3400 [based on these genetic results we ignore the direct date of 4153-3970 calBCE (5215±20 BP, PSUAMS-2832) from same site calibrated as 95.4%; IntCal20, OxCal v4.4.2 Bronk Ramsey (2020)
February 8, 2021 at 11:36 am #35914 Carlos Quiles, Keymaster
Updated to version 2.05.75 (There have been other intermediate versions published with some of these updates):
- New rules for access to Y-SNP files: Now fully restricted to reliable users; bots are forbidden.
- I have checked new batches of samples for SNP calls from the FTDNA Haplotree, including Allentoft et al. (2015), Mathieson et al. (2015) and (partially) Mathieson et al. (2018), Damgaard et al. Nature (2018), and Jeong et al. (2020).
- Added links to Y-SNP calls from Olalde et al. (2018) and Olalde et al. (2019). Currently working on Damgaard et al. Science (2018).
The new color codes are intended to immediately convey information visually about recent Y-SNP updates (2021):
- light green background: those checked by me, in contrast with those in green background with the ‘seal of approval’ of FTDNA or YFull.
- bold: those calls considered estimations by me (due e.g. to lack of intermediate SNPs, or unreliable derived or ancestral SNP calls subject to deamination).
- strikethrough: in the “responsible” column, whenever the previous call is corrected (not just updated to a more specific subclade, which remains underlined).
April 9, 2021 at 1:24 am #37359 Carlos Quiles, Keymaster
Recent changes leading up to the current version 2.06.160:
- Full update with Reich Lab’s curated dataset. More info here.
- Y-DNA of Scythians and other Iron Age nomads from Gnecchi et al. (2021).
- Y-DNA reported by FTDNA for new shotgunned sequences of already available samples. Detailed info here.
Today I added FTDNA’s assessment of the Y-DNA of Peder Winstrup from Krzewińska et al. (2021).
Also updated are the ADMIXTURE values, including the new samples from Gnecchi et al. (2021), and experimenting with the SE Asia proxy: now the reference is Thailand LN_BA rather than Papuan.
All files (including PDFs) updated and uploaded.
May 18, 2021 at 11:27 pm #38583 Carlos Quiles, Keymaster
New version 2.07, now adding an inverted Formation-Age Ratio (FAR) applied to Y-SNPs and mt-SNPs, as a measure of time-related precision of the terminal SNP: the closer the value is to 1, the closer the formation date is to the ancient sample’s (radiocarbon vs. contextual) date.
This metric was proposed by Jari Kinnunen (from haplotree.info), and estimates are based on some relatively recent YFull formation dates adapted to FTDNA’s Y-DNA Haplotree at SNP Tracker.
[We are still waiting for FTDNA’s own estimations to be published, as recently announced]
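The exact formula behind the inverted FAR is not spelled out in this thread. As a rough sketch only, assuming it is the sample's age divided by the SNP's estimated formation age (both in years before present), the ratio could be computed as below. The function name `inverted_far` and the example ages are hypothetical and not taken from the dataset.

```python
def inverted_far(sample_age_ybp: float, formation_age_ybp: float) -> float:
    """Assumed inverted Formation-Age Ratio: sample age / SNP formation age.

    Both ages are in years before present. Under this assumption, a value
    close to 1 means the terminal SNP formed shortly before the individual
    lived; a value close to 0 means the SNP is much older than the sample.
    """
    if sample_age_ybp <= 0 or formation_age_ybp <= 0:
        raise ValueError("ages must be positive numbers of years before present")
    return sample_age_ybp / formation_age_ybp


# Made-up ages, for illustration only.
precise_call = inverted_far(sample_age_ybp=4500, formation_age_ybp=5000)
shallow_call = inverted_far(sample_age_ybp=4500, formation_age_ybp=18000)
```

Under this assumed definition, the first example gives 0.9 (a terminal SNP formed not long before the sample's date) and the second gives 0.25 (a shallow call far upstream of the sample's likely position). A value above 1 would indicate that the formation estimate is younger than the sample's date, which would point to a problem in one of the two dates.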
Changes from the previously published 2.06.209 also include new SNP inferences, especially from the R1a (mainly Z93) and R1b branches (mainly P312).
The spreadsheet is up to date with the most recent reports of ancient samples.
August 12, 2021 at 4:31 pm #38921 Jvd, Participant
Do you have any idea when the next update of the ancient spreadsheet/map will be available?
Jan
November 11, 2021 at 2:31 pm #39013 bce, Participant
could ancient Y-DNA from these studies be added to the dataset?
https://dspace.cuni.cz/handle/20.500.11956/31423 (medieval Czech, data on page 83 and 84 of “text prace”)
I apologize if it’s already there.