All Ancient DNA Dataset

Viewing 6 posts - 41 through 46 (of 46 total)
  • #34321
    Carlos Quiles
    Keymaster
    • Topics: 51
    • Replies: 83

    Updated to version 2.04.53, including new data from (and updates to previously reported) early farmers of Anatolia, South-East and Central Europe, from Marchi et al. bioRxiv (2020).

    #34495
    Carlos Quiles

    Updated to version 2.04.68, including the changes reported for Uyelgi samples, as well as other FTDNA Haplotree (provisional) assessments, like:

    • Kostenki14, which splits hg. C1b-B66 with another FTDNA customer.
    • SG41, which splits hg. D1a-BY12975 with another FTDNA customer from Kazakhstan.
    • Samples from Yu et al. Cell (2020), such as UKY001 and probably KAG001 and GLZ002, which will probably split hg. C2a-BY728.

    #34989
    Carlos Quiles

    Release of version 2.05:

    I have been updating the Ancient DNA Dataset with some global additions, clearly enough to warrant a new version number. In particular, these are the columns added (or those I consider likely or possible to be added):

    FTDNA-Y-Haplotree: for FTDNA Y-Haplotree Y-names. I hope that sorting the file following their SNP order will help clarify the actual position of each ancient sample in their respective haplogroup branches.

    As you might have noticed, I am also shifting the “main” original column, YTree, to an FTDNA-friendly naming system. Naming consistency was becoming an issue, since many samples now have a depth that cannot be followed with either ISOGG or YFull.

    NOTE. For the moment, though, I am wary of changing the subclade naming for certain haplogroups. For example, haplogroup J – for some reason – appears to have an important user base in YFull which encourages the addition of ancient samples to their YTree. Anyway, it looks as though in the near future, when all ancient samples get fully analyzed and published by FTDNA, the whole haplogroup naming ecosystem will possibly be dominated by FTDNA.

    Y-SNP: I am now selecting only SNPs approved by FTDNA, so as to avoid the many dubious SNPs described by other companies and individuals but not fully accepted by others. Nevertheless, a proper terminal SNP (with negative and dubious ones) needs a manual check, and (unless you are Michael Sager) this is an impossible task for one person. Also, I am not well-versed in most subclades, and a certain experience with ancient and modern samples is needed when it comes to assessing which derived and ancestral downstream calls are more likely to be correct. I will be posting links to the files, including pathPhynder’s estimation, and including as many alternative Responsible-SNP sources as possible, to strengthen the reliability of each call.

    Isotopes: Basically, whether the sample is considered local or non-local, not necessarily the specific isotopic values, which might increase the file size unnecessarily.

    Skeletal-Element: Will NOT be included, for the moment. I am not convinced that a column with bone type (or other sample origin) is useful for this ancient haplogroup compilation, except maybe for statistical analyses. For the moment, I prefer not to increase the file size.

    Data-Type: Ditto. Furthermore, by following the current Reich Lab’s naming standard (adding .SG or .DG) I think this information is mostly included in the Object_ID of the samples relevant for genome-wide analyses.
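As a rough illustration of why a separate Data-Type column may be redundant, the suffix can be read directly from the Object_ID. This is only a sketch: the function name, the example IDs, and the label used for IDs without a suffix are assumptions made for illustration, and the suffix meanings (.SG for shotgun data, .DG for diploid genotypes) follow my understanding of the Reich Lab convention rather than anything stated in this thread.

```python
# Minimal sketch: infer the data type from the Object_ID suffix.
# The mapping and the "unspecified" fallback are assumptions for illustration.
SUFFIX_LABELS = {
    ".SG": "shotgun",
    ".DG": "diploid genotype",
}


def data_type_from_object_id(object_id: str) -> str:
    """Return a data-type label based on the trailing suffix of an Object_ID."""
    for suffix, label in SUFFIX_LABELS.items():
        if object_id.endswith(suffix):
            return label
    # No recognized suffix: the ID itself does not say how the data were produced.
    return "unspecified"


if __name__ == "__main__":
    # Hypothetical IDs, not taken from the dataset.
    for example_id in ("SampleA.SG", "SampleB.DG", "SampleC"):
        print(example_id, "->", data_type_from_object_id(example_id))
```

If the naming standard changes again, only the mapping at the top would need updating.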

    Qualitative Assessment/Confidence of archaeological and chronological contextualization for the genetic data of an individual: Very useful new columns added currently and for the past (2?) years by the Reich Lab. Since most samples offer reliable results, only some raise doubts, and a few have alerts, I am not sure whether only doubts and alerts should be added to the final column, reserved for “site”, which seems like the most economical choice. Until the next release of the Reich Lab curated Dataset, I don’t think I will make a decision on this.

    In general, I will try to keep up with the Reich Lab’s Dataset naming changes, to make both compatible and easy to combine when performing formal stats, even though their slow pace of corrections (and the radical naming changes from the first to the second version released) suggests that those conventions might not be valid for long.

    #34994
    Carlos Quiles

    Version 2.05.07 includes recent samples from:

    Other papers like Moussa et al. (2021) and others with few samples – to see the whole list of new samples since your last downloaded version, order the spreadsheet by date (second-to-last column).
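For those working with an exported copy rather than the spreadsheet itself, the same "order by date" step could be sketched as follows. Everything specific here is an assumption: the CSV export, the ISO date format (YYYY-MM-DD), the column names, and the sample rows are all hypothetical; only the position of the date (second-to-last column) comes from the post above.

```python
import csv
import io
from datetime import date


def new_samples_since(csv_text: str, last_download: date) -> list[list[str]]:
    """Return rows dated after last_download, newest first.

    Assumes the date sits in the second-to-last column in YYYY-MM-DD format.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    date_col = len(header) - 2  # second-to-last column
    # Skip malformed rows whose length does not match the header.
    rows = [row for row in reader if len(row) == len(header)]
    recent = [r for r in rows if date.fromisoformat(r[date_col]) > last_download]
    recent.sort(key=lambda r: date.fromisoformat(r[date_col]), reverse=True)
    return recent


# Hypothetical export with made-up sample IDs and column names.
EXAMPLE_CSV = """Object_ID,YTree,Date_Added,Notes
SampleA,hg-X,2021-01-10,
SampleB,hg-Y,2021-03-02,
SampleC,hg-Z,2020-11-30,
"""

if __name__ == "__main__":
    for row in new_samples_since(EXAMPLE_CSV, date(2020, 12, 31)):
        print(row[0], row[2])
```

Comparing parsed dates rather than raw strings avoids surprises if the export ever mixes formats, although the parsing call would then need adjusting.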

    #35401
    Carlos Quiles

    Updated to version 2.05.21, including:

    Minor changes, like the update of I6561, the Alexandria sample of hg. R1a-Y3, dated supposedly ca. 4000 BC, but now corrected in the AADR based on genetic data (as I suggested to the authors here):

    Context: Layer date based on 6 20-28 cM IBD individuals with Srubnaya/Alakul/Kazakhstan_MLBA individuals from 3900-3400 [based on these genetic results we ignore the direct date of 4153-3970 calBCE (5215±20 BP, PSUAMS-2832) from same site calibrated as 95.4%; IntCal20, OxCal v4.4.2 Bronk Ramsey (2020)

    #35914
    Carlos Quiles

    Updated to version 2.05.75 (There have been other intermediate versions published with some of these updates):

    • New rules for access to Y-SNP files: Now fully restricted to reliable users; bots are forbidden.
    • I have checked new batches of samples for SNP calls from the FTDNA Haplotree, including Allentoft et al. (2015), Mathieson et al. (2015) and (partially) Mathieson et al. (2018), Damgaard et al. Nature (2018), and Jeong et al. (2020).
    • Added links to Y-SNP calls from Olalde et al. (2018) and Olalde et al. (2019). Currently working on Damgaard et al. Science (2018).

      The new color codes are intended to immediately convey information visually about recent Y-SNP updates (2021):

    • Light green background: calls checked by me, in contrast with those on a green background, which carry the ‘seal of approval’ of FTDNA or YFull.
    • Bold: calls I consider estimations (due e.g. to a lack of intermediate SNPs, or to unreliable derived or ancestral SNP calls subject to deamination).
    • Strikethrough: in the “responsible” column, whenever the previous call is corrected (not just updated to a more specific subclade, which remains underlined).