Sunday, 8 July 2018

A Closer Look at Branch B - new Y-DNA results

Since the last version of the "family tree" for the Lineage II Gleesons of North Tipperary, there have been some additional results. For Branch B, these include two new members (G123 & G127), new STR results, and new Z255 SNP Pack results. This post assesses these new results, explores how they affect the overall structure of Branch B, and draws conclusions about what the DNA and genealogical data in combination tell us about the members of this branch and how they are related to each other.

Below is the previous configuration of Branch B (from Aug 2017). There are 5 members. All share the SNP marker Y16880. And below that, their STR mutations (on the right side of each line) suggest a "best fit" branching structure that attempts to explain how the various people are related to each other. There is also a TMRCA estimate in red underneath each branching point. TMRCA stands for Time to Most Recent Common Ancestor and is expressed as the number of generations back to the most recent common ancestor. Thus the common ancestor for G107 (MPG) and G55 (HLG) is estimated to be about 3 generations ago.

The previous structure of Branch B from version 3 of the Mutation History Tree (Aug 2017)

What do we know from each family's genealogy?

Let's first take a look at the direct male line pedigrees for each of the individuals in Branch B, including the two new members (G123 & G127). A big thank you to the project members for supplying this essential information (most of which you will find on the Post Your Pedigree page).

The direct male line pedigrees of the members of Branch B

There are 4 lines within Branch B and what is particularly important is the birth location of the MDKA (Most Distant Known Ancestor) of each line:
  • Line 1 ... Massachusetts, USA
  • Line 2 ... probably Tipperary, Ireland
  • Line 3 ... Ireland
  • Line 4 ... Tipperary, Ireland

We assume that the MDKA information in these pedigrees is correct, but it may not be. When records are scant (as happens with Irish records before about 1830), oftentimes the best we can do is make an educated guess regarding the approximate year of birth and (most importantly) birth location of the MDKA. Nevertheless, it would appear that the Irish immigrant ancestor for Line 1 would have been born no later than 1 generation prior to the MDKA for Line 1 (i.e. no later than about 1750 in Ireland).

Now let's take a look at the DNA.

What do the new DNA results tell us?

I have previously used a visualisation method for delineating the branching structure within the overall "family tree" for Lineage II, supplemented with insights from Dave Vance's SAPP Programme (which automates the process of generating Mutation History Trees (MHT) based on mutations in the Y-DNA SNP & STR markers). On this occasion, I started with the SAPP Programme and refined the inputs with each version of the MHT it produced. You can read a detailed account together with a sequence of diagrams later in this post, but below I merely include the top-line results.

The Z255 SNP Pack results of new member G123 (EMG) indicate that he shares 2 SNPs which until now have only been present in my Dad (G21, MHG). So this has now characterised a new branch within Branch B (indicated by the blue line in the diagram). This illustrates how the Z255 SNP Pack can (in certain circumstances) be a useful substitute for the Big Y test. However, it won't reveal any private/unique SNPs possessed by the tester.

This new sub-branch places the connection between G123/G127 and G21 quite a way back (about 10 generations, or about 300 years, which suggests a common ancestor born about 1700). It also means that G123/G127 are most likely connected to Line 1 (G57, G64, G55) a few generations further back than that (maybe 11, 12 or 13 generations, i.e. about 1600). You can read a more detailed account of the TMRCA estimates in the more technical section below.

A subsequent post will explore the use of autosomal DNA (e.g. Family Finder results) to help clarify the suggested relationships between the various members of Branch B. We would expect no atDNA matches between any of the 4 lines, but we would expect some matches among the members of each line in turn.

Figure 8: the final figure - Version 4 of the MHT for Branch B. This may be refined when Version 4 of the Mutation History tree for the entire group is generated.



A Detailed Account of the Technical Bits

For those willing to brave a more detailed account of the technical aspects of how the Mutation History Tree for Branch B was generated, please knock yourself out below.


Figure 1: this first version of a SAPP-generated Mutation History tree is based on STR values only, anchored by the group Modal Haplotype as a starting point. Note that known relatives are separated - G64 (LTL) belongs with G57 (RL) & G55 (HLG), and G123 (EMG) belongs with G127 (JG). This artificial separation of known family members may be due to the different number of STR markers compared (i.e. 37 vs 111).
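For readers who want to see what a "Modal Haplotype" means in concrete terms, here is a minimal Python sketch. It simply takes the most common value at each STR marker across a group of testers. The tester names and repeat counts are invented for illustration (they are not project data), the function name is hypothetical, and this is not a description of how SAPP itself is implemented.

```python
from collections import Counter

# Hypothetical STR repeat counts for three testers at four markers.
# These values are invented for illustration; they are not project data.
haplotypes = {
    "tester_1": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "tester_2": {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11},
    "tester_3": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10},
}


def modal_haplotype(haplotypes):
    """Return the most common value at each marker across all testers."""
    markers = sorted({m for h in haplotypes.values() for m in h})
    modal = {}
    for marker in markers:
        # Only testers who have a result for this marker contribute.
        values = [h[marker] for h in haplotypes.values() if marker in h]
        modal[marker] = Counter(values).most_common(1)[0][0]
    return modal


modal = modal_haplotype(haplotypes)
print(modal)
```

Each tester's mutations are then simply the markers where his value differs from the modal value. Note that a marker tested by only some members is decided by fewer "votes", which is one reason why comparing 37-marker and 111-marker testers can give uneven results.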


Figure 2: SNPs have been added to "anchor" the overall group. But this makes no difference at all because all the members of Branch B share the same SNP marker (Y16880). Note that the TMRCA estimates (Time to Most Recent Common Ancestor) suggest that the group has an MRCA born about 1800, but within the range of 1700 to 1950. These TMRCA estimates will always be inexact and unreliable, despite being statistically accurate.


Figure 3: genealogical information is added, specifically the MRCAs (Most Recent Common Ancestors) - two of them for Line 1 (James1795 & Ben1889), and one for Line 3 (John 1887). And now the diagram begins to approximate what we know from the genealogy. The known relatives are correctly grouped together, and Line 2 (G107, MPG) and Line 4 (G21, MHG) are clearly identified as outliers.

But there are still potential shortcomings. The diagram suggests that Line 4 (G21, MHG) is more closely related to Line 1 (G55, G57, G64 = HLG, RL, LTL) than to Line 2 or 3, and this seems counterintuitive given the huge number of mutations Line 4 (G21, MHG) has compared to the other lines.

Part of the problem may be that I have generated these diagrams for Branch B in isolation from the rest of the branches within Lineage II. A different configuration might result if all Lineage II members were included in this exercise.

Figure 4: having now included all Lineage II members in the analysis, a new diagram is generated which looks essentially the same as the one above in Figure 3 ... except that member G107 has been moved to a completely separate branch of the tree (far left), beyond Y16880. This is probably due to the fact that G107 (MPG) has only tested to the 37-marker level whereas most others in Branch B have tested to 111 markers.

So are we happy with G107 (MPG) being so far removed? How certain are we that he is correctly placed in Branch B? Is there any evidence to suggest he is better placed where SAPP has placed him? There are no easy answers to these questions. The new SAPP diagram suggests he is more closely related to G113 (GD 5/37), G70 (GD 7/37) and G05 (GD 7/37) than he is to the other members of Branch B (GD 1-2/37). This is counterintuitive and could only be explained by a significant number of parallel and back mutations being present ... which may be the case - we simply don't know.
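To make the "GD x/37" notation above a little more concrete, here is a minimal Python sketch of a Genetic Distance calculation. It assumes a simple stepwise count (the sum of the differences in repeat counts over the markers both men have tested), which is a simplification and not necessarily the exact method FTDNA uses. The function name and all marker values are hypothetical.

```python
def genetic_distance(a, b):
    """Sum of absolute repeat-count differences over shared markers.

    Returns (distance, markers_compared). This is a simple stepwise
    count, assumed here for illustration only.
    """
    shared = sorted(set(a) & set(b))
    distance = sum(abs(a[m] - b[m]) for m in shared)
    return distance, len(shared)


# Hypothetical values, invented for illustration only.
tester_short = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
tester_long = {"DYS393": 13, "DYS390": 25, "DYS19": 14,
               "DYS391": 11, "DYS385a": 11}

gd, compared = genetic_distance(tester_short, tester_long)
print(f"GD {gd}/{compared}")
```

Because only shared markers can be compared, a man who has tested fewer markers is always judged on less evidence, and a parallel or back mutation can hide or exaggerate the true distance.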

So for now, I am going to assume that G107 is in fact more closely related to Branch B members and I will force a stronger likeness to Branch B members by assuming that his 38-111 STR marker panel is exactly the same as other members of Branch B. So having copied and pasted the values for these missing markers into the programme, this is the next diagram we get ...

Figure 5: And now G107 (MPG) has been placed back in Branch B (where he probably belongs). But we have had to "fool" the SAPP Programme by forcing him onto a branch that it didn't want to put him on. We could confirm that we have placed him correctly if G107 (MPG) were to do the Big Y, the Z255 SNP Pack, or the Y16880 single SNP test.
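The "copy and paste" step used to place G107 back in Branch B can be sketched in a few lines of Python. The idea is to fill in any marker the tester has not tested with the value from a reference haplotype for the branch, while never overwriting his real results. The function name and the values below are hypothetical, and it is worth keeping a list of which markers were imputed, since they are assumptions rather than data.

```python
def impute_missing(tester, reference):
    """Fill untested markers with reference values.

    Returns (filled_haplotype, list_of_imputed_markers). The tester's
    own results always take priority over the reference values.
    """
    filled = dict(reference)
    filled.update(tester)  # real results override the reference
    imputed = sorted(set(reference) - set(tester))
    return filled, imputed


# Hypothetical values, invented for illustration only.
branch_reference = {"DYS393": 13, "DYS390": 24, "DYS391": 11, "DYS385a": 11}
tester_37 = {"DYS393": 13, "DYS390": 25}

filled, imputed = impute_missing(tester_37, branch_reference)
print(filled)
print("Imputed (assumed, not tested):", imputed)
```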

The diagram still does not look quite right - G21 (Line 4) is still placed uncomfortably close to Line 1 members (G55, G57, G64) and this remains counterintuitive, given that G21 (MHG) has 5 STR mutations below "Node #45", suggesting that it is quite distant from Line 1 members (the Genetic Distance to members G55, G57, & G64 is 8/111, 6/111, and 2/37 respectively). This becomes even more clear when we add in the private/unique SNPs that Branch B members possess (based on the three Big Y results from this branch). G21 (MHG) has 5 unique/private SNPs whereas G57 (RL) has 2 and G55 (HLG) has 1. This is illustrated in the diagram below.

Figure 6: this is the final "best fit" diagram from SAPP. Or at least it was until I noticed that FTDNA have made a mistake with the "current terminal SNP" of new member G123 (EMG, stated to be Y16880 on the Results Page). Looking at the results of his Z255 SNP Pack, he is not only positive for Y16880 (the overarching SNP for Branch B), but he also tests positive for 2 of the private/unique SNPs of member G21 (MHG, my Dad)! And now we have a whole new configuration ...

Figure 7: And this latest version of the SAPP-generated Mutation History Tree seems to be much more aligned to my gut feel. My Dad G21 (Line 4) has been clearly separated from Line 1 (G55, G57, G64), and has been realigned to be closer to Line 3 (G123, G127). G107 (MPG, Line 2) is now more closely aligned with Line 1 (which makes more sense based on their small values for Genetic Distance i.e. 1-2/37).

Furthermore, compared to the STR mutations in Version 3 of the Mutation History Tree (Aug 2017), the Figure 7 diagram above is an improvement.

Further minor amendments were made when this final SAPP version was compared to Version 3 of the MHT for Lineage II - see Figure 8 (above & below). I think the refinements make logical sense but I will review this again when we come to creating Version 4 of the Mutation History Tree for Lineage II.


Figure 8: the final figure - Version 4 of the MHT for Branch B. The branching structure generated in Figure 7 is retained and there are only minor differences in the placement of STR mutations.

The take home messages from this exercise are as follows:
  • SAPP is only as good as the data you put in
  • it works best with a mixture of SNP data, STR data, and known genealogical data
  • TMRCA estimates for the branching points in the tree are crude, and will always be crude no matter how advanced DNA technology becomes. Nevertheless, they can be a useful guide when interpreted with caution.
  • The "best fit" family tree that results from building a Mutation History Tree is only one of several different configurations. It may not be a true representation of reality. But it is a starting point for discussion and further investigation. It is likely to change as more people join this branch and more data (STR & SNP) is generated.

Dating the Branching Points

TMRCA estimates can be calculated in several ways:
  • using genealogical information
  • using FTDNA's TiP Report tool (the orange icon beside each of your matches)
  • other STR-based methodology (such as the one employed by the SAPP Programme, namely Ken Nordvedt's Interclade Ageing methodology)
  • SNP-based calculations (such as that used by YFULL, which works out as about 150 years per SNP)

For a genealogist, none of them will give you what you want, namely: exactly how many generations back is the common ancestor? The best you will get is a midpoint estimate surrounded by an unhelpfully large range. But that is all we will ever be able to do. Increasing the number of STRs used to 500 will help reduce the range, but it may still be several hundred years on either side of the midpoint estimate. And from a genealogical perspective, that is not what we want.

There is also the danger that a crude timescale will fit in with our preconceived ideas and we will "make the data fit the story we want to hear". So there are loads of caveats around TMRCA estimates. Don't trust them.

Having said that, they can be a useful guide.

So for calculating the TMRCA estimates for the new Branch B family tree, I have used genealogical information in the first instance (in green) coupled with the STR-based SAPP-generated TMRCAs (in red). This may be refined further when Version 4 of the Mutation History Tree is generated for the entire membership of Lineage II.

Note that the TMRCA estimates generated by SAPP are very different to the TMRCAs based on known genealogy:

  • SAPP estimates the TMRCA between G123 (EMG) and G127 (JG) as 6 (5-7) gens = 1800 (1750-1800). In fact, they are uncle & nephew and the TMRCA is actually 1.5 generations.
  • SAPP estimates the TMRCA for G55 (HLG) and the known uncle/nephew pair G57/G64 to be 0 generations (range 0-0) = 1950 (range 1950-1950). In fact, the known number of generations between them is 4.


Dating the A13103/BY14188 branch

The TMRCA estimate of 10 generations for both the Y16880 branch and the downstream A13103/BY14188 branch is derived from the SAPP-generated tree (see Figure 7 above and extract below). This gives the estimated TMRCA as 10 generations within a range of 5-14 generations (about 1700, with a range of 1550-1800).

TMRCA estimates generated by SAPP


This TMRCA estimate for the A13103/BY14188 branch is also supported by the fact that G21 (MHG) has 3 private SNPs remaining that are still unique to him and no one else in the database (as yet). Allowing 150 years per SNP suggests that there is a 450 year period back to the MRCA for G21 & G123/G127. That takes us back to 1550. But caution is advised - the calculation is only based on 3 data points and could be out by several hundred years each way.

Using FTDNA's TiP Report tool, the TMRCA between G21 (MHG) and G127 (JG) at the 111-marker level gives a midpoint estimate of 9 generations (90% range 4-16 generations). Assuming the tester was born about 1950, and assuming 30 years per generation, this translates to a MRCA born about [1950-(30x9)] = 1680 (90% range 1470-1830). This estimate remains the same when adjusted for the minimum number of generations back to the common ancestor (based on known genealogies). This is a similar value to that generated by the SAPP Programme.
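The conversion from generations to an approximate birth year used above is simple arithmetic, and can be written as a small Python helper. The function name is hypothetical; the defaults are the assumptions stated in the text (tester born about 1950, 30 years per generation).

```python
def mrca_birth_year(generations, tester_birth_year=1950,
                    years_per_generation=30):
    """Approximate birth year of the MRCA, given generations back."""
    return tester_birth_year - years_per_generation * generations


# TiP Report for G21 & G127: midpoint 9 generations, 90% range 4-16.
midpoint = mrca_birth_year(9)
earliest = mrca_birth_year(16)  # more generations = earlier date
latest = mrca_birth_year(4)

print(f"MRCA born about {midpoint} (90% range {earliest}-{latest})")
```

Changing either assumption shifts every date: using 25 years per generation instead of 30, for example, would move the 9-generation midpoint from 1680 to 1725.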

Dating the Y16880 branch

The various TMRCA estimates for the overarching Branch B-defining SNP (Y16880) are as follows:
  • 10 gens (range 5-14) based on SAPP
    • SAPP translates this as 1700 (1550-1800 AD)
  • 8 gens (90% range 4-15) based on TiP Report for G21 (MHG) & G57 (RL)
    • equates to 1710 (1500-1830)
  • 9 gens (90% range 5-15) based on revised TiP Report for G21 (MHG) & G57 (RL)
    • equates to 1680 (1500-1800)
  • 11 gens (90% range 6-19) based on TiP Report (original & revised) for G21 (MHG) & G55 (HLG)
    • equates to 1620 (1380-1770)
  • 400 years ago (1600) based on SNPs and the average of the following:
    • 750 years ago based on SNPs for G21 (MHG)
    • 300 years ago based on SNPs for G57 (RL)
    • 150 years ago based on the single SNP for G55 (HLG)
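The SNP-based figures in the list above can be reproduced with a short Python sketch. It uses the private SNP counts from the Big Y results quoted earlier (5 for G21, 2 for G57, 1 for G55) and the rough rate of 150 years per SNP. The reference year of 2000 is an assumption, inferred from "400 years ago" being equated with 1600; the variable names are hypothetical.

```python
YEARS_PER_SNP = 150      # rough rate quoted in the text
REFERENCE_YEAR = 2000    # assumption: "400 years ago" is equated with 1600

# Private/unique SNPs per Big Y tester, as stated in the text.
private_snps = {"G21": 5, "G57": 2, "G55": 1}

# Years back to the common ancestor implied by each tester's SNP count.
ages = {who: n * YEARS_PER_SNP for who, n in private_snps.items()}
average_age = sum(ages.values()) / len(ages)
estimated_year = REFERENCE_YEAR - average_age

print(ages)
print(f"Average: {average_age:.0f} years ago (about {estimated_year:.0f})")
```

The three individual values differ by a factor of five, which is a reminder of how wide the uncertainty is when an average rests on so few testers.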

Given the TMRCA estimate of 1700 for the downstream A13103/BY14188 branch, the TMRCA for the upstream Y16880 branch is likely to be several generations earlier ... and this is supported by the SNP-based TMRCA estimate and 2 of the 3 TiP-based TMRCA estimates (which are all 111 marker comparisons).

Also, we need to bear in mind that there are 2 SNPs (A13103 & BY14188) between the A13103/BY14188 branch and the upstream Y16880 branch. And allowing for 150 years per SNP, this suggests that the Y16880 branch could be 300 years older (i.e. about 1400). Again, we need to be cautious about over-interpreting a result based on 2 data points.

In summary, the preponderance of the evidence suggests a date of about 1700 for the birth of the common ancestor of the A13103/BY14188 branch and a date of about 1600 for the common ancestor of the Y16880 branch.

Maurice Gleeson
July 2018