Tuesday 15 August 2017

Version 3 of the Mutation History Tree for Lineage II

Below is the updated version of the Mutation History Tree for Gleeson Lineage II (the North Tipperary Gleesons). Previous versions were published in December 2015 (version 1) and December 2016 (version 2). A pdf version of the tree can be downloaded from Dropbox via this link ... L2 MHT v3a

To see where you sit in the tree, find your G-number from the table at the bottom of this post (taken from our WFN Results page).

So what does it tell us?
  • The Gleeson Lineage II family tree currently has 11 major branches. And there are likely to be a lot more.
  • It looks like the Gleeson surname has been around for quite some time. The first branch to split off was Branch F (far right). This is a pretty ancient branch and dates (very roughly) from about 1050 AD, not too far away from the presumed date of origin of the Gleeson surname.
  • There are probably some branches that have simply died out over the passage of time ... and what we are seeing here is simply a modern day snapshot of the remains of the "clan" that once was. In times past, some of the branches might have been much more prominent, and others much less prominent.
  • Age estimates of the branching points are very crude because the dating methodology has severe limitations. It is hoped that these can be improved with time.
  • Some branches are associated with a particular area or townland in North Tipperary (e.g. Branch C1 - Garryard; Branch E - Curraghneddy). It is hoped that, as more people join the project and supply their MDKA information (particularly birth location), more and more branches will be associated with specific locations. This in turn will help members with their individual genealogical research.

The Tree
The Pedigrees (and Key)
The (previously) Unique SNP markers
Click to enlarge ... or download the high-quality pdf version

The tree consists of several parts:
  • the tree itself, illustrating the branching pattern based on SNP & STR marker data
  • the pedigrees associated with each member in the tree (plus details of their MDKA / EKA)
  • a key to the tree, and numbered footnotes
  • the unique SNPs identified for those members who undertook the Big Y test

The tree has expanded considerably since the last version. The results of the tenth Big Y test are now included (from our Clan Gathering Chairman, Michael G. Gleeson). These came back from the lab in late December 2016 and underwent additional analysis by Alex Williamson for inclusion in the Gleeson portion of his Big Tree. These results confirmed the existence of Branch F (which had previously been merely predicted to exist on the basis of STR marker data). They also split up the "A5629 SNP Block" which up to that point consisted of 4 SNP markers. Thereafter it was split into an upstream branch characterised by the SNP A5631, and a downstream SNP block characterised by the 3 SNPs A5627, A5629, & A5630.

These results made A5631 the apparent over-arching Gleeson-specific SNP for Lineage II (i.e. only Gleesons have been discovered to share this particular SNP marker). Thus, A5631 could be the DNA marker that defines membership of the larger Gleeson "Clan".

Lineage II Gleesons on the Big Tree illustrating the old "A5629 SNP Block"
(from Nov 2016)
The current version of the Lineage II Gleeson portion of the Big Tree
showing how the previous A5629 SNP Block is now split in two (Aug 2017)

In addition to the 10th set of Big Y results, 15 people expressed an interest in doing the new Z255 SNP Pack, and the results for 13 of them have now come back from the lab. This revised SNP Pack contains almost 50 SNP markers that are either shared only by Lineage II members or are unique to Lineage II members, and represents over 95% of all shared and unique Lineage II-specific SNPs (see this previous blog post). So the Pack is very specific for Lineage II. These SNP markers were identified via the 10 Big Y tests previously undertaken by our project members and were incorporated into the revised SNP Pack by the team at FTDNA.

A review of some preliminary results of these SNP Pack tests was discussed in a previous blog post. The updated results are included in a table at the bottom of this post.

The data from these 13 sets of new results have been added to the tree and as a result, the branching pattern has expanded considerably. The previous version of the tree consisted of 6 branches (known or predicted) but the new version contains 11 branches:
  • Branch A has been split in two (A1 & A2) and two new members added (see red G-numbers: G95 & G113).
  • Branch B has remained intact and has gained a new member (G107).
  • Branch E was previously thought to be more closely related to Branch B but the new SNP Pack results indicate that it is in fact more closely related to Branch C. Thus Branch E's attachment to the tree has been changed.
  • Branch C has been split into two (C1 & C2) - the latter has gained a new member (G89) thanks to the new SNP Pack results.
  • Branch D has split into two also (D1 & D2). This is not a big surprise as the anticipated common ancestor of the original 2 members of this branch was some 14 generations ago. This branch has also gained a new member (G106) due to the SNP Pack results.
  • Branch F has also remained intact and has gained a new member (G104), again due to the new SNP Pack results. This is an unusual branch and appears to be the oldest branch in the project so far. Its connection to the rest of the group is some 30-32 generations ago, approximately 1050 AD, taking it very far back in time, almost to the predicted origin of the Gleeson surname.
  • Branch G is a new branch within the tree. It consists of just two people and they are not particularly closely related to each other. Both tested with the new Z255 SNP Pack and only tested positive for the more upstream Lineage II SNP markers (A5631 & the A5627/29/30 SNP Block). This too is a relatively old branch and its connection to the rest of the tree is some 25 generations ago (about 1200 AD). 
  • Branch H is also a new branch and may be a similar age to Branch G (i.e. about 25 generations ago). However, the members of this branch have tested positive for marker BY5706 (which is one step further downstream than Branch G). None of the 4 members in this branch are particularly closely related, so I would expect this branch to split up into further sub-branches in due course.
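The generation counts quoted above for Branches F, G and H can be turned into very rough calendar years with some simple arithmetic. The sketch below is only an illustration: the function name `year_range`, the reference year of 2017 (the date of this post) and the assumed 30 to 33 years per generation are my own assumptions, not part of any dating tool. The upper figure of 33 is roughly what "18 generations, i.e. less than 600 years" implies.

```python
def year_range(gens_min, gens_max, years_per_gen=(30, 33), reference_year=2017):
    """Return (earliest, latest) rough calendar years AD for a branching point.

    gens_min / gens_max: estimated number of generations back to the branching point.
    years_per_gen: assumed shortest and longest average generation length (an assumption).
    reference_year: the year the generations are counted back from (assumed 2017).
    """
    # More generations and longer generations give the earliest possible date.
    earliest = reference_year - gens_max * max(years_per_gen)
    # Fewer generations and shorter generations give the latest possible date.
    latest = reference_year - gens_min * min(years_per_gen)
    return earliest, latest


# Branch F: some 30-32 generations ago
print("Branch F:", year_range(30, 32))   # (961, 1117)

# Branches G and H: some 25 generations ago
print("Branch G/H:", year_range(25, 25))  # (1192, 1267)
```

The quoted dates of about 1050 AD and about 1200 AD both fall inside these ranges, and the width of the ranges is a reminder of how crude such estimates are.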

Version 1 of this Mutation History Tree contained 16 of the project members of Lineage II, and version 2 placed 20 of the 31 members (65%) on the tree. Version 3 is the most comprehensive to date and contains 32 of the 36 members currently in Lineage II (89%). The remaining 4 members cannot be placed with reasonable accuracy and will require further testing to enable placement.

Altogether, of the 36 members in Lineage II, 23 (64%) have downstream SNP data available - 10 via the Big Y test, and 13 via the new Z255 SNP Pack. The SNP Pack proved to be a great success and an 89% placement rate is quite impressive. The placement rate increased from 65% to 89% as a result of the SNP Pack testing.
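For anyone who wants to check the percentages, they are simple proportions of the member counts given above. This is a minimal sketch; the function name `percent` is just a label I have chosen for the illustration.

```python
def percent(part, total):
    """Return part/total as a whole-number percentage."""
    return round(100 * part / total)


# Members placed on the tree, by version
print("Version 2 placement:", percent(20, 31), "%")   # 65 %
print("Version 3 placement:", percent(32, 36), "%")   # 89 %

# Members with downstream SNP data: 10 Big Y + 13 Z255 SNP Pack
print("With SNP data:", percent(10 + 13, 36), "%")    # 64 %
```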

Interestingly, some members were sufficiently closely related to other members of the group that SNP testing was not necessary. In some cases a definite relationship was already known, and in other cases the STR-based Genetic Distance was sufficiently close that placement was possible ... with reasonable confidence. The caveat here is that there may be a degree of Convergence obscuring the true relationship between certain members. And as a result, some people who have not undergone SNP-testing may need to be moved onto a different branch in the future. 

There were several questions that I had hoped the revised Z255 SNP Pack testing would answer:
  1. Are 10 Big Y tests enough to identify all/most of the downstream SNPs associated with Lineage II?
  2. How many future members are likely to be placed on the tree by just using the revised Z255 SNP Pack?
  3. Will there be a need for future Big Y testing within the group? Or has the testing undertaken by group members so far helped reduce the cost for future members?

Now that the results of the SNP Pack testing are in, we can look at these questions one by one and see to what extent we have an answer for each.

The 10 Big Y tests certainly did identify a lot of the downstream branches of the tree, but not all of them. If we take "downstream" to mean (crudely) less than 18 generations ago (i.e. less than 600 years), then between the Big Y testing and the Z255 SNP Pack testing, six (6) downstream branches were identified (Branches A1, B, E, C1, C2, F). The remaining 5 branches did not have a "sufficiently downstream" SNP identified (Branches A2, D1, D2, G, H).

Also, the exercise identified new branches that were not predicted from the original Big Y testing. It is therefore likely that additional new branches will continue to be identified over time as more people join the project and undertake SNP testing.

So, although the SNP Pack testing did provide a lot of additional useful information, and has improved the structure of the Mutation History Tree, its coverage of "downstream SNPs" (using the arbitrary threshold of approximately 18 generations) is only about 50%. This fact alone indicates that there will be a need for Big Y testing in the future, but perhaps much more selectively (thus saving money for project members).

Now that the structure of the tree is quite developed, and as it continues to "mature", it will become easier and easier to place future members on a particular branch of the tree and in many instances will obviate the need for SNP testing. At this stage it is difficult to know how often this will happen.

For future members who are not easy to place, the options will be a 67 or 111 STR upgrade, the Z255 SNP Pack, or the Big Y test. In most instances, the SNP Pack might be the test of choice if the new member appears to be a possible match to one of the "downstream branches". But if it is not possible to place the new member anywhere on the existing tree, then the Big Y test might be preferred.
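The reasoning in the last two paragraphs can be written down as a rough decision rule. This is only a sketch of my own thinking, not a formal protocol: the function name `suggest_test` and its three yes/no inputs are hypothetical, and the final catch-all case (a member who fits somewhere on the tree but not on a downstream branch) is an assumption, as that situation is not spelled out above.

```python
def suggest_test(easy_to_place, fits_somewhere_on_tree, matches_downstream_branch):
    """Suggest a next step for a new Lineage II member (illustrative only).

    easy_to_place: existing data (known relationship or close STR match) already place the member.
    fits_somewhere_on_tree: the member can be placed at least roughly on the existing tree.
    matches_downstream_branch: the member looks like a possible match to a downstream branch.
    """
    if easy_to_place:
        return "No SNP testing needed"
    if not fits_somewhere_on_tree:
        return "Big Y"
    if matches_downstream_branch:
        return "Z255 SNP Pack"
    # Assumption: the text does not cover this case, so list all the options.
    return "67/111 STR upgrade, Z255 SNP Pack or Big Y"


print(suggest_test(False, True, True))    # Z255 SNP Pack
print(suggest_test(False, False, False))  # Big Y
```

In practice each case would still need to be looked at individually, so the rule above is a starting point and nothing more.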

Accurately dating when each branch arose remains a problem and there are several reasons for this:
  1. In order for the dating to be accurate, the branching structure must be accurate. And for some people there is insufficient data to place them confidently on the tree. In such cases, it may be necessary to upgrade to 67 or 111 STR markers, or do the Z255 SNP Pack, or do the Big Y test.
  2. There is an inherent problem with any dating methodology used. Statistically, it may produce very accurate results. But from a genealogical perspective, the results are very inexact. Even at the 111 STR marker level, one can often expect to find a range of 300 years on either side of the midpoint estimate. The same is true for dating using SNPs.
  3. Dating using STRs appears to work best for people who are relatively closely related (say within the last 500 years) and dating using SNP markers may be more "exact" for people whose common ancestor lived 500-1000 years ago. Only further research will help clarify this.
  4. FTDNA's TiP tool uses proprietary information and its methodology is not public knowledge. As a result there is no way of checking the science behind it. It may be that its estimates are incorrect. Last year (2016) the algorithms were adjusted and new TMRCA estimates were generated for the same results. But there is no way of knowing if this was an improvement or not. I suspect that the TiP tool may underestimate the age of more distant (upstream) branching points because it does not accurately take into account the extent of parallel and back mutations.
  5. Dave Vance's SAPP programme uses Ken Nordtvedt's Interclade Ageing methodology. I don't know much about this method but it may be a better way of using STR data to estimate TMRCA. And as the SAPP programme is automated, it potentially takes a lot of the hard work out of the calculations.
  6. Ultimately, dating the branching points will involve a mixture of the above techniques and the best that can be achieved may simply be a "best guess".

So the take home message is that all time points in the tree should only be taken as a very rough guide.

As the tree grows and expands, more and more people will be able to use it to help their own genealogical research. Already we are making connections and breaking down Brick Walls for members in Branches B and C1. 

More will follow in time.

Maurice Gleeson
Aug 2017

The members of Gleeson Lineage II (from the WFN Results page)
... find your G-number above and then locate yourself on the tree

Below is the revised spreadsheet of the results of the recent Z255 SNP Pack testing. The previous blog post only included 12 sets of results - the 13th set of results effectively split Branch D into two separate branches.

click to enlarge ...
or download a high-quality pdf version
via this Dropbox link here


Note that some SNP markers have more than one name (e.g. A5631 is also called Y17108). This confusing situation arises because different institutions give the same SNP different names. The best place to see which SNPs have alternative names is to go to the Gleeson portion of the tree on YFULL. Just search for A5631 (use Cmd+F on a Mac or Ctrl+F on a PC). Note that the YFULL tree does not have as many datapoints as the Big Tree or FTDNA's haplotree.