Saturday 16 May 2020

Methodology behind the Mutation History Tree v4 for Lineage II

In the previous post I introduced the new version of the Mutation History Tree (MHT) for Lineage II (the North Tipperary Gleesons). The previous version of the MHT for Lineage II participants was published in August 2017 (see blog post here), so let's take a look at what has changed since then, how the new version was put together, and what it tells us about the North Tipperary Gleesons.

You can download a high resolution pdf version of the new MHT (version 4) here ...

There are now 52 members in Lineage II, an increase of 16 members since the last version of the MHT. The data used in each version of the MHT are summarised in the table below. You can see how the available data has grown over the past 5 years.

We use a combination of STR marker mutations, SNP marker mutations and user-submitted genealogical information to build the tree. The Big Y-700 provides the most comprehensive information for this analysis - it assesses 838 STR markers and over 200,000 SNP markers. Only 4 people have done this test, but 8 people have done the previous version of the Big Y (the Big Y-500), which assessed 561 STR markers and roughly 100,000 or more SNPs. It is difficult to know whether Big Y-700 data will provide additional differentiating information over and above the data from the Big Y-500.

For this fourth version of the MHT, I analysed the data from scratch so that I could compare it with previous versions to see if and where I had done things differently. I also experimented with a different approach that generated a different format of the MHT - this has pros & cons compared to previous formats. Let me take you through it step-by-step.

The new format consists of 5 sections that will look very familiar to you as it is based on the Results Page for the Gleason DNA Project on the FTDNA website:
  1. Personal Details
  2. Genealogical Information
  3. SNP Marker data
  4. STR Marker data
  5. Mutation History Tree 
A pdf version of the entire MHT v4 can be downloaded from this Dropbox link here. This is the best way of viewing the MHT and should be referenced as you read the text below.

1) Personal Details consist of the FTDNA kit number, the old G-number (a numbering system we used when the project ran on the now defunct WorldFamiliesNetwork website), surname of the test-taker, and whether the test-taker has their STR results on the public Results Page (anyone who doesn't has their kit number & G-number blanked out). If you want to change your settings, just let me know. Or you can change them yourself by going to Account Settings > Project Preferences > Project Sharing > Group Project Profile and clicking or dragging the slider to Opt in to Sharing.

Each branch of the Tree is indicated by a coloured border and highlighted background (green in the example below, indicating Branch F).

2) Genealogical information is whatever details have been submitted by the user. All members should have posted their direct male line pedigree on our Pedigrees Page. One of the most important pieces of information is frequently missing, and that's the ancestral location of the MDKA (Most Distant Known Ancestor). This should preferably be where he was born, but failing that his marriage location or where his children were born can be used as a substitute. This information may help us identify if a specific sub-branch comes from a particular area (and that in turn may help people in their documentary research).

Abbreviated Direct Male Line pedigree information is included in the pdf document, along with MDKA info (under FTDNA's heading "Paternal Ancestor Name"), and an additional column specifies the Paternal Ancestral Location (which lists country, county, district and town or townland, extracted from the MDKA information).

Known cousins have the background highlighted in pastel colours. There are none in the first branch but 3 in the second.


3) SNP Marker data includes the Terminal SNP for each member (under FTDNA's heading "Haplogroup"), details of any SNP test taken (SP, SNP Pack; BY500, Big Y-500; BY700, Big Y-700), the number of Unique SNPs and the associated SNP Sequence (or SNP Progression). The SNP Sequence is simply the sequence of SNP markers that characterises each branching point on the Tree of Mankind, starting at a distant "upstream" branch (in the past) and progressing all the way "downstream" (i.e. towards the present) to the Terminal SNP. Think of this string of SNPs as a line of ancestors coming forward in time until it reaches you in the present day. Comparing the SNP Sequences of two people shows exactly where each of them sits on the Tree of Mankind relative to the other, and this in turn tells us how closely or how distantly they are related.
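Comparing two SNP Sequences boils down to finding how far they run in step before they diverge: the last SNP they share marks the most recent ancestral line the two men have in common. Here is a minimal Python sketch of that idea. The sequences are abbreviated illustrations assembled from the branch summaries in the previous post (intermediate SNPs are left out), and `last_shared_snp` is a hypothetical helper, not a feature of FTDNA or the Big Tree.

```python
# Illustrative, abbreviated SNP Sequences (upstream -> downstream), assembled
# from the branch summaries in the previous post; intermediate SNPs omitted.
BRANCH_F = ["A5631", "BY14197", "BY14189"]
BRANCH_B = ["A5631", "A5629", "A5628", "Y16880"]
BRANCH_A1 = ["A5631", "A5629", "A5628", "A660"]
BRANCH_C2 = ["A5631", "A5629", "BY5706", "BY5707", "A13116"]


def last_shared_snp(seq_a, seq_b):
    """Return the most downstream SNP that two SNP Sequences have in common.

    Both sequences are read from upstream (past) to downstream (present);
    their shared prefix is the line of ancestors the two men have in common.
    Returns None if they share nothing at all.
    """
    shared = None
    for snp_a, snp_b in zip(seq_a, seq_b):
        if snp_a != snp_b:
            break
        shared = snp_a
    return shared


print(last_shared_snp(BRANCH_B, BRANCH_A1))  # A5628
print(last_shared_snp(BRANCH_B, BRANCH_C2))  # A5629
```

The deeper the last shared SNP sits, the more recent the common ancestor: Branches B & A1 meet at A5628, whereas Branch B and Branch C2 only meet further upstream at A5629.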

Each SNP in a SNP Sequence often represents several SNPs in a "SNP Block". Thus the SNP A5629 (for example) is actually the first SNP in a block of 5 SNPs, and the entire block takes its name from the first SNP - A5629. You can see these various SNP Blocks quite clearly in the Big Tree's version of the portion of the Tree of Mankind where the Lineage II Gleesons sit.

The numbers in brackets after each SNP below represent the number of SNPs in that particular "SNP Block". FTDNA & The Big Tree have different methods for deciding how many SNPs there are in a SNP Block so you will find 2 numbers after each SNP below - the first is the Big Tree estimate and the second is FTDNA's (taken from their Big Y Block Tree). So for example, A5631 is a 1-SNP block according to the Big Tree and a 3-SNP block according to FTDNA. Knowing the number of SNPs in a SNP Block is helpful for calculating the age of a SNP Block (i.e. when was the first SNP in the block formed) ... so having different estimates for the number of SNPs in a block can lead to confusion and inaccurate age estimation (which is going to be very crude anyway).
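As a crude illustration of how much the two block counts matter, the Python sketch below multiplies the block size by an average of 84 years per SNP (the figure used for SNP Counting further on in this post). The helper name is hypothetical, and the arithmetic ignores the large uncertainty attached to any single SNP interval.

```python
YEARS_PER_SNP = 84  # average years per SNP, as used for SNP Counting in this post


def block_span_years(n_snps, years_per_snp=YEARS_PER_SNP):
    """Approximate number of years spanned by a SNP Block of n_snps SNPs."""
    return n_snps * years_per_snp


big_tree = block_span_years(1)  # A5631 treated as a 1-SNP block (Big Tree)
ftdna = block_span_years(3)     # A5631 treated as a 3-SNP block (FTDNA)
print(big_tree, ftdna, ftdna - big_tree)  # 84 252 168
```

So the same block could span roughly 84 or roughly 252 years depending on whose count you use, a difference of 168 years before any other source of error is considered.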

Knowing the SNP Sequences allows us to build a SNP-based "family tree" (as in the diagram below, which is taken from FTDNA's public Y-DNA Haplotree). A5631 is the overarching SNP for the Lineage II Gleesons - the common ancestor of all North Tipperary Gleesons carried this DNA marker and passed it on to all his descendants. All group members sit on branches below this particular SNP.


4) The STR Marker Data is taken directly from the Results Page on FTDNA's website for the first 111 markers. Data from Big Y STR markers (markers 112 to 838) had to be downloaded from each individual member's webpage. There are up to 838 STR markers, and in the first row, I have numbered them 1 to 838. (Note that any extra multi-copy markers are labelled with the previous number followed by a letter, e.g. dys464e & dys464f become 25e & 25f; this preserves the numbering system for the subsequent STR markers up to 838.)

Only those markers with a mutation are included - all others are deleted from this tabular summary. The names of each marker are in the 2nd row.

Numbering the markers (rather than writing out their full name, which becomes very cumbersome) makes it easier to locate specific markers within the MHT diagram and discuss them in the text below. For example, CDYa is marker 34.
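The numbering rule can be written down in a few lines. This Python sketch assumes FTDNA's standard ordering of the first 37 markers and shows how an extra copy such as dys464e inherits the number of the marker it follows. It also includes a simple filter that keeps only the columns where the kits (including any modal rows you choose to add) do not all share one value. The function names and data layout are hypothetical.

```python
# FTDNA's standard order for the first 37 STR markers (positions 1-37).
PANEL_37 = [
    "DYS393", "DYS390", "DYS19", "DYS391", "DYS385a", "DYS385b", "DYS426",
    "DYS388", "DYS439", "DYS389i", "DYS392", "DYS389ii", "DYS458", "DYS459a",
    "DYS459b", "DYS455", "DYS454", "DYS447", "DYS437", "DYS448", "DYS449",
    "DYS464a", "DYS464b", "DYS464c", "DYS464d", "DYS460", "Y-GATA-H4",
    "YCAIIa", "YCAIIb", "DYS456", "DYS607", "DYS576", "DYS570", "CDYa",
    "CDYb", "DYS442", "DYS438",
]


def number_markers(panel, extra_copies):
    """Map marker names to the position labels used in the MHT tables.

    panel:        standard markers in order, numbered from 1.
    extra_copies: extra multi-copy marker -> the standard marker it follows.
    An extra copy reuses that marker's number plus its own final letter,
    so every later marker keeps its usual number.
    """
    labels = {name: str(pos) for pos, name in enumerate(panel, start=1)}
    for extra, follows in extra_copies.items():
        labels[extra] = labels[follows] + extra[-1]
    return labels


def _label_key(label):
    """Sort labels numerically, so '9' < '13' < '25' < '25e'."""
    return (int(label.rstrip("abcdefghijklmnopqrstuvwxyz")), label)


def mutated_markers(kits):
    """Return the marker labels where the kits do not all share one value.

    kits: kit id -> {marker label: repeat count}; untested values
    (None or absent) are skipped.
    """
    markers = {m for values in kits.values() for m in values}
    keep = []
    for m in markers:
        observed = {v[m] for v in kits.values() if v.get(m) is not None}
        if len(observed) > 1:
            keep.append(m)
    return sorted(keep, key=_label_key)


labels = number_markers(PANEL_37, {"DYS464e": "DYS464d", "DYS464f": "DYS464d"})
print(labels["CDYa"], labels["DYS464e"])  # 34 25e
```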

I also include the modal haplotype for two "upstream" SNPs - Z16437 & Z255. These come before the overarching SNP for Lineage II (A5631). You can see this in the more extensive SNP Sequence for Lineage II below (taken from Rob Spencer's Admin Utilities tool on his Tracking Back website):
  • R-L21 > DF13 > ZZ10_1 > Z16423 > Z255 > Z16437 > BY2853 > Z16438 > BY2852 > A5631
Including these modal haplotypes allows us to see which STR mutations occurred prior to the emergence of the SNP marker A5631, and therefore which STR marker values were ancestral and which ones were descendant. This also identifies a Unique STR Pattern (USP) - relative to the Z255 modal - that applies to all Gleesons below A5631. This is highlighted in light yellow (or is it beige?) and identifies the 5-marker USP for Lineage II as follows:
  • marker 9 (dys439) changed from a value of 12 to 13 (written as 12 > 13)
  • marker 13 (dys458) 17 > 16
  • marker 14 (dys459a) 9 > 8
  • marker 23 (dys464b) 15 > 16
  • marker 68 (dys710) 36 > 37
Anyone who has this Unique STR Pattern (USP) is likely to test positive for the SNP marker A5631 and is also likely to be a Gleeson from North Tipperary. (Note that this USP is based only on the first 111 markers. Additional marker values in STR markers 112-838 could form part of the USP, but I was not able to locate the modal haplotype for these markers: it is not displayed on the FTDNA webpages, and YFull only has modal values for the first 111 STR markers, excluding the multi-copy markers, which I had to obtain from the relevant haplogroup projects.)
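A USP check like this is easy to automate. The Python sketch below encodes the 5-marker USP listed above (the Z255 modal value and the value found below A5631) and flags a kit when every USP marker it has tested shows the derived value. The threshold of 4 tested markers is an arbitrary assumption (it lets a 37-marker kit, which covers markers 9, 13, 14 & 23, qualify), the function names are hypothetical, and a positive flag is only a hint, not a substitute for SNP testing.

```python
# The 5-marker Lineage II USP from the list above:
# marker label -> (Z255 modal value, value found below A5631).
LINEAGE_II_USP = {
    "9": (12, 13),   # dys439
    "13": (17, 16),  # dys458
    "14": (9, 8),    # dys459a
    "23": (15, 16),  # dys464b
    "68": (36, 37),  # dys710
}


def usp_matches(kit, usp=LINEAGE_II_USP):
    """Return (USP markers with the derived value, USP markers the kit tested)."""
    tested = [m for m in usp if kit.get(m) is not None]
    derived = [m for m in tested if kit[m] == usp[m][1]]
    return derived, tested


def carries_usp(kit, usp=LINEAGE_II_USP, min_tested=4):
    """True if at least min_tested USP markers were tested and all are derived.

    min_tested=4 is an arbitrary assumption, not a project rule.
    """
    derived, tested = usp_matches(kit, usp)
    return len(tested) >= min_tested and derived == tested


y37_kit = {"9": 13, "13": 16, "14": 8, "23": 16}  # hypothetical 37-marker kit
print(carries_usp(y37_kit))  # True
```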

As well as the USP that defines the A5631 group and distinguishes it from other groups, there are additional USPs within the group that define sub-groups (and hence sub-branches) among the North Tipperary Gleesons. These sub-USPs are indicated by boxes with bold black borders. In the screenshot above, you can see such boxes around markers 2, 4, 14 & 15 (which have values of 23, 10, 9 & 9). There is also a possible sub-sub-group indicated by the box encompassing the values for two men on marker 13 (which has a value of 17).

The resulting subgroups were checked against the genealogical data and corroborating genealogical evidence was found for the groupings on several sub-branches (e.g. Branches B & A1).

Again, each branch of the Tree is indicated by a coloured border (green in the example above, indicating Branch F), as well as a coloured box with the Branch Name within it.

5) The Mutation History Tree sits in between the SNP data and the STR data. Building the MHT proceeds in 3 stages:
  1. defining the sub-groups using SNPs, genealogical data, and USPs
  2. defining the branching structure: which STR mutation came first - the chicken or the egg?
  3. defining the dates for each branching point (using various iterations of SAPP)

Once I had obtained the data from the Results Page and put it into an Excel spreadsheet, I grouped people together according to their Terminal SNP. As a result, 28 of the 51 project members were sorted into 10 distinct groups. To these were added known relatives (who had no downstream SNP data) - this allowed an additional 8 people to be grouped (bringing the total to 36 out of 51). Lastly, I visually inspected the STR mutations to identify sub-groups with a USP and sorted an additional 7 project members accordingly (bringing the total up to 43 out of 51). The remaining 8 project members were placed as accurately as possible within the MHT, on branches where their genetic distance to their neighbours was minimal (lack of available data prevented more accurate placement).
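The three grouping passes can be sketched in Python as follows. Kit ids, group labels and the dictionary layout are hypothetical; in practice the USP pass was done by eye, so here it is simply supplied as a ready-made mapping.

```python
from collections import defaultdict


def group_members(terminal_snp, relatives, usp_group):
    """Sketch of the three grouping passes described above.

    terminal_snp: kit -> Terminal SNP (None if no downstream SNP data);
                  every project member should appear here.
    relatives:    kit -> kit id of a known relative
    usp_group:    kit -> group label assigned by inspecting STR mutations
    Returns (groups, ungrouped), where groups maps label -> list of kits.
    """
    assigned = {}
    # Pass 1: group by Terminal SNP.
    for kit, snp in terminal_snp.items():
        if snp is not None:
            assigned[kit] = snp
    # Pass 2: known relatives without SNP data join their relative's group.
    for kit, relative in relatives.items():
        if kit not in assigned and relative in assigned:
            assigned[kit] = assigned[relative]
    # Pass 3: members still unplaced but sharing a USP (never overrides).
    for kit, label in usp_group.items():
        assigned.setdefault(kit, label)
    groups = defaultdict(list)
    for kit, label in assigned.items():
        groups[label].append(kit)
    ungrouped = sorted(set(terminal_snp) - set(assigned))
    return dict(groups), ungrouped
```

Whoever is left in `ungrouped` corresponds to the members who had to be placed by minimising genetic distance to their neighbours.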

Once we had our groups, the next step was to try to judge how they all fitted together. I examined each STR marker column by column, assessing any mutations to try to determine which came first in the sequence from upstream to downstream, using a "maximum parsimony" approach (i.e. one that required the fewest mutations). I employed two exceptions to this general rule:
  1. I ignored mutations on markers 34 & 35 (CDYa & CDYb) because these are fast-mutating markers and their values can easily "flip-flop" back and forth from generation to generation.
  2. I avoided Back Mutations as these are rare compared to Parallel Mutations. Dave Vance has calculated that within the past 1000 years or so (which covers the era of surnames in Ireland), one would expect the ratio of Parallel Mutations to Back Mutations to be somewhere around 25:1 (see his article discussing this here). In other words, there should be 25 times more Parallel Mutations than Back Mutations in the MHT. And so I deliberately avoided assigning Back Mutations if at all possible. In fact, I only put in one such mutation (but this was more as a token gesture than anything else).
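To make the parsimony idea concrete, here is a sketch using Fitch's algorithm, a standard small-parsimony method that counts the minimum number of changes a marker needs on a given tree. This is a swapped-in formal technique, not the manual column-by-column assessment described above: it treats any change in repeat value as one mutation, and it does not distinguish Parallel from Back Mutations. Markers 34 & 35 are skipped by default, mirroring the first exception. Kit ids, values and function names are hypothetical.

```python
def fitch(tree, values):
    """Fitch small-parsimony count for one marker on a fixed binary tree.

    tree:   a kit id (leaf) or a 2-tuple of subtrees, e.g. (("k1", "k2"), ("k3", "k4"))
    values: kit id -> repeat value on this marker
    Returns (candidate ancestral values at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {values[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, values)
    right_set, right_cost = fitch(right, values)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_cost(tree, kits, skip=("34", "35")):
    """Total minimum changes over all shared markers, ignoring CDYa/CDYb by default."""
    markers = set.intersection(*(set(v) for v in kits.values()))
    return sum(fitch(tree, {k: v[m] for k, v in kits.items()})[1]
               for m in markers if m not in skip)


kits = {
    "k1": {"13": 16, "34": 36},
    "k2": {"13": 16, "34": 38},
    "k3": {"13": 17, "34": 36},
    "k4": {"13": 17, "34": 38},
}
tree_a = (("k1", "k2"), ("k3", "k4"))
tree_b = (("k1", "k3"), ("k2", "k4"))
print(tree_cost(tree_a, kits), tree_cost(tree_b, kits))  # 1 2
print(tree_cost(tree_a, kits, skip=()), tree_cost(tree_b, kits, skip=()))  # 3 3
```

With these made-up values the two candidate trees tie when CDYa & CDYb are counted, but the first tree needs one fewer mutation once they are ignored, which shows how much weight two fast-mutating markers can carry.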
Some branches were easy to characterise, either because they had a lot of data (e.g. Big Y-700), or they had distinctive USPs, or both. For Branches H & G, it was much more difficult to elucidate a branching structure and thus the connections between people on these branches are unlikely to be accurate reflections of reality. Furthermore, the SNP markers that characterise these branches are quite far upstream - BY5706 goes back to a common ancestor who lived about 1250 AD and A5629 to an ancestor who lived about 1150.

More data (i.e. Big Y-700) for everyone on these branches would be needed in order to better characterise their relationship to each other. In addition, it was not possible to confidently allocate 4 people to any of the 11 named branches. And again, Big Y data would be needed to do so.

It is highly likely that some of the people on Branches H & G, and/or the 4 outliers, will belong to new (possibly isolated) branches of the MHT. Many branches will have become extinct over time and others will only have very few surviving descendants. And some surviving branches may not have anyone who has tested, and so they are not currently represented in the project.

Gleeson Lineage II MHT version 4 - download the MHT as a pdf document here

Once the branching structure was defined, the next step was to date each branching point within the overall tree structure. Dave Vance's SAPP Programme was used to generate TMRCA dates for each branching point (TMRCA, Time to Most Recent Common Ancestor). This is never a straightforward task as there are many hurdles to overcome.

Firstly, the quantity of data for each project member is variable. Some have only tested 12 STR markers, others have tested 838 STR markers as well as 200,000 SNP markers. However there are some tricks to get around this:
  1. if close relatives have tested, the STR marker values for one relative can be extrapolated to the other. This is obviously not foolproof but it captures most of the relevant shared mutations and helps recognise USPs.
  2. if someone else has a similar USP to others who have tested to 838 STRs, the additional STRs can be imputed for the person with fewer STR markers tested and the missing marker values can be completed / filled in.
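Both tricks amount to filling untested markers from a "donor" haplotype (a close relative, or a member sharing the same USP who has tested to 838 markers). A minimal Python sketch, assuming haplotypes are stored as marker-to-value dictionaries: it never overwrites a value the target actually tested, and it records which values were imputed so they can be stripped out again in a later iteration of the input file. The function name is hypothetical.

```python
def impute_from_donor(target, donor):
    """Return a copy of target with untested markers filled in from donor.

    target, donor: marker label -> value (None or absent = not tested).
    Returns (filled haplotype, list of imputed marker labels).
    Values the target actually tested are never overwritten.
    """
    filled = dict(target)
    imputed = []
    for marker, value in donor.items():
        if filled.get(marker) is None and value is not None:
            filled[marker] = value
            imputed.append(marker)
    return filled, imputed
```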
The SAPP Programme is very sensitive to data input - small changes in input can produce big changes in output. Several versions of the input file were created with varying degrees of data imputation and data suppression. The outputs (i.e. MHT) of sequential iterations of the input file were examined for consistency and differences. 

Here is a summary of each input file and at the end of this post are the SAPP-generated MHTs associated with each input file:
  • GL2 v4 ... no calls (-) replaced with "n" prior to pasting into txt file, 2 kits ignored (both above A5631), dys464e & f were removed and modal values for the remaining dys464 markers (a thru d) were used for the 2 affected kits
  • GL2 v4a ... additional values for known relatives and USP-defined branches were imputed, extreme outliers were ignored (Y-12)
  • GL2 v4b ... Z16437 modal added (with values for dys464e & f deleted)
  • GL2 v4c ... ignore non-SNP & non-USP participants (mainly branches H&G)
  • GL2 v4d ... ignore CDYa & CDYb
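The clean-up steps in this list can be sketched in Python as below. It stops short of writing an actual SAPP input file, whose exact layout is not reproduced here. The column labels follow the numbering scheme described earlier (25e & 25f for dys464e & f, 34 & 35 for CDYa & CDYb), and the function and argument names are hypothetical.

```python
def prepare_rows(kits, drop_kits=(), drop_markers=(), modal_fill=None, fill_kits=()):
    """Apply the input-file clean-up steps to a table of kits.

    kits:         kit id -> {marker label: value string, "-" for a no call}
    drop_kits:    kits to leave out (e.g. the 2 kits above A5631)
    drop_markers: columns to remove (e.g. "25e"/"25f", or "34"/"35" for GL2 v4d)
    modal_fill:   marker label -> modal value, written over the kits in fill_kits
                  (as was done for dys464a-d in the 2 affected kits)
    """
    rows = {}
    for kit, values in kits.items():
        if kit in drop_kits:
            continue
        # Replace no calls ("-") with "n" and drop the excluded columns.
        row = {m: ("n" if v == "-" else v)
               for m, v in values.items() if m not in drop_markers}
        # For the listed kits, overwrite selected markers with modal values.
        if kit in fill_kits and modal_fill:
            row.update(modal_fill)
        rows[kit] = row
    return rows
```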

Interestingly, once "hard-to-place" project members were removed, the central estimate (midpoint estimate) for each of the major branching points did not differ substantially between iterations. The most noticeable change was the 5th iteration (GL2 v4d - CDY markers ignored) which reduced the TMRCA by about 50-200 years for each branching point. 

The 4th iteration (using input file GL2 v4c) was taken to be the version most likely to be closest to the true TMRCAs. This had to be massaged slightly as the first 3 SNPs had the same TMRCA - A5631 was adjusted to 1100 (50 years earlier), A5629 stayed at 1150, and BY5706 was adjusted to 1250 (100 years later). Also, A13119 was adjusted up by 100 years from 1250 to 1350.


Further problems arose when trying to calculate the formation date for some of the SNP markers, and in this regard we need to make several important points: 
  • It is essential to appreciate that the TMRCA estimate goes back to a common ancestor who was born with the particular SNP marker in question. In other words, the mutation did not arise in that ancestor - it arose in a previous generation before the birth of that common ancestor. It may have arisen in the ancestor's father or grandfather or 10 times great grandfather. Therefore the formation date (i.e. the date when the SNP mutation emerged) is usually going to be older than the TMRCA, sometimes a lot older. The two dates are closest when the SNP mutation occurred in the common ancestor's father.
  • SNP Counting was used and 84 was taken to be the average number of years per SNP. 
  • If there is only 1 SNP characterising a branching point, then the formation date refers to just that single SNP. However, if there is a block of SNPs at a branching point, the formation date represents the formation date of the very first SNP in the block. Thus SNP marker A660 represents a block of 6 or 7 SNPs and even though the TMRCA is about 1700, the formation date is estimated to be about 1200 (i.e. 6.5 SNPs x84 = 546 years ... so 1700-546 is roughly 1200 AD).
  • SNP formation dates were crudely calculated by taking the TMRCA as the end point of the block and subtracting the number of SNPs in the block x 84 years to get the starting point. In some instances these crude estimates were physically impossible because they placed the formation of the SNP before the TMRCA of the preceding (upstream) SNP. Such nonsensical estimates were therefore constrained by the TMRCA of the upstream SNP, which was deemed to be more accurate. Some of these estimates had to be shifted by 200-300 years.
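The crude calculation and the upstream constraint can be expressed as a small Python function, assuming 84 years per SNP and dates in years AD. The A660 figures come from the text above; the constrained example in the tests uses hypothetical numbers, and the function name is hypothetical too.

```python
YEARS_PER_SNP = 84  # average years per SNP used for SNP Counting


def formation_date(tmrca, block_size, upstream_tmrca=None, years_per_snp=YEARS_PER_SNP):
    """Crude formation date (year AD) for the first SNP in a SNP Block.

    tmrca:          TMRCA of the block (its end point), year AD
    block_size:     number of SNPs in the block (may be fractional, e.g. 6.5)
    upstream_tmrca: TMRCA of the preceding (upstream) SNP; the crude estimate
                    is not allowed to fall earlier than this date.
    """
    crude = tmrca - block_size * years_per_snp
    if upstream_tmrca is not None and crude < upstream_tmrca:
        return upstream_tmrca
    return crude


print(formation_date(1700, 6.5))  # 1154.0, i.e. roughly 1200 AD for A660
```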

As a result, a very crude timeline was generated for SNP formation dates and these need to be taken with a large pinch of salt (the ranges average about +/-200 years). Nevertheless, they provide an approximate timeline for the evolution of the MHT and add further interest to the picture.


The outcome of this new approach did not differ substantially from previous versions of the MHT. The same number of branches was identified and all the major branches were successfully characterised.

The format stays very close to the format of the Results Page on the FTDNA website and hopefully this helps project members better understand where the data comes from and how it is used to build the tree. It may even help software developers to automate the process (or parts of it at least).

Using successive iterations of the SAPP-generated tree allowed better age estimations for each of the branching points within the tree. There were significant problems using SNP Counting to estimate Formation Dates for the SNPs, but the TMRCA estimates seemed reasonably stable, especially when non-SNP & non-USP members were excluded from the SAPP analysis.

At the end of all this, we have a MHT that charts the evolution of the Gleeson surname from a time close to its formation, through major periods in Irish history including the Norman Conquest, the Black Death, the English Plantations, Cromwell's Conquest, the End of the Gaelic Era, the Great Famine, and the large-scale emigration that formed the present-day Irish Diaspora of 80 million people.

In some cases, project members within individual branches have managed to break through Brick Walls and establish relationships with other members on the same branch. A good example of this is the guest article by Lisa Little describing the connection between her Little line and the Gleesons of Branch B.

However, the connections between the various branches are beyond the reach of surviving documentary records and sadly these ancestors are lost in the mists of time. All we have to show for them is the genetic legacy that they have passed on to their Gleeson descendants living today. But even this can help place new members on a particular branch of the "genetic family tree" for all North Tipperary Gleeson's and allow them to connect at least partially with the rich legacy of their ancestors.

Maurice Gleeson
May 2020

Below are the SAPP-generated MHTs associated with each input file. The branches are colour-coded to match the Branches of the MHT v4 ...

GL2 MHT v4 (Gleeson Lineage II Mutation History Tree version 4) - 50 kits

GL2 MHT v4a - 42 kits (there's a glitch & some of the colours came out wrong)

GL2 MHT v4b - 42 kits

GL2 MHT v4c - 39 kits

GL2 MHT v4d - 39 kits

Friday 15 May 2020

Overview of the Mutation History Tree v4 for Lineage II

The Gleesons of Lineage II have origins in North Tipperary and all share a common ancestor about 1000 years ago. This common ancestor presumably also came from North Tipperary and was the first person to use the Gleeson surname in that particular area. But how are all these Gleesons related to each other? What does the family tree for this entire group look like? This is where Y-DNA comes in. We can construct a "Mutation History Tree" (MHT) for the entire group by analysing the mutations in their Y-DNA markers. This tree stretches from the present day all the way back to the origin of the surname (about 1000 years ago) and back before that into the era of the Irish Clans.

The screenshot below is an overview of the MHT. However, it is best viewed by downloading the pdf version of the entire MHT v4 from this Dropbox link here. You can then read this text and refer to the pdf document when you need to see larger print. Alternatively, you can enlarge your webpage by holding down the Ctrl key on your PC keyboard (Cmd on a Mac) and pressing the plus (+) key to enlarge the text or the minus (-) key to reduce it. Or you could open this post in two browser windows and flick back and forth between the text and the diagram below.

This Mutation History Tree (MHT) represents a "best fit" family tree for the available data. It probably does not reflect reality exactly, but it probably comes relatively close to it.

The MHT diagram below includes individual kit numbers & G-numbers on the left (anonymous kits are blanked out), followed by the branch name and then the branching structure of the MHT itself, starting at the present and going back in time from left to right. The numbers refer to STR markers that have experienced a mutation along a particular ancestral line. Parallel Mutations are in pink or blue, and the single Back Mutation is in red.

There is a very crude TMRCA timeline at the top (TMRCA = Time to Most Recent Common Ancestor) which is derived from various exploratory exercises with the SAPP Programme. All these technical details will be explained in a supplementary post and only the top-line conclusions will be presented here.

Gleeson Lineage II MHT version 4 - download pdf document here

So what does the MHT tell us? Firstly, it points to a common ancestor for the entire group who lived about 1100 AD (+/- 200 years). He carried the SNP marker A5631 and passed it on to all his descendants.

Within a few generations (possibly around the time of the Norman Conquest of Ireland), at least 2 descendant branches emerged - one characterised by the SNP marker BY14197, the other by A5629. Both produced descendants that survive to the present day.
  • BY14197 gave rise to Branch F.
  • A5629 gave rise to all the other named branches within the group.
There may be other "ancient branches" that remain to be identified - possibly 2 below A5631 and possibly 4 below A5629. Or alternatively, these branches (if they ever existed) have gone extinct.

Here is a short summary of each branch. Note that all dates are very approximate and most have a range of plus or minus 200 years:
  • Starting at the top, Branch F has 5 members. They share a common ancestor who lived about 1500 AD. This ancestor carried the SNP markers BY14197 & BY14189 and passed them on to the entire group. This branch includes the oldest ancestral line within the project - the participant bears the unusual surname variant CLESSON and the MDKA goes back to 1651 in colonial America.
  • Next, Branches B, A1 & A2 share a common ancestor who lived about 1450 AD and carried the SNP marker A5628 (which he duly passed on to all his descendants).
    • Branch B consists of 7 members whose common ancestor lived about 1550 AD and carried the SNP Y16880.
    • Branch A1 has 4 members whose common ancestor lived about 1700 and had the SNP A660.
    • Branch A2 has 3 members whose common ancestor was probably Matthew Gleeson (born about 1805) or his father, who came from Clonlara in Co. Clare. They may share an as yet unnamed SNP marker.
  • Branch E, the C branches, D2 & H share a common ancestor who lived about 1250 AD and carried the SNP BY5706. By the time this branch emerged, the Normans had a strong foothold in Ireland. This man had many descendant lines that survive to the present day, and among these are 4 distinct sub-branches.
    • Branch E & the C branches share a common ancestor who lived about 1500 AD and carried the SNP BY5707. This later gave rise to 2 separate branches:
      • Branch E - common ancestor lived in the 1700s, possibly in or near Curraghneddy in Co. Tipperary
      • Branches C2, C1 & C3 - common ancestor 1550 (had BY5707)
        • Branch C2 - common ancestor 1650 (carried A13116)
        • Branch C1 - common ancestor 1650 (carried A13110)
        • Branch C3 - common ancestor possibly James J Gleeson (born 1844)
    • Branch D2 is another ancient branch. The common ancestor of the 5 members lived about 1350 AD, around the time of The Black Death. He carried the SNP A13119.
  • The members of Branch H (7 men) & Branch G (5 men) are difficult to place. Only 2 of the men have done the Big Y test and there is insufficient comparative data to allocate them to more specific sub-branches. It is possible that, even with Big Y data for all 12 men, some of them may sit on very isolated branches within the overall tree because they are among the few surviving descendants of that particular branch. In some cases, they may be the sole surviving member of a particular Gleeson branch. Only Big Y testing will help clarify this.

Prior to the overarching Gleeson ancestor (1100 AD), the North Tipperary Gleesons shared a common ancestor (who had BY2852) with a group of neighbouring Carrolls, and before that they shared a common ancestor with a man whose descendants would later carry the names Bell, Phelps, McMahon & Prendergast (to whom he passed on Z16438). And earlier still, we shared a common ancestor who had Z16437 and his descendants include men called Miller, Treacy, McConnell, Hally, McCarthy & Cremen.

There is an interesting surname associated with Branch F: a man surnamed Carles who tests positive for BY14197 but not for the downstream SNP marker BY14189. I wonder if this man is a descendant of a Gleeson who emigrated to Spain some time after 1200 AD (possibly with The Wild Geese)? I will be making enquiries and will feed back in due course.

In the next post, I'll give a detailed description of the pdf document which includes all the data that went into creating MHT v4 and will explain the methodology and process of how the MHT was built.

Maurice Gleeson
May 2020

Wednesday 7 August 2019

A New Genetic Family is Born - Lineage VIII (North Tipperary)

Back in April this year, some new Y-DNA-37 results came back from the lab and I've been meaning to blog about them since then! The results were reviewed by me and our new Co-Administrator Lisa Little, and here is a brief summary of what they tell us.

The new member (MG-9273) had 11 matches (comparing his 37 marker results to everyone else in the database) and one of them was an ungrouped Gleeson who was already in our project (PG-7861). The Genetic Distance (GD) between them was 3/37 indicating 3 steps away from an exact match. And this would predict that they share a common ancestor some time within the last 14 generations (i.e. after 1530) and probably closer to 6 generations ago (about 1770). And as a result of this match, we were able to identify a new genetic family, which we have called Lineage VIII.
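Genetic Distance is essentially a count of mutational steps between two haplotypes. The Python sketch below uses a simple stepwise count (the sum of repeat differences on the markers both men have tested) with hypothetical values. FTDNA's own calculation may treat some markers, such as multi-copy markers, differently, so treat this as an approximation rather than a reproduction of their method.

```python
def genetic_distance(hap_a, hap_b):
    """Simple stepwise Genetic Distance between two haplotypes.

    hap_a, hap_b: marker label -> repeat count.
    Sums the absolute repeat differences over the markers both have tested.
    """
    shared = hap_a.keys() & hap_b.keys()
    return sum(abs(hap_a[m] - hap_b[m]) for m in shared)


# Hypothetical values: one 1-step and one 2-step difference give a GD of 3.
man_1 = {"9": 13, "13": 16, "14": 8, "23": 16}
man_2 = {"9": 13, "13": 17, "14": 8, "23": 18}
print(genetic_distance(man_1, man_2))  # 3
```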

Judging by the other people these two project members match, this group appears to have come from North Tipperary. The names Meara, O’Mara, Leahy, and Carrell appear among their matches and all have strong North Tipperary connections. You can see where they sit in relation to each other on the “Tree of Mankind” here and in the diagram below.

The new genetic family (Lineage VIII) sits somewhere in this region of the Tree of Mankind

The names Kennedy & Carrell / Carroll (in the diagram above) are also strong North Tipperary names.

A SNP Progression is the series of SNPs that characterise each branching point in the Tree of Mankind from a relatively upstream ancient SNP marker down to where you sit on the Tree of Mankind. And the SNP Progressions associated with the O’Meara & Leahy branches are:
  • R-P312/S116 > Z290 > L21/S145 > DF13 > DF21/S192 > FGC3213 > Z16532 > Z16526 > Z16524 > Z16534 > Z16533 > Z16538 > Z16525 > Z16523 … 
  • then > BY61171 > Y142760 (Meara) … 
  • or > FGC14748 (Leahy)
Many of these SNP markers can be seen in the diagram above and also among the Y-DNA matches of the two members of the new Lineage. This all suggests that this particular Gleeson branch sits somewhere in the same area of the Tree of Mankind as the O’Mearas and Leahys.

The new group (Lineage VIII) is one of several North Tipperary groups of Gleesons (the others being the large Lineage II and the smaller Lineage VII). It could be that if we went back far enough along the father’s father’s father’s line of these two men, we would eventually see the name change to O’Meara or Leahy, but that could have been 400 years ago or more. The only way to be sure would be to do the Big Y-700 test, which would give very fine-scale detail about the position of this new Lineage on the Tree of Mankind.

But before that there are several things that can be done that would provide us with some useful additional information:
  1. Both project members could update the information on their MDKA (Most Distant Known Ancestor) - this will optimise the chances of making connections with other genetic cousins. See here for instructions ...
  2. Joining the relevant Geographic & Haplogroup DNA Projects could provide some additional interpretation from the Project Administrators, and it's all free. To do so, simply click on JOIN in the photo on the following webpages:
    1. Ireland yDNA
    2. Munster Irish
    3. R1b & subclades
    4. R-L21
    5. R-DF21
    6. R-Z16526
  3. The members of the new group could contact each other and share information about their family trees. If they are lucky, they might even be able to identify where they connect.

Maurice Gleeson
April 2019

Uploading your New Big Y Results

The Big Y test changed to a completely new technology earlier this year. It now covers 50% more of the Y chromosome than previously. And so it is anticipated that the new test will discover additional SNP markers that the old technology did not detect. Furthermore, the new SNPs should be able to more accurately date the various branching points on the Tree of Mankind.

It also gives us approximately 700 STR markers whereas the previous test only gave approximately 500 STRs. As a result, the old test is called the Big Y-500 and the new one is called the Big Y-700. Going forward, all new Big Y orders will use this new technology.

For those who did the old test, it is possible to upgrade from the Big Y-500 to the Big Y-700. There are several people within the project who have done this upgrade and we will look at these results in a subsequent post.

But for everyone who does the new test, or upgrades from the old version to the new version, it is essential that you upload a copy of your results to the Big Tree so that we can get some important additional analyses. You will find instructions for doing so on the Big Tree website here and on the Y-DNA Data Warehouse website here, but I include a briefer summary below.

Creating a Link to your Big Y results

In order to create a downloadable link to your Big Y results, first log in to your FTDNA account and go to your Big Y Results page ...

Then click on the blue Download Raw Data button ...

Then you need to create a link to two separate files - your VCF file and your BAM file. The VCF file is used for placing you on The Big Tree. The BAM file is used for high-end technical analysis by the folks at the Y-DNA Data Warehouse. You can see some of the results so far on their Coverage Page here (and if you like you can search for kits by surname, including your own).

1) To create a link to your VCF file, right-click on the green Download VCF button, and then click on "Copy link" from the drop-down menu. You will later paste this link into the "Download URL" box on the Submission Form.
Alternatively, you can simply left-click on the green Download VCF button, which downloads a 10 MB file to your computer. This can then be uploaded directly via the Submission Form below. However, it is preferable (and less problematic) to generate a link instead.
2) To create a link to your BAM file, click on the green Generate BAM button. You will then get a message that "Your Big Y BAM file is currently being generated" (see below). This generates a very large BAM file ... but it takes several days to prepare, so you will have to come back to this page in a few days' time! Put a reminder in your diary / calendar!

Uploading your VCF file

Having created the first link (to your VCF file) and copied it, click here to go to the Y-DNA Data Warehouse and fill in the form with your standard information - email, kit number, surname of your paternal MDKA (Most Distant Known Ancestor), and (most importantly) the link to your file - you do this by pasting the link you copied earlier into the "Download URL" box underneath the heading "Raw Data Upload" at the bottom of the page.

If you want to upload the actual file itself (rather than a link), click on the blue Direct tab under "Raw Data Upload" and then click on the "Choose File" button and attach the file from where you downloaded it onto your computer (on my laptop, the "Choose File" button appears to be slightly hidden under some text but it works if you click on the start of the text). 

Don't forget to tick the checkbox to confirm you agree with the Data Policy and then click the blue Submit button.

Uploading your BAM file

Several days later, come back to this same place to get a link to your newly generated BAM file. So, navigate to your Big Y Results page, and after clicking on the blue Download Raw Data button, you will find that the BAM file has been generated. DO NOT DOWNLOAD IT - you don't need to and it is way too big. Instead, click on the green Share BAM button and then the green Copy button in order to copy a link to your BAM file. You will share this link in the next step.

Then go to the Y-DNA Data Warehouse and fill in the same form as before BUT ...

  1. select Other for the Testing Lab
  2. enter your Kit ID Number 
  3. leave everything else on its default setting
  4. paste the link to the BAM file in the "Download URL" box underneath the heading "Raw Data Upload"
  5. tick the checkbox to confirm you agree with the Data Policy and then click the blue Submit button

What do you get from your Results?

Your results should be analysed within a week or two and you can check them by navigating to the particular portion of the Big Tree. Here you will see your placement on the Tree of Mankind and the surnames of the people sitting on neighbouring branches to your own. This information can be very useful for determining the geographic origins of your particular direct male line and for determining if your name is associated with an Ancient Irish Clan. Gleeson Lineage II members are surrounded by O'Carroll's (from nearby Offaly), McMahon's (from neighbouring Clare), McCarthy's (from North Cork), and Treacy's (from Galway). You can see these neighbouring branches in this portion of the Big Tree here.

Project Administrators can use programmes like the SAPP tool to generate Mutation History Trees and determine the likely branching structure of your particular "genetic family" from the time of surname origins up to the present day. This process can also help identify which Gleeson's are more closely related to you and which are more distantly related. It is also possible to date the branching points within the Mutation History Tree using SNP data as well as STR data. This process is likely to become more accurate with the advent of the new Big Y-700 data and the identification of new SNPs. It is anticipated that the new data will reduce the number of "years per SNP" from about 130 to about 80 years per SNP. You can read more about this here.
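The arithmetic behind SNP-based dating can be sketched in a few lines. This is a simplified illustration only, not the method used by the SAPP tool or the Big Tree: it assumes a single fixed rate (the approximate 130 and 80 "years per SNP" figures mentioned above) and estimates the time back to a shared ancestor by averaging the number of SNPs each man carries below the branching point. The function name and the SNP counts in the example are hypothetical.

```python
def estimate_tmrca_years(snps_line_a: int, snps_line_b: int, years_per_snp: float) -> float:
    """Rough years back to the most recent common ancestor of two testers.

    Simplifying assumption: SNPs accumulate at a fixed rate, so the average
    number of SNPs found on each man's line since the branching point,
    multiplied by the years-per-SNP rate, approximates the age of the branch.
    """
    if snps_line_a < 0 or snps_line_b < 0:
        raise ValueError("SNP counts cannot be negative")
    if years_per_snp <= 0:
        raise ValueError("years_per_snp must be positive")
    mean_snps = (snps_line_a + snps_line_b) / 2
    return mean_snps * years_per_snp


# Approximate rates quoted in the text (assumed constants).
BIG_Y_500_RATE = 130.0  # years per SNP
BIG_Y_700_RATE = 80.0   # years per SNP

if __name__ == "__main__":
    # Hypothetical example: 2 SNPs on one man's line, 4 on the other's.
    print(estimate_tmrca_years(2, 4, BIG_Y_500_RATE))  # 390.0
    print(estimate_tmrca_years(2, 4, BIG_Y_700_RATE))  # 240.0
```

At the lower rate each SNP stands for a shorter span of time. The expected benefit of the Big Y-700 is that it should find more SNPs over the same interval, which narrows the date of a branching point; it does not make a given branch younger.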

You can also click on your surname above your kit number for an analysis of your Unique / Private SNPs. These may prove useful in the future for defining new downstream branches in the Mutation History Tree and for dating new branching points. But this very much depends on new people joining the project and undertaking Big Y-700 testing (so that we can compare apples with apples). And as this is a new test, it is likely that we will have to wait some time before we begin to see real benefits from it.

Maurice Gleeson
Aug 2019

Thursday 24 January 2019

What's in a name?

I am delighted to introduce this guest blog post by Lisa Little, a member of the Lineage II Gleeson's of North Tipperary. Lisa has done some excellent research on her own particular direct male line which has taken her on an exciting adventure into the past, full of twists and turns. Lisa started out as a Little but ended up as a Gleason! And this is not an uncommon situation - many of us will find a surname or DNA switch (SDS or NPE) somewhere along our direct male line. Lisa used an ingenious approach (combining Y-DNA and autosomal DNA data) to unravel this family mystery and discover where the surname switch occurred.

Thank you, Lisa, for sharing this wonderful story with us.
Maurice Gleeson
Jan 2019

What's in a name? that which we call a Gleason
By any other name would still be a Gleason . . .

The Story of Finding my Gleason Ancestry
By Lisa M Little

Benjamin J Little (1889-1989)

My maiden name is Little. As a child, the name made me an easy target for a bully’s joke. Despite this, I have proudly kept this surname throughout my adult life. Until 2006, I knew almost nothing of my Little ancestry. My grandfather, Ben Little, was born in late 19th-century San Francisco, California. The documentary record of his early life was largely destroyed in the 1906 San Francisco earthquake and fire. There was a vague family story of Ben’s father abandoning him as a toddler, never to be heard of again. In the pre-home-computer age of the 1970s, my hunt for my Little great grandfather turned up only a single document (Ben Little’s Baptismal Certificate from Our Lady of Guadalupe Church on Broadway in San Francisco) bearing my great grandfather’s name: Eugenio Little.

Figure 1   Author's documented paternal lineage 
at start of research 
(Eugenio is a Spanish variant of Eugene)

Thirty years later, in 2006, I was living in Georgia, USA, far from my family in California when I got the news that my father was in the hospital following a heart attack. Being so far away, and feeling the need to connect, I returned to my search for my Little ancestry. Genealogy had changed in the meantime: resources were now online, and DNA analysis was opening doors to the past. I decided to have my father do a Family Tree DNA (FTDNA) Y-25 test in hopes that we would find a match among those who had tested and were participating in the Little Surname DNA Project. Much to my disappointment, the results placed my father in haplogroup R1b but did not match a single Little in the project. Despite this disappointment, with the help of the project administrator, the late Leo Little, we hit the genealogical jackpot. Leo was able to connect my Eugenio with his Little lineage by finding a reference to him in Descendants of George Little, who came to Newbury, Massachusetts, in 1640 by George Thomas Little [1]. All of a sudden, I had generations of Littles to become acquainted with: Littles who were early colonists in North America, Littles who fought in the Revolutionary War, even Littles who had participated in the Salem Witch Trials. Following the paper trail to George Little now occupied my time and the DNA test slipped from my memory.

Figure 2   Cover of George Little genealogy 
which includes reference to Eugene M Little

Then, in November of 2013, I received an email from an FTDNA project administrator who, having reviewed my father’s Y-DNA results, suggested that he was likely to have the Z255 mutation, associated with the Irish Sea Haplotype. By this time, I was teaching basic genetic structure and function to community college students in Southern California and thought testing for the mutation might prove useful as a teaching aid. So, I ordered the Z255 SNP test from FTDNA. The test came back positive! With these results in hand, I decided it was time to revisit the Little Surname DNA Project results page. Surely, after so many years another descendant of George Little must have tested and joined the project. Alas, still no match! I asked myself how this could be the case. The published George Little genealogy was over 600 pages long and included more than 6,400 descendants. Surely, there was a living descendant apart from my father who had done a DNA test.

It was obviously time to put some real energy into figuring out what the DNA results were telling me.  Reviewing my father’s Y-25 DNA matches didn’t make any sense – not a single Little among all those genetic matches.  In fact, five of his 25 matches had the last name Gleason/Gleeson.  Only two other surnames had multiple matches:  Fennessy with two matches and Salisbury with three matches.  My head was spinning!  Obviously, I had more work to do to understand the results.  

A few months later, I was reading a post on the ISOGG website by Fannie Barnes Linder, entitled The Shock of Our Lives! Ms. Linder told the story of receiving her brother’s DNA results only to find that the top 16 matches all shared the same surname, which was neither the surname of her brother nor that of their father. What she had discovered was a non-paternity event (NPE) in her paternal lineage. According to the ISOGG wiki, a non-paternity event is “any event which has caused a break in the link between an hereditary surname and the Y-chromosome in a son using a different surname from that of his biological father.” [2] Suddenly, everything fell into place! The lightbulb went off in my head! No Little surname matches, but five Gleason/Gleeson matches! My father was not a genetic descendant of George Little. Rather, he was the genetic descendant of some unknown Gleason. No! As a genetic genealogy novice, I didn’t trust my reading of the results. So I turned to my friend, David Lyttle, who was at the time the DNA test consultant for Clan Little North America. After reviewing my assembled data, he agreed I must be onto something.

Armed with this new NPE hypothesis, my research had three initial goals:  1) do more advanced testing, 2) reach out to close genetic matches in order to identify the North American Gleason lineage to which I belong and 3) explore the timing of the non-paternity event.

Still doubting the validity of my Gleason hypothesis, in the spring of 2014 I ordered a Y-67 DNA test.  Results were posted on June 6th:  of 23 matches, four were Gleason/Gleesons, with Genetic Distance (GD) values of 2 to 7.  The following day, June 7th, another Gleason match appeared, with GD 1.
TABLE 1:  FTDNA Y-67 Matches on June 7, 2014
# matches
Doty, Johnston, Myrick, Tripp
Anthony, Daley, Fennessy, Fitzpatrick, Gleason, Hogan-Wilbur, McCarthy, McCloughan, Myrick (2), Phelps (3), Whitmore (2), Wyght
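Genetic Distance (GD) is essentially a count of the mutational steps separating two sets of STR marker values. The sketch below assumes a simple step-wise model in which every one-repeat difference counts as 1; FTDNA's own calculation treats certain markers differently (a hybrid of step-wise and infinite-allele counting), so its figures can differ. The function name and the marker values in the example are hypothetical.

```python
from typing import Dict


def genetic_distance(kit_a: Dict[str, int], kit_b: Dict[str, int]) -> int:
    """Step-wise genetic distance between two Y-STR haplotypes.

    Simplifying assumption: every one-repeat difference counts as one step,
    summed over the markers that both kits have tested.
    """
    shared_markers = kit_a.keys() & kit_b.keys()
    return sum(abs(kit_a[m] - kit_b[m]) for m in shared_markers)


if __name__ == "__main__":
    # Hypothetical repeat counts for four markers.
    tester = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
    match = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 12}
    print(genetic_distance(tester, match))  # 2
```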

I have come to realize that Hestia, the Greek goddess of the hearth and family, is surely smiling upon me and guiding the search for my ancestry. The two closest matches (or rather the administrators of the matches’ DNA kits) turned out to be genealogy experts who would become my teachers and partners in the search for our shared origins. The closest match was Herbert L Gleason Jr (HLG/G55), the father and father-in-law of a couple who own Heirlines Family History & Genealogy in Salt Lake City, Mary Gleason Petty & James Petty. The second closest match was the father of none other than our Gleason/Gleeson DNA Project co-administrator and professional genetic genealogist, Dr. Maurice Gleeson. I again hit the genealogy jackpot! Thank you, Hestia! On June 6th I sent my first email to Maurice and on the 8th received a message from Mary Gleason Petty. In a matter of days, I discovered that my (genetic) paternal ancestors were Irish and that I fit somewhere into a North American Gleason lineage whose patriarch was James Gleason (1775-1805) of Dorchester, Massachusetts (See Figure 3).

Figure 3   Mary Gleason Petty's Paternal Lineage

Over the next year Y-111, Big Y, and Family Finder genetic tests were completed on my father.  With each new set of results, the evidence of my Gleason ancestry grew stronger (See Table 2).  And, yes, I became more and more addicted to genetic testing.

TABLE 2:  FTDNA Y-111 Matches on September 16, 2014
Surname (Kit #/Project ID)
Gleason (338070/G55/HLG)
Gleeson (N74958/G21)
Gleeson (334030/G54)
Gleason (N101540/G39)

Analysis of my father’s Y-111 STR mutations placed our paternal ancestors within Lineage II – Gleesons of North Tipperary, Ireland.  This evidence, coupled with Big Y SNP mutations, narrowed his position within Lineage II to Branch B (See Figure 4).  The details of this placement within the Gleason/Gleeson Mutation History Tree have been described in Maurice’s blog post dated 8 July 2018, A Closer Look at Branch B - New Y-DNA Results.

Figure 4   Detail of Lineage II Branch B Relationships. 
RTL = G57, HLG = G55

The Family Finder (FF) test, completed in July of 2015, identified Mary Gleason Petty’s father (HLG/G55) as my father’s (RTL/G57) 3rd to 5th cousin, with 40 shared centiMorgans (cM). Assuming the two men were of the same generation, given their similar ages, the FF results suggested that they shared a 2nd great grandfather to 4th great grandfather. James Gleason of Dorchester, MA (1772-1805) would be their 3rd great grandfather. James had two sons and six grandsons. I simply didn’t have enough genetic data to narrow down where my paternal lineage fit into Mary’s family tree. The best I could do was to try to find a geographic overlap between Mary’s Gleasons and my Littles. One possible overlap became evident: James Gleason (1772-1805) had a grandson, James Henry Gleason, who had settled in Monterey, California in 1846 and married a Californio [3] beauty. While James Henry Gleason died 28 years before my grandfather was born, he did have four sons who might be worth a closer look.

I decided to focus some effort on the timing question: When did the Gleason Y-DNA enter my Little lineage? My first working hypothesis was: My father (RTL/G57) was the first genetic Gleason and, thus, would not be a Y-chromosome match to his brothers. (Forgive me, Grandma Little, for ever considering such a thing!) As both of my paternal uncles passed away prior to my DNA discovery, I turned to one of their sons (G64 in Figure 4) for a Y-DNA test. In May of 2015 my cousin’s Y-37 results were published: a perfect match (GD = 0) to my father. With my initial hypothesis rejected, I moved on to the prior generation. This is where I hit a brick wall. My grandfather, Ben Little, had no full male siblings. There were half-brothers from his mother’s second marriage, but they would not share the same Y-DNA. I moved back to Eugene Little’s generation in search of another living male descendant. Unfortunately, Eugene and his younger brother, Arthur Little, never had any male children that could be traced in the genealogical record. My search for a living descendant of George Little continued generation by generation and took me from California to Maine, spanning 240 years. Finally, a single living descendant was identified. Then came the time for that dreaded exchange, “Hello, I am a distant cousin. Would you be willing to do a DNA test for me?” This is a question that always leaves me feeling uneasy. However, in this case, the answer was, ‘Yes’. No hesitation. In October of 2015, the FTDNA Y-37 results told us that my father and the Little 4th cousin are not a genetic match. These results pinpointed a 100-year period during which the Gleason Y-DNA could have been introduced into my paternal ancestry. The problem remained that my exhaustive research of the Little family tree had failed to identify another living direct male descendant whom I could approach for DNA testing.
I began to create fanciful storylines with the genealogical evidence that was available: Eugene Little’s grandmother got pregnant after an encounter with a Gleason man and was hurriedly married off to her first cousin. OK, I’ll admit I have an overactive imagination. It’s just more fun to imagine a romance story than to admit I had hit a brick wall and saw few strategies to surmount it.

In 2015 this Little “4th cousin” did not have a Y-DNA match to any other male within the Little DNA Project who claims descent from George Little of Newbury, MA. In January of 2019, however, a GD 1/37 match was made. The two men share George Little’s oldest son as their most recent common ancestor, making them 8th cousins. Taken together, the Y-DNA results for the two men have now established a genetic profile for their Little lineage to which future descendants of George Little can be matched.

Figure 5   Possible locations of the Surname / DNA Switch (SDS a.k.a. NPE)
The switch happened somewhere on the Little Direct Male Line ... but where?
FF suggests a connection to HLG / G55 via common 2x to 4x great grandparents (red bracket on left)
Y-DNA of Little cousins rules out switch prior to Benjamin Little 1802-1907 & after Benjamin Little 1889-1989
Green indicates Gleason Lineage II Y-DNA, Orange indicates Little Y-DNA
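The reasoning used to narrow down when the switch happened can be expressed as a small bracketing rule: a Y-DNA match with a relative rules out a switch below your shared ancestor, while a mismatch places the switch somewhere below that ancestor. This is a sketch under stated assumptions: exactly one switch occurred on the tester's direct male line, each tested relative's own line is free of switches, and the class and function names and the 10-generation line in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class YDnaTest:
    mrca_index: int  # position of the shared ancestor on the line (0 = oldest known)
    y_match: bool    # did this relative's Y-DNA match the tester's?


def switch_window(line_length: int, tests: List[YDnaTest]) -> Tuple[int, int]:
    """Return the (earliest, latest) index of the man born of the switch.

    The direct male line is indexed 0 (oldest known ancestor) up to
    line_length - 1 (the tester). A switch "at index k" means man k's
    biological father was not the recorded man k - 1.
    """
    earliest, latest = 1, line_length - 1
    for test in tests:
        if test.y_match:
            # Y-DNA passed intact from the shared ancestor down to the tester.
            latest = min(latest, test.mrca_index)
        else:
            # The break lies somewhere below the shared ancestor.
            earliest = max(earliest, test.mrca_index + 1)
    if earliest > latest:
        raise ValueError("No single switch is consistent with these results")
    return earliest, latest


if __name__ == "__main__":
    # Hypothetical line of 10 men: index 9 is the tester, 8 his father,
    # 4 his 3x great grandfather.
    results = [
        YDnaTest(mrca_index=8, y_match=True),   # relative sharing the tester's father matches
        YDnaTest(mrca_index=4, y_match=False),  # 4th cousin (shared 3x great grandfather) does not
    ]
    print(switch_window(10, results))  # (5, 8)
```

Each new Y-DNA test on a Little cousin with a different shared ancestor would tighten one end of this window.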

Faced with a seemingly insurmountable challenge, I was excited to hear in the spring of 2016 that Maurice Gleeson was visiting Southern California. At last, after two years of email exchanges with my mentor, I had the opportunity to meet him in person and discuss future research strategies. During a delightful lunch that included Mary Gleason Petty’s sister, Martha (a mini Gleason family reunion), Maurice suggested that my father do another autosomal DNA test. Perhaps wading into another database would prove helpful.

Eureka! In May of 2016 I opened my father’s newly published DNA results on my computer. As I scrolled down the list of matches, my reaction was, “You have to be kidding me!” Many usernames were simple initials. It was going to take some real time investment to find Gleason matches with only last initials to go by. My initial panic was unwarranted, as among the 3rd cousin matches was an individual with a ‘G’ surname initial and a kit manager with the surname Gleason. Correspondence with the kit manager confirmed that the individual was a descendant of James Henry Gleason (JHG) of Monterey, California. This match and my father shared 118 centiMorgans over three segments, a considerably closer autosomal match than the 40 cM my father shared with Mary’s father. Could I hope that one of the four sons of JHG could be my genetic great grandfather?

This is where I needed to bring the genealogical record together with the genetic data. Eugene Little was born in 1853 in Maine. JHG was born in Plymouth, MA in 1823, making him old enough to have been Eugene’s biological father. However, JHG’s youth has been well documented and published in a book entitled Beloved Sister: Letters by James Henry Gleason from California and the Sandwich Islands, 1841-1859 [4]. JHG was a continent away when Eugene was conceived and born. JHG’s four sons were born between 1850 and 1860, making them between 29 and 39 years old when my grandfather, Ben Little, was born. The timing is right for one of James’ sons to be my genetic great grandfather, but what other evidence could be brought to bear?

If one of these four Gleason men was my grandfather’s biological parent, I should also find genetic matches to descendants of their mother’s family.  I mentioned above that she, Mariana Catarina Demetria Watson, was a Californio beauty.  Many early European settlers in California married into prosperous Mexican families prior to US control of the territory.  JHG’s father-in-law, James E Watson, was among this group.  James Watson, an Englishman of Scottish origin, arrived in California in the 1820s and set up a hide and tallow business.  He married Marianna Escamilla in 1830 and in 1850, he purchased the Rancho San Benito in the Salinas Valley.  I began to search my father’s DNA matches for descendants of the Watson and Escamilla families.  

This is where DNA Circles can be a useful aid in exploring hypothesized relationships.  After updating my linked family tree to include JHG and his wife, as well as her Watson/Escamilla parents, a DNA Circle was generated that included several individuals who trace their ancestry to Marianna Escamilla.  Today, that DNA Circle includes 21 members.

Figure 6   Ancestry's DNA Circle
for my (genetic) great great grandmother

Being a genetic match to 20+ descendants of Marianna Escamilla strongly supports the idea that one of her grandsons was the biological father of Ben Little. The DNA evidence had revealed an NPE and led me to my genetic Gleason family. The task that lies ahead is figuring out which of the four sons of James Henry Gleason met another Mexican beauty, Ben’s mother Librada Solano, and conceived my grandfather. If I could locate a living grandchild of each of the four brothers, three of them would be 2nd cousins to my father, and one of them would be a half-1st cousin. Autosomal DNA testing of each should distinguish between them: 2nd cousins share on average 220 cM, while half-1st cousins share on average 440 cM.
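The 220 cM and 440 cM averages follow from the coefficient of relationship: each extra generation separating two people halves the expected sharing, and descending from a couple rather than a single common ancestor doubles it. The sketch below assumes a total of about 7,000 cM, so that a parent and child are expected to share about 3,500 cM; testing companies use slightly different totals, real sharing varies widely around these averages, and the ranges for different relationships overlap, so the comparison is suggestive rather than conclusive. The function names and the observed value in the example are hypothetical.

```python
# Assumed total: roughly 7,000 cM, so that a parent and child (r = 1/2)
# are expected to share about 3,500 cM. Companies use slightly different figures.
TOTAL_CM = 7000.0


def expected_shared_cm(gens_a: int, gens_b: int, half: bool = False) -> float:
    """Average autosomal cM shared by two cousins.

    gens_a, gens_b: generations from each person up to the common ancestor(s),
    e.g. 3 and 3 for 2nd cousins, who share a set of great grandparents.
    half: True if the cousins share only one common ancestor, not a couple.
    """
    if gens_a < 1 or gens_b < 1:
        raise ValueError("both people must descend from the common ancestor(s)")
    common_ancestors = 1 if half else 2
    relationship = common_ancestors * 0.5 ** (gens_a + gens_b)
    return relationship * TOTAL_CM


def closest_relationship(observed_cm: float, expected: dict) -> str:
    """Return the hypothesis whose average is nearest the observed sharing."""
    return min(expected, key=lambda name: abs(expected[name] - observed_cm))


if __name__ == "__main__":
    hypotheses = {
        "2nd cousin": expected_shared_cm(3, 3),                  # 218.75 cM
        "half-1st cousin": expected_shared_cm(2, 2, half=True),  # 437.5 cM
    }
    print(hypotheses)
    # Hypothetical observed sharing with a grandchild of one of the brothers.
    print(closest_relationship(400.0, hypotheses))  # half-1st cousin
```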

That part of my journey is for a future blog post.

Figure 7   The current draft of my (genetic) family tree on my father's side

[1] Little, G. Thomas (1882). Descendants of George Little: who came to Newbury, Massachusetts, in 1640. Auburn, ME: George Thomas Little.
[2] International Society of Genetic Genealogy Wiki: Non-paternity event. Page last modified 17 July 2018. Page accessed 21 January 2019.
[3] Californio (plural Californios) = A Spanish-speaking resident of the now US state of California during the period of Spanish and Mexican rule, roughly from the late 17th to mid-19th centuries.
[4] Gleason, James Henry (1978). Beloved Sister: The letters of James Henry Gleason, 1841-1859, from Alta California and the Sandwich Islands, with a brief account of his voyage in 1841 via Cape Horn to Oahu and California. Glendale, CA: A. H. Clark Co.

Lisa M Little
Jan 2019