Thursday 3 December 2015

Mutation History Tree for Lineage II

What are Mutation History Trees?

These trees are like family history trees but the "family" part has been replaced with "mutations", or rather, has been augmented with mutations. In other words, when the known ancestors run out, DNA mutations still allow us to keep on going back in time and thus identify the branching pattern within the overall group for each Lineage. They also allow us to estimate when the different branching points between group members occurred. This helps you see where you sit in the overall group and who is most closely related to you. This in turn helps foster collaboration between project members, which will hopefully lead to breakthroughs in your genealogical research.

Copies of Older Versions will be archived so that we can keep track of the evolution of the trees over time as more people join the project and upgrade their tests.

Lineage II (North Tipperary group)

Below is the Mutation History Tree for Lineage II (version 1). You can view it or download it as a high-quality pdf document via this Dropbox link. The tree will be constantly updated as more project members join. You can click on the image to enlarge it and then right click again and "Open in a New Window" (Mac) to make it even larger. Alternatively, you may find it easier to download the image (simply right click on the image and choose download). Or a pdf version of the tree is available in our Facebook group here - this has the added advantage that it is searchable.

But first, you will probably need some explanation of what you are looking at.
  • Starting at the top, it follows the mutations in STR markers from subclade Z255 (characterised by the Z255 mutation which formed about 4300 years ago [1]) down in time through various subclades (Z16437/Z16439 and Z16438, both formed about 1500 years ago) to the Gleeson MRCA (Most Recent Common Ancestor of Lineage II) and then down to the modern day Gleeson males who have tested in Lineage II.
  • The tree is based on both SNP and STR data. SNPs are cut & pasted from Alex Williamson's Big Tree. STRs are extracted from our project's DNA Results page.
  • Fluxus software was used to create an initial STR-based branching pattern. SNPs were added and helped to anchor the upper reaches of the tree.
  • Branching points have time estimates in generations back from the present. Allowing 30 years per generation gives a good indication of the number of years back to a particular branching point from the tester's date of birth (usually between 1930 and 1950).
  • STR mutations are written as: marker number, old value - new value
  • Back Mutations are highlighted in yellow. Parallel Mutations are written in red text.
  • Branches have been numbered (in bold royal blue).
  • The number of Markers Tested by each participant is indicated as 25, 37, 67 or 111. BY stands for the Big Y test.
  • FTDNA Kit numbers, Project G-numbers, and initials represent each of the participants.
  • Country of present residence indicates the extent of the Gleeson diaspora.
  • Gleeson Ancestral Lines are indicated for each participant by the green boxes with a crude timeline to the side.
  • MDKA Profiles list possible MPRs (Markers of Potential Relatedness) which may be particularly useful information for collaboration between project members (especially birth location, family nickname or agnomen, & occasional other major distinguishing features).
Lineage II Mutation History Tree Nov 2015 (version 1)
Download a pdf version here

You can watch a video of the Journey of Discovery I took in creating this tree below.

What does it tell us? 

The current starting point for Lineage II is estimated to be more than 25 generations ago. Allowing 30 years per generation, that's about 750 years before the average tester's date of birth. If we assume that to be 1950, then we are looking at an origin for Lineage II about 1200 AD. This estimate may change as more people join the project.

The Glisson branch (G-68, 411177) is an ancient branch, going back about 21 generations, i.e. about 630 years, to roughly 1300 AD.
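For anyone who wants to repeat this arithmetic for other branching points, here is a minimal sketch in Python. The function name is my own invention for illustration; the 30 years per generation and the 1950 birth year are simply the assumptions stated in the text, not fixed constants.

```python
def branch_year(generations, tester_birth_year=1950, years_per_generation=30):
    """Approximate calendar year (AD) of a branching point.

    Both defaults are the working assumptions used in this post.
    """
    return tester_birth_year - generations * years_per_generation


# Start of Lineage II: 25 generations back
print(branch_year(25))  # 1200

# Glisson branch: 21 generations back (rounded to about 1300 AD in the text)
print(branch_year(21))  # 1320
```

Changing either assumption (for example, 25 or 33 years per generation) shifts the dates noticeably, which is one reason these estimates should be read as rough guides only.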

The most closely related branches are 8, 5, 4, & 12. These members are actively exploring their connection using traditional genealogical records.

As regards the process of building this MHT ...
  • The 37 marker test (Y-DNA-37) is sufficient to allocate new participants to Lineage II, but is not adequate for giving accurate estimates of TMRCAs (and hence branching points)
  • 67 or 111 STR markers are needed for more accurate dating of the branching points
  • STRs better define the branching pattern than SNPs. Thus far, SNP markers appear to be more useful for defining the more upstream branches of the tree and STR markers the more downstream branches. This may change over time.
  • In order to improve the accuracy of this MHT, we need a) more people to test; b) more people to upgrade to 111 markers; and c) more people to SNP test
  • In due course, it is hoped that a Lineage II-Specific SNP Marker Panel will be developed (for about $120). This will hopefully save new members money on SNP testing.
A series of blog posts will follow in due course exploring the various steps involved in this creation process in greater depth.

Maurice Gleeson
Dec 2015
[1] estimates carry a wide margin of error. Current estimates are found here - 

Tuesday 1 December 2015

Making Yourself Visible

Some people have been surprised to find that they don't appear on the DNA Results pages even when they have joined the project. Why is this? The explanation is quite simple.

FamilyTreeDNA changed their privacy settings during the past year and the result has been that customers have to set the Privacy settings themselves. However, not everyone knows about this so some people are inadvertently hiding themselves from view when they don't mean to do so.

Here is an example of the DNA Results page for Lineage II a) when I am signed out of my FTDNA account and b) when I am signed in. 

In the first view, 5 people are missing. These are indicated by the red arrows in the second view. These participants are probably unaware that their Privacy settings are such that they cannot be seen in the project by anyone from outside the project (and they can only see themselves if they are signed in to their account).

The problem with this is that it may inadvertently stop new potential recruits from testing. Would you be less inclined to sign up to a project if there were only 11 members instead of 16? I think so. Bigger projects look more impressive. There is more of a chance that you might find a match. Also, supposing the new recruit was called Glisson, they would not see that there is already a Glisson in the project. Or if their ancestor came from Boherlahan - they would not see that there was already someone there with ancestry from exactly the same area. In both these situations, the prospective tester might have been encouraged to take the test if he had seen this information.

The good news is that it is easy to fix. Just follow these simple steps:

  • Sign in to your FTDNA account
  • Hover over your Name in the top right
  • Click on Account Settings, then the Privacy & Sharing tab at the end of the menu bar above
  • Then simply change the settings under My DNA Results by clicking on the words "Project Members" at the end, and on the next screen checking the box beside "Make my mtDNA & Y-DNA data public". Then press Save.

Before the change
After the change

For comparison, this is what my own Privacy Settings look like:

If you need any help with this, please don't hesitate to email me and I can do it for you in 2 minutes.

Here's hoping someone will see your details on the DNA Results page and will decide to join the project simply because they like something they see there.

Every little helps.

Maurice Gleeson
Dec 2015

Monday 30 November 2015

New Pages make life easier

Now that this project website has been up and running for almost a year, it is clear that some posts are visited very frequently. To help newcomers navigate the website and find the information they require, I have created a new set of Pages to the right.

The new pages include a revised version of the original Welcome blog (Feb 2015) as well as separate pages for people who want to join the project and who want to upgrade their test. There is also a new page which has lots of useful tips on how to get the most out of your DNA Test and everyone should check this out, even if you have been a member of the project for a long time. There may be some useful hints and tips there that you may not be aware of.

Here is the list of new pages. Please feel free to copy and paste this pages list into any email you might send to prospective DNA testers.

Maurice Gleeson
November 2015

Tuesday 17 November 2015

Exciting times at the 11th Annual FTDNA Conference

I have just returned from the 11th Annual FamilyTreeDNA Conference in Houston (Nov 11-13) where I had the pleasure of meeting my wonderful co-administrator and founder of our DNA Project, Judy Claassen. Even though we have talked previously on Skype and are in constant email communication, it was great to meet Judy in person for the first time and we had a fabulous weekend discussing various aspects of the project. We also networked with some of the best brains within the genetic genealogy community which has helped advance our ideas about next steps for the project.

Maurice Gleeson & Judy Claassen

I also had the privilege of presenting and sharing some of the results of our Gleason/Gleeson DNA project with the audience (made up entirely of FTDNA Project Administrators like Judy & myself). The title of my talk was "Combining SNPs, STRs, & Genealogy to build a Surname Origins Tree" and it will be available on my YouTube Channel here. In this talk, I developed some of the themes that I had touched on in an earlier presentation at Genetic Genealogy Ireland 2015 (Oct 9-11), which is also available on YouTube.

Networking with David Dunbar, James Irvine & Cynthia Wells

In brief, I described the fascinating journey of discovery I have taken these last few months to arrive at the Mutation History Tree for Lineage II of the DNA Project. I have previously posted about the generation of this tree based on 12 & 37 STR marker data but I have held off on publishing the more advanced tree (combining 111 STR marker data and SNP data) pending the arrival of additional DNA results. Well, those results are now in, I have incorporated them into both of my recent presentations, and I will be posting a summary of these results shortly in a subsequent blog post. 

The resulting Lineage II tree uses the DNA mutations present among project members to define the branching pattern of the tree and to estimate when each of those branching points occurred. Every member can see where they sit on the tree in relation to every other member and will also have an idea of how closely or distantly they are related to everyone else within the group. And of course as more people join, they will be added to the tree, it will get bigger, and the branching pattern and time estimates will become more accurate.

An early version of the Mutation History Tree for Lineage II
with time estimates for each branching point

After the conference, a few of us stayed an extra day to tour the FTDNA labs. They really have quite a set-up and the resources they have available are impressive. We can expect some great things from FTDNA in 2016.

Bennett Greenspan, CEO, demonstrating a machine in the lab

Bennett Greenspan & Max Blankfield have a quick chat

I was also honoured and privileged to receive an award (organised by Brad Larkin of the SurnameDNA Journal) in recognition of my work in genetic genealogy and was voted Genetic Genealogist of the Year 2015, a very humbling experience given that there are so many of my peers equally deserving of such an accolade. The award itself is made of fine wood and is beautifully inscribed with gold lettering. It is going up on the wall of my study as soon as I can lay my hands on a hammer and a nail.

Maurice Gleeson & Brad Larkin

A big thank you to all of the people who have helped me over the years, especially my colleagues within the genetic genealogy community whose conversations and discussions have served to formulate my own ideas about genetic genealogy. I am humbled by the award and honoured to be part of this wonderful community.

Maurice Gleeson
17 Nov 2015

Tuesday 10 November 2015

MDKA Profile – John Gleeson 1832-1885 (G-21, N74958)


Name of MDKA:                    John Gleeson
Family nickname/agnomen:   ?McEvaddy or ?Rabbit
Date & location of birth:         ?1832, Lackagh
Date & location of death:        1885, Shallee
Residence(s):                           Shallee (probably Shallee Upper & Lower, near to Longstone & Killoscully)
Occupation:                             Labourer
Other family occupations:       Teacher, Miner
Religion:                                  Catholic
Wife's name:                            Anne Gleeson
Date & location of birth:         1848, Shallee
Date & location of death:        1926, Ballyhourigan
When & where married:          1865, Ballynahinch
Sponsors at wedding:              not recorded
Order of children's birth:         Martin, Catherine (?Ruby Kathleen), Bridget, Timothy, Winifred, Hanora, Patrick, Mary
Possible father & mother:*     Martin & Bridget
Sponsors at baptisms:            
1) Pat Gleeson, Wenefrid Finey; 2) Patrick Gleeson, Anne Gleeson; 3) Michael Greene, Mary Gleeson; 4) ??; 5) ?John Gleeson; 6) Thomas McDonnell, Mgt Bevan; 7) ??; 8) John Ryan, Catherine Gleeson

DNA kit no:                            N74958
Project ID no:                         G21
DNA Tests taken:                   Y-DNA-111, FF, FMS, Big Y, Z255 Panel
Terminal Y-SNP sequence:     Z255 > Z16439/7 > BY2852 > A5629 > A5628 > Y16880
Closest match(es) ID no:        G64 & G57   
Place on the Lineage II tree:   Branch 6

Link(s) to online tree(s):
* based on naming convention

Supporting evidence & documentation

Name of MDKA: John Gleeson
  • See birth & baptism records of his children below. [1] They record the parents as John Gleeson & Anne Gleeson
Gleeson nickname/agnomen: ?McEvaddy or ?Rabbit
  • Dad’s 1st cousin Liz thinks that the family nickname was MacEvaddy. This is a name from Northern Ireland. Apparently miners came to the area from Northern Ireland so maybe one of them was a McEvaddy. However, I have never encountered any such name in the records for the area.
  • When Dad and I visited the ancestral home of Anne Gleeson (back in 2012), the present owner of the house said that she thought she remembered seeing the nickname “Rabbit” on some of the deeds or other documentation relating to the purchase of her house. See blog post "What happened to the widow Anne Gleeson?"
Date & location of birth: ?1832, Lackagh
  • Based on the fact that his parents may have been named Martin & Bridget, I found a possible baptism record in Lackagh, a neighbouring townland to Shallee. However, this may not be the correct one [2]
Date & location of death: 1885, Shallee
  • Death certificate found for a John Gleeson who died in Shallee aged about 50. The informant was ??? [3]
Residence(s): Shallee
  • It is not certain where in Shallee the family lived. Several children were born in Shallee, others in Longstone, and one in Killoscully, so it is assumed that the family lived in the townland of Shallee Upper & Lower (rather than in the neighbouring townlands of Shallee Coghlan or Shallee White). It has not been possible to find the family in Griffith’s Valuation or in any of the subsequent Cancelled Books, so John may not have been the head of the household, but rather the son or son-in-law of the head of the household. 
  • See blog post "What happened to the widow Anne Gleeson?"
Occupation: Labourer
  • See birth & baptism records. [1]
Other family occupations: Teacher, Miner
  • His eldest son Martin was a Teacher in Co. Wicklow (Stratford-on-Slaney, & Baltinglass). 
  • His son Paddy was a Miner in Butte, Montana. He may have worked in the local mines in Silvermines.
Religion: Catholic
  • See birth & baptism records. [1]
Wife's name: Anne Gleeson
  • See birth & baptism records. [1] Also church marriage records from 1865 & 1895 - she remarried after his death to a Patrick Ryan (soldier) and lived in Maunsell’s Cottage in Ballyhourigan. [4,5]
Date & location of birth: 1848, Shallee
  • From Baptism record. [6] Her parents were Timothy Gleeson & Catherine Ryan, consistent with the naming convention predictions (see below) based on her children’s names (lending support to the notion that John’s parents were Martin & Bridget). Also, supportive data for her parents’ names and her siblings was obtained from US records & letters from her daughter (written from 1938-1942).
Date & location of death: 1926, Ballyhourigan
  • From death certificate. [7] No headstone found.
When & where married: 1865, Ballynahinch
  • From church marriage record. [4] No civil marriage record was ever found for either her 1865 or 1895 marriages.
Sponsors at wedding: not recorded
  • See church marriage record.
Order of children's birth: Martin, Catherine, Bridget, Timothy, Winifred, Hanora, Patrick, Mary
  • See birth & baptism records. [1]
Possible father & mother:* Martin & Bridget
  • See 1st son and 2nd daughter. His wife Anne’s parents are predicted to be Timothy & Catherine (2nd son & 1st daughter’s names) and subsequent records confirmed this.
Sponsors at baptisms:
  • See birth & baptism records of his children. [1]

Sources & References

[1] birth & baptism records of his children

No birth records found for Patrick but he appears in the 1901 & 1911 census.

[2] possible baptism record for John Gleeson

[3] death certificate for John Gleeson

[4] church marriage record of Anne Gleeson to John Gleeson (1865)

[5] church marriage record of Anne Gleeson to Paddy Ryan Soldier (1895)

[6] baptism record for Anne Gleeson 1848

[7] civil death certificate for Anne Gleeson 1926 (she remarried to Patrick Ryan Soldier)

Maurice Gleeson
Nov 2015

Thursday 13 August 2015

Building a Mutation History Tree with STR data

In the previous blog post we explored Genetic Distance and how it can be used to group people together into the same Genetic Family (or Lineage). We also saw how a lot of your relevant non-matches only show up in surname projects because they are outside the FTDNA matching threshold. Lastly, we introduced the concept of the Mutation History Tree (MHT) and how traditional genealogies could be hung on its branches to give a combined tree that uses mutations when named individuals run out.

In this blog post, we will look at how to build a Mutation History Tree using STR data.

Building a simple Mutation History Tree

A huge thank you is due to project member Lisa Little who generated a lot of the data discussed below, and to Nigel McCarthy (Admin of the Munster Irish Project) who has offered invaluable advice.

It is clear from the Results Page of the Gleason DNA Project that each Lineage has a distinctive coloured pattern by virtue of the values for its STR markers. This distinctive coloured pattern reflects the unique marker values for each Lineage, and distinguishes one Lineage from another. The Modal Haplotype for each Lineage represents the "signature tune" for that particular Lineage.

The distinctive coloured patterns of Lineages I, II & III
(from the Results Page on the World Families.Net site)

Furthermore, within each Lineage, there are subtle differences among members. In other words, most people are not an exact match to the Modal Haplotype - everyone in the group sings the "signature tune" just ever so slightly differently. They're all in the same choir, but some hit a bum note! These differences between project members in their STR marker values allow us to construct a diagram that suggests how the various project members might be related to each other.
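To make the idea of a Modal Haplotype a little more concrete, here is a minimal sketch in Python. It takes the modal value for each marker to be the most common value among the members, and then lists where each member differs from it. The member labels and marker values are made up purely for illustration (they are not real project data), and the function names are my own.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common value at each marker across all members.

    If two values tie, the one encountered first is used.
    """
    markers = next(iter(haplotypes.values())).keys()
    return {
        m: Counter(h[m] for h in haplotypes.values()).most_common(1)[0][0]
        for m in markers
    }


def differences_from_modal(haplotype, modal):
    """Map each differing marker to a (modal value, member's value) pair."""
    return {m: (modal[m], v) for m, v in haplotype.items() if v != modal[m]}


# Hypothetical members and values, for illustration only
members = {
    "M1": {"385b": 14, "390": 24, "392": 13},
    "M2": {"385b": 14, "390": 24, "392": 13},
    "M3": {"385b": 15, "390": 24, "392": 13},
    "M4": {"385b": 14, "390": 25, "392": 14},
    "M5": {"385b": 14, "390": 25, "392": 13},
}

modal = modal_haplotype(members)
print(modal)  # {'385b': 14, '390': 24, '392': 13}

for name, haplotype in members.items():
    print(name, differences_from_modal(haplotype, modal))
```

In this toy example M1 and M2 sing the "signature tune" exactly, while M3, M4 and M5 each hit one or two different notes.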

So, for example, if we take the first 12 markers in Lineage II we could construct a Mutation History Tree (MHT) that separates out the members of the group into different branches. Reading from left to right across the tree in the diagram below, the first 4 members (G22, G05, G57, and G64) all match the Modal Haplotype at 12 markers (Branch 1). The fifth member (G42) has a single mutation (in red) at marker 385b which has mutated from the modal value of 14 to a new value of 15 (Branch 2). This may have happened in the previous generation or many many generations ago. The 6th member (G21) apparently has 2 mutations* from the modal (on markers 389i1 and 389i2), and the remaining members all have a mutation at marker 390 with 3 of them having an additional mutation on marker 392 (Branch 5).

12-marker MHT for Lineage II
click to enlarge

This diagram gives a pictorial representation of how the different members of Lineage II may be related. The branching pattern derived from the DNA mutations may very well correspond to the branching pattern that one might see in the traditional Family History Tree if we were able to trace it all the way back with documentary evidence to the MRCA (Most Recent Common Ancestor). Thus the Mutation History Tree can give us important clues regarding which individuals are likely to be on the same branch of the overall tree, and who is more closely related to whom. This in turn can help focus further documentary research. 

In the example above, the project members in the last branch on the right (Branch 5) are more closely related to each other than to anyone else in the project - they should get together and try to figure out how they are related. Their MRCA is a lot closer in time than the MRCA they share with (for example) the first group in the tree (on the far left, Branch 1). Similarly, because Branch 5 is in fact an off-shoot of Branch 4, the MRCA for these two groups is going to be closer in time than the MRCA either shares with any of the other groups. Thus, for example, Branch 5 members may share an MRCA born in 1750, Branches 4 & 5 share an MRCA born in 1610, Branches 4 & 3 share an MRCA born in 1390, and the MRCA for the entire group was born in 1125.

But this is the pattern based on the values of just 12 STR markers. What happens when we use 37 markers? or 67? or 111? The more markers that are used to generate the Mutation History Tree, the more accurate the picture is likely to become and the more likely we are to approximate the branching pattern in the actual Family History Tree.

However, not everyone in the group has tested to the same level of STR markers. Six people have tested to 111 markers, 3 people have tested to 67 markers, and 2 people have tested to 37 markers. And of the 4 people in "Possible Lineage II", one has tested to 25 markers and 3 have tested to only 12 markers. This obviously creates difficulties in accurately allocating people to the correct branch of the Mutation History Tree and such allocations are likely to change as these members upgrade their results and more data become available.

These are very important points that need to be borne in mind and are worth repeating:
  • Mutation History Trees are only as good as the data available 
  • They are liable to change as more data becomes available
  • The more data used to generate the tree, the more likely it is to approximate reality

Building a more complex Mutation History Tree

To generate a more advanced Mutation History Tree, using up to 111 markers, we can use programmes such as Fluxus to generate a phylogenetic tree / cladogram that produces a "best fit" model of the tree (i.e. with the fewest number of branches possible for the known mutations). This is often called a "maximum parsimony" approach and the principle is akin to that of Occam's razor which simply states that - all else being equal - the simplest hypothesis that explains the data should be the one that is selected. It may not be the correct one (there are usually other possible alternatives with varying degrees of plausibility), but it has the highest probability of being the correct one.

Using the Fluxus software is not easy - the process is multi-step and complicated, it is not user-friendly, and it takes time. Nevertheless, we will look at the output of this software in a separate blog post.

Below is a summary of the various STR mutations that occur within Lineage II (courtesy of Lisa Little). Mutations from the Modal Haplotype are highlighted in yellow and beige. The markers are divided into their various "Panels" by bold dark lines. Mutations among the first 12 markers are in the first 5 rows (Panel 1); the next 3 rows are mutations among markers 13-25 (Panel 2); and the following 4 rows have the mutations among markers 26 to 37 (Panel 3).

From these it is possible to build a more evolved Mutation History Tree than the one generated using only 12 marker data. As all 11 members have tested to 37 markers, let's construct a tree based on 37 markers and compare it to the one generated from the 12-marker data.

Lineage II's STR mutations (from Lisa Little)
37-marker Mutation History Tree
click to enlarge
The previous 12-marker Mutation History Tree (for comparison)
click to enlarge

In the new 37-marker Tree, additional branches and additional mutations are indicated in pink. The branches have doubled and are now numbered 1 through 10. Branch 3 is an exact match to the Modal Haplotype (but no members sit on this particular branch). There are several important points to note when comparing this new 37-marker Tree to the previous 12-marker Tree:
  • The tree with more data (the 37-marker Tree) has more branches, more definition, more granularity, more fine detail
  • The new branches allow us to revise our estimates for when the MRCA for the various branches may have been born. For example, what was Branch 5 has now split into two branches (9 & 10), and the members in Branch 9 (G39, G51) share an MRCA who was possibly born several generations after the MRCA for Branches 9 & 10.
  • There are several Parallel Mutations in the 37-marker Tree (i.e. identical mutations that occur in several different branches)
    • 464b (16>17) occurs on 2 branches (Branches 4 & 8)
    • 464c (17>16) occurs on 2 branches (Branches 1 & 10)
    • CDYa (39>38) occurs on 5 branches (Branches 2, 4, 6, 7/8, & 10)
    • CDYb (40>39) occurs on 2 branches (Branches 1 & 9)
    • 456 (16>15) occurs on 3 branches (Branches 2, 6 & 7)
  • The apparently large number of Parallel Mutations may be because this is not a "maximum parsimony" tree, and there may be another way of arranging the data that would produce a better "fit" with fewer branches. The Fluxus software could help clarify this. 
  • Alternatively, this may be a very accurate reflection of the real Family History Tree (i.e. how people are actually related ancestrally) and the large number of Parallel Mutations are due to the rapid mutation rate of the STR markers in question. This is entirely plausible as several markers are known to mutate back and forth fairly frequently from one generation to the next (e.g. the CDYa and CDYb markers). FTDNA identifies such rapidly mutating markers by colouring them in dark red on the DNA Results page.
  • Back Mutations may be present in the tree, but are hidden  ... and thus the tree may be wrong
    • for illustrative purposes, in Branch 2, theoretically there may have been a mutation in one of the members' ancestral lines such as CDYb (40>39) in (say) 1470, and a few generations after that a Back Mutation CDYb (39>40) in (say) 1610. This would place these members on 2 different branches rather than on the same branch where they currently sit. 
    • And even though, in reality, they share a Common Ancestor in 1470, the Back Mutation masks this and makes them look much more closely related than they actually are, with a Common Ancestor that appears to be sometime in the 1700s perhaps, 300 years later than it actually is!
    • This latter point is a common experience when working with any type of DNA - the Common Ancestor is further back than he/she looks.
Fast-mutating markers (in dark red) among the first 37 markers of FTDNA's Y-DNA test
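Spotting Parallel Mutations by eye gets harder as the tree grows, so here is a minimal sketch in Python of how the check could be automated. It assumes the tree has been written down as a simple mapping from branch number to the mutations on that branch. The mapping is a partial one, containing only four of the Parallel Mutations listed in the text (CDYa is left out because one of its branches is given as "7/8"), and the function name is my own.

```python
from collections import defaultdict


def parallel_mutations(branch_mutations):
    """Return the mutations that appear on more than one branch.

    The result maps each such mutation to the list of branches carrying it.
    """
    seen = defaultdict(list)
    for branch, mutations in sorted(branch_mutations.items()):
        for mutation in mutations:
            seen[mutation].append(branch)
    return {m: branches for m, branches in seen.items() if len(branches) > 1}


# Partial, illustrative mapping: branch number -> mutations on that branch
branch_mutations = {
    1: ["464c 17>16", "CDYb 40>39"],
    2: ["456 16>15"],
    4: ["464b 16>17"],
    6: ["456 16>15"],
    7: ["456 16>15"],
    8: ["464b 16>17"],
    9: ["CDYb 40>39"],
    10: ["464c 17>16"],
}

for mutation, branches in parallel_mutations(branch_mutations).items():
    print(mutation, "occurs on branches", branches)
```

A check like this only flags repeated mutations. It cannot tell us whether they are genuine Parallel Mutations or a sign that the branches should be rearranged, and it says nothing at all about hidden Back Mutations.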

Examples of Mutation History Trees generated using 37-marker results can be found in the Allen Patrilineage Project for their Patrilineage I and Patrilineage II.

When building a Mutation History Tree with larger numbers of markers (67 or 111), software programmes such as Fluxus become indispensable because doing it by hand is much more difficult. And as stated previously, the tree is likely to change as more data is used to generate the branching pattern. If we generated a tree based on 67 marker data it would become even more detailed, and more so too with a 111-marker based tree. Furthermore, adding more data is likely to change the branching pattern within the tree as a new "best fit" model is identified. 

And this essential point will be perfectly illustrated when we add SNP marker data into the mix - it throws our 37-marker "best fit" Mutation History Tree into a completely new configuration.

I don't know about you but I can hardly contain myself.

Maurice Gleeson
13 Aug 2015

* the 2 apparent mutations are in fact only 1. The marker DYS389 is a single STR marker that has four parts: m, n, p, and q. At FamilyTreeDNA they have two tests for DYS389. The first test looks at the first two parts of marker DYS389 (m and n). This is what they call DYS389I. The second test looks at all four parts of DYS389 (m, n, p, and q). This is what they call DYS389II. There are, by scientific convention, two ways to display the result of DYS389II.  The way FTDNA display the result is by showing the total for the entire DYS389 marker (m+n+p+q). This is described in their Learning Centre FAQ here. Member G-21 has a mutation in 389i - which is indicated by his value of 13 rather than the modal 14. For marker 389ii his value of 29 rather than 30 indicates that in the entire marker (parts m+n+p+q) he still only has a single mutation - that same single mutation which is reported as 389i. In reality, since 389ii includes all four parts of the marker, we should just drop 389i off our table. Any mutation in 389i will be included in 389ii. My thanks to Lisa Little for pointing this out.  
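The double-counting described in this footnote can be checked with a few lines of Python. This is only a sketch: the function names are my own, and the values are the ones quoted above for member G-21 (13 and 29) and for the modal (14 and 30). The idea is to subtract 389i from 389ii so that the second half of the marker is compared on its own.

```python
def naive_steps(member, modal):
    """Count 389i and 389ii as if they were two independent markers."""
    return abs(member["389i"] - modal["389i"]) + abs(member["389ii"] - modal["389ii"])


def corrected_steps(member, modal):
    """Compare 389i, then only the remainder of 389ii (389ii minus 389i)."""
    first = abs(member["389i"] - modal["389i"])
    member_rest = member["389ii"] - member["389i"]
    modal_rest = modal["389ii"] - modal["389i"]
    return first + abs(member_rest - modal_rest)


modal = {"389i": 14, "389ii": 30}
g21 = {"389i": 13, "389ii": 29}

print(naive_steps(g21, modal))      # 2 apparent mutations
print(corrected_steps(g21, modal))  # 1 actual mutation
```

The remainder (389ii minus 389i) is 16 for both G-21 and the modal, which is why only a single mutation is really present.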

Friday 7 August 2015

Genetic Distance, Genetic Families, & Mutation History Trees

In this blog post, we examine how people are grouped together into Lineages (sometimes called Genetic Families) and how the relationship between people within a Lineage can be mapped out and represented in a Mutation History Tree.

Grouping People Together

The members of each lineage within the project have been grouped together because their genetic signatures (aka haplotypes) are similar, suggesting a common ancestor some time in the past several hundred years. The degree of similarity between any two individuals can be assessed by the Genetic Distance between them, as discussed in a previous blog post and reproduced again below:
Who qualifies as a match to you? Anyone whose marker values are sufficiently similar that they meet the criteria set by FTDNA to be declared "a match". And here are those criteria:
  • a GD of 2 at 25 markers
The ISOGG Wiki has a very nice summary of Genetic Distance and the criteria for matching.
However, Genetic Distance is not the only possible criterion for grouping people into the same "Genetic Family" or "Lineage". Other considerations include traditional genealogical indicators such as having the same surname (the main criterion for surname studies), or having an ancestor who came from the same location as the ancestors of other group members. Additional genetic criteria may include having the same rare marker values, or having the same terminal SNP. These considerations can also serve as indicators that Lineage members have been grouped correctly i.e. members may be grouped together on the basis of one criterion (e.g. Genetic Distance) and subsequently are found to share a second criterion (e.g. the same rare marker value, ancestors from the same location, or even the same MDKA). You can read more about some of the possible criteria for grouping people together into the same genetic family here.

Why do I match some Project Members but not others?

You will probably notice that not everyone in your Lineage turns up in your list of matches on your Matches Page. The reason for this is that they do not meet the FTDNA criteria for a "match" to you, but they do meet the criteria as a match either to the Modal Haplotype* for the Lineage or to other members of the project. This is one of the key benefits of joining a surname project (such as the Gleason/Gleeson DNA Project) - it can connect you to people within the FTDNA database who do not show up in your list of matches but to whom you are still likely to be related.

For example, in Lineage II, my Dad (G21; N74958; yellow dot in the diagram above) is a match (at 37 markers) to only 5 of the other 10 members: G22, G57, G64, G55, and G66 (green dots). He does not match the members with red dots. In other words, his genetic distance to the green dot matches is 4/37 or less, whereas his GD to the red dot matches is 5/37 or greater (in fact, reading down, his GD to each of the red dot members is 5/37, 6/37, 5/37, 6/37, & 6/37 respectively).
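
To make the idea of Genetic Distance concrete, here is a rough sketch. It simply adds up the differences in repeat values marker by marker, which is a simplification (FTDNA's own calculation treats some markers differently). The marker values below are invented for illustration, the function names are hypothetical, and the threshold of 4 at 37 markers is the one described above.

```python
def genetic_distance(haplotype_a, haplotype_b):
    """Sum of absolute differences in repeat values over the markers both people share."""
    shared = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared)


def is_match_at_37(gd, threshold=4):
    """A GD of 4/37 or less counts as a match; 5/37 or more does not."""
    return gd <= threshold


# Invented values for just four markers (a real comparison would use all 37).
person_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
person_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}

gd = genetic_distance(person_a, person_b)
print(gd, is_match_at_37(gd))  # 2 True
```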

But let's look at the member closest to the Modal Haplotype* for Lineage II. This is the person with the fewest mutations compared to the Modal Haplotype, or (in other words) the smallest Genetic Distance from the Modal Haplotype. There are no exact matches (i.e. 0/37) to the Modal Haplotype in Lineage II, but there are two members who are the closest (GD = 1/37) and these are members G57 and G64 (kits 60393 & 365763). They are uncle/nephew, both carry the surname Little, and they are believed to have a genetic Gleason ancestor somewhere in the last several hundred years. As Project Admin, I have access to their pages and I can look at their matches. This tells me that each of them matches 7 of the 10 other members in Lineage II. I can also see that their Genetic Distance to the remaining 3 members of Lineage II is only 5/37, just outside the FTDNA threshold for declaring them a match. Looking at these 3 latter members, one matches two other people within the project and the other two match 5 project members.

So everyone within the project is a match to at least one other person within the project, but the distance between any two project members can vary considerably - some are very close, others are very distant.

And this creates a wonderful possibility - by analysing the degree of Genetic Distance between members we can construct a diagram of the branching pattern within the group which shows how closely or how distantly each member may be related to each other member. Such a diagram is variously called a phylogenetic tree, a phylogram, a cladogram, or a Mutation History Tree. The same process is used to generate evolutionary diagrams showing where all living creatures sit on the Tree of Life.

Interactive version of the Tree of Life

Mutation History Trees

This concept has important potential implications for genealogy. Theoretically, it should be possible to take the known genealogies of each member within a Lineage and hang them onto the appropriate branch within the Mutation History Tree. In this way we will have a combined tree that starts off in modern times with named individuals, goes back along each ancestral line until the named individuals run out (at each branch's Brick Wall or MDKA, Most Distant Known Ancestor), and then the tree continues back in time using genetic marker mutations instead of people, culminating at the Modal Haplotype for that particular Lineage.

A combined Family History Tree & Mutation History Tree

In the combined tree above, named individuals appear in the blue boxes, starting with living individuals (born about 1960) and going back in time to the MDKA for each branch. Most branches have an MDKA born about 1810 to 1840 (which is typical for Irish research), and some have a Brick Wall at a later point (Branches 5, 6, & 7 have an MDKA about 1870). One lucky branch can trace their line back to 1690 which is highly unusual but something we all hope for. You never know when a new member will join your particular Lineage who happens to be in possession of the Family Bible! And because you are genetically related to him, his Family Bible pertains to your family too. In this way, all members within a particular Genetic Family can "piggyback" onto the pedigree of the member with the longest pedigree.

Where the named individuals end, DNA marker mutations take over. STR markers are in yellow, SNP markers are in pink, but this is just a crude representation of what the tree might look like. In reality it would be much more complicated than this.

The person sitting at the intersecting point of all the branches is the MRCA (Most Recent Common Ancestor) and he is likely to have the Modal Haplotype for that particular Lineage. And as we go back in time, we would identify NPEs (Non-Paternity Events), and we would identify other surnames to which we are directly related but with whom our common ancestor is prior to the time of surnames. And the tree would continue even further back than that, based on the SNP discoveries that are continuously being made in the ongoing Haplogroup Projects.

Eventually, by superimposing known genealogies on top of the Mutation History Tree, we could build a comprehensive evolutionary tree of all Mankind, that travels back in time to "Genetic Adam" and travels forward to culminate with each living person today.

Maurice Gleeson
7 August 2015

* Modal Haplotype - this is the haplotype (i.e. your genetic signature, your sequence of STR marker values) that is derived from the most frequent value for each of the STR markers in turn among members of the same Lineage. It is likely that the Modal Haplotype is identical or almost identical to the haplotype that the Common Ancestor of the Lineage would have had. In other words, he would have passed on the identical marker values to most descendants and only some of them would have developed the occasional mutation along the way.
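
The definition above can be sketched in a few lines. This is only an illustration: the member names and marker values are invented, the helper names are hypothetical, and ties (two values equally common for a marker) are not handled in any special way.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most frequent value for each marker across a list of haplotypes."""
    markers = haplotypes[0].keys()
    return {m: Counter(h[m] for h in haplotypes).most_common(1)[0][0] for m in markers}


def distance_from_modal(haplotype, modal):
    """Simple marker-by-marker Genetic Distance from the modal haplotype."""
    return sum(abs(haplotype[m] - modal[m]) for m in modal)


# Invented members and values, three markers only.
members = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS391": 11},
    "B": {"DYS393": 13, "DYS390": 25, "DYS391": 11},
    "C": {"DYS393": 13, "DYS390": 24, "DYS391": 10},
    "D": {"DYS393": 14, "DYS390": 24, "DYS391": 11},
}

modal = modal_haplotype(list(members.values()))
closest = min(members, key=lambda name: distance_from_modal(members[name], modal))

print(modal)    # {'DYS393': 13, 'DYS390': 24, 'DYS391': 11}
print(closest)  # A
```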

Wednesday 29 July 2015

Chromosomes, Markers & Evolutionary Trees

Let's recap on some of the basic science behind Y-DNA as this will help you understand what you are seeing when you look at your results, and how your results can be applied in practice.

Chromosomes - a closer look

We have 46 chromosomes, arranged in 23 pairs. Each pair has 2 copies, one of which you got from your mother, the other from your father. So for example, you have one paternal chromosome 14 and one maternal chromosome 14. Before you were conceived, your father made a copy of each of his 46 chromosomes but only passed on one copy from each pair to you. Similarly your mother made copies of all her 46 chromosomes but only passed on to you one copy from each pair. In this way the 23 chromosomes you got from your father combined with the 23 from your mother to bring your chromosome quotient back up to the usual 46.

The 23rd pair is also known as the sex chromosomes. There are two types of sex chromosome - an X and a Y. At conception, if two X chromosomes combine, a female child is produced (XX). If an X and a Y chromosome combine, a male child is produced (XY). Women (XX) only have an X chromosome to pass on to their offspring, whereas men (XY) can pass on either an X or a Y to their offspring. Therefore the man's contribution decides the gender of the child. Women do not have a Y chromosome and so cannot do this particular DNA test.

Thus the Y chromosome is passed on only from Father to Son. This is why it is perfect for tracing the father's father's father's line and is the main type of DNA used for surname studies. Be aware though that it only assesses this single ancestral line, and if you go back 10 generations, this represents only 1 of your 1024 ancestors (which is equivalent to about 0.1% of your ancestors at that particular level).
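
The figures above come from simple doubling, as this small sketch shows. It assumes every ancestor at a given level is a different person (in practice cousin marriages reduce the number), and the function name is just for illustration.

```python
def ancestors_at_generation(n):
    """Number of ancestral slots n generations back: 2 parents, 4 grandparents, and so on."""
    return 2 ** n


generations = 10
total = ancestors_at_generation(generations)
share = 100 / total  # percentage represented by the single Y-DNA line

print(total, round(share, 1))  # 1024 0.1
```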

Each of our 46 chromosomes consists of a long double-stranded helix of DNA. If we unwrapped it, it would look like a long ladder extending into infinity, or a railway track running from New York to Los Angeles. It's huge. If you untwisted all 46 chromosomes from a single cell, the DNA would stretch for 2-3 metres (6-10 feet). All the untwisted DNA from the human body would stretch to the moon and back many thousands of times.

All along the "ladder" are the nucleotide bases, like rungs in the ladder, binding each strand of the helix to the other strand of the helix. The bases are called A, T, C, and G, after the first letters in their respective names - Adenine, Thymine, Cytosine, & Guanine. A only ever binds with T, C only ever binds with G. You can remember this by thinking the straight-sided letters only bind to each other, and the curved letters bind only to each other. Each base pair effectively forms a rung in the ladder.

Because A only ever binds with T, and C only ever binds with G, if we know the sequence of bases on one strand of the helix, we automatically can tell what bases are on the other strand. Therefore, the sequence of bases along the DNA is only ever written as a single line of letters (e.g. ATCCGAATTGG). The sequence is read from what is called the 5' (5 prime) end of the DNA molecule (and is read toward the 3' end, like reading from left to right).
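
Here is a small sketch of that pairing rule, using the example sequence above. Because the two strands run in opposite directions, reading the second strand from its own 5' end gives the paired bases in reverse order. The function names are purely illustrative.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """The base sitting opposite each base, position by position."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand):
    """The opposite strand as it would be read from its own 5' end."""
    return complement(strand)[::-1]


sequence = "ATCCGAATTGG"
print(complement(sequence))          # TAGGCTTAACC
print(reverse_complement(sequence))  # CCAATTCGGAT
```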

In each pair of chromosomes, the two copies (maternal and paternal) are virtually identical to each other in terms of size, length, morphology, etc. The exception is the sex chromosome pair, X and Y ... the X chromosome is about 3 times bigger than the Y chromosome.

Although each chromosome in a pair is virtually identical, there are subtle differences between the nucleotide bases that run along the entire length. These variations in the bases are called mutations and can be identified because they occur at specific locations along the chromosome. These locations where mutations occur are referred to as DNA "markers". Each marker can be identified because it occurs at a specific position along the chromosome and thus can be given a particular name (e.g. DYS390 or Z255). People who share the same mutation may have inherited it from a shared Common Ancestor, and this is why DNA can be so helpful for genealogy.

A note on terminology: Y-DNA refers to the Y chromosome. Autosomal DNA refers to all the chromosomes EXCEPT the last pair (Pair 23, the sex chromosomes, X and Y - all the other chromosomes are called autosomes, hence autosomal DNA). Mitochondrial DNA refers to the DNA found in mitochondria (the "batteries" that power each cell). For a more detailed introduction to the three types of DNA test and how they are applied in genealogy, watch this YouTube video here.

The different types of DNA marker

There are two types of DNA marker - STR markers and SNP markers.

STR stands for Short Tandem Repeat and the key word here is "repeat". An STR marker is a sequence of bases repeated many times (e.g. CATCATCATCAT). In this example, the sequence is CAT and the repeat value of the sequence is 4. When the DNA is being copied before being passed on to any offspring, there are occasional mistakes made in the copying process. So for example, a copying mistake in the CAT sequence above might result in 3 repeats instead of 4, and so the value of that marker may shift from 4 in the parent to 3 in the offspring. This may be the first mistake to be made in this particular marker for many generations, and so not only will the male child differ from his father, grandfather, and great grandfather, but also from all his male siblings and cousins, who will all have a value of 4 for this particular marker.
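
As a rough illustration of how a repeat value is arrived at, the sketch below counts the longest run of back-to-back copies of a sequence. The flanking bases (GG and TT) are invented, and the function name is hypothetical; real STR calling in a laboratory is considerably more involved.

```python
def count_tandem_repeats(sequence, motif):
    """Longest run of back-to-back copies of motif found anywhere in sequence."""
    best = 0
    for start in range(len(sequence)):
        run = 0
        position = start
        # Keep stepping forward one motif at a time while the copies continue.
        while sequence.startswith(motif, position):
            run += 1
            position += len(motif)
        best = max(best, run)
    return best


parent = "GG" + "CAT" * 4 + "TT"  # father: 4 repeats
child = "GG" + "CAT" * 3 + "TT"   # son after a copying mistake: 3 repeats

print(count_tandem_repeats(parent, "CAT"))  # 4
print(count_tandem_repeats(child, "CAT"))   # 3
```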

The second type of DNA marker is the SNP marker, which stands for Single Nucleotide Polymorphism. The key idea here is "substitution" - a single base at a specific location changes from what it normally is to a different base (e.g. an A changes to a C or a T or a G). Whereas the STR markers involve several bases in a row, the SNP marker only involves the substitution of a single base.
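
A substitution can be spotted by lining up two sequences and looking for the positions where a single base differs. The sketch below does just that; the two short sequences are invented and the function name is only for illustration.

```python
def single_base_differences(reference, sample):
    """List of (position, reference base, sample base), counting positions from 1."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length to compare base by base")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "ATCCGAATTGG"
sample = "ATCCGCATTGG"  # the A at position 6 has been replaced by a C

print(single_base_differences(reference, sample))  # [(6, 'A', 'C')]
```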

Kelly Wheaton has written some excellent blog posts about DNA markers on the Y chromosome. You can read them by clicking here - STR markers & SNP markers.

There are some very important characteristics of STR and SNP markers which are key to understanding how they are applied in surname studies:
  • Mutations in STR markers are written as the value of the marker (e.g. 12) whereas mutations in SNP markers are given names (e.g. Z255) or are written as the location on the chromosome followed by the change that occurred in the bases there. For example, 17349992 (G>A) indicates that a G has been replaced by an A at position 17349992.
  • The mutation rate of STR markers varies from marker to marker. Some mutate relatively quickly (e.g. 1 mutation every 5 generations) whilst others mutate very slowly (e.g. 1 mutation every 500 generations). Mutations in slow-mutating markers are very good for studying human migration, whereas mutations in fast-mutating markers can be very useful for genealogy research (in the last 500 years or so).
  • A big problem with STR markers is that they can mutate back as well as forward. So for example an STR marker may have a value of 4 which changes to a 3 and then back to a 4. The first mutation (4 to 3) may have occurred 1000 years ago, and the second one (3 back to 4) may have occurred 300 years ago. The trouble is that the Back Mutation masks the fact that there was a significant mutation 1000 years ago and this may result in people with the 4 value being assigned to the wrong branch of the human evolutionary tree and hence the wrong family tree!
  • Another problem with STR markers is the Parallel Mutation. This happens when two very separate branches of the same family experience the same mutation "in parallel", giving the impression that the two branches are more closely related than they actually are in reality.
  • A further problem with STR markers is that it is very difficult to identify a Back Mutation or a Parallel Mutation, and as a result we don't know how often they occur. We suspect that they happen fairly frequently, perhaps as often as a marker value mutates forward it also mutates back. We really don't know. But such "hidden" back mutations may seriously confound our interpretation of the data and may result in people being placed on the wrong branches of the human evolutionary tree.
  • Convergence is the name given to the situation when Back Mutations and Parallel Mutations on STR markers result in people appearing to be more closely related to each other than they actually are. This is a big problem when comparing people at 12 markers, but less of a problem when comparing at higher numbers of markers (e.g. 37, 67, or 111). However, even at 67 markers significant Convergence has been detected.
  • On the other hand, SNP markers mutate much more slowly. And because there are so many of them, Back Mutations and Parallel Mutations are extremely rare (and easily spotted). For this reason, when using DNA markers to place people on the human evolutionary tree, SNP markers trump STR markers i.e. more reliance is given to SNP markers than to STR markers.
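
The Back Mutation and Parallel Mutation problems described in the list above can be illustrated with a toy example. Each list below is an invented record of one marker's value in successive generations of a single line of descent; the function names are hypothetical.

```python
def observed_difference(history):
    """What a test would see: the difference between the oldest and newest value."""
    return abs(history[-1] - history[0])


def mutation_events(history):
    """What really happened: the number of generations at which the value changed."""
    return sum(1 for before, after in zip(history, history[1:]) if before != after)


# Back Mutation: the value goes 4 -> 3 and later 3 -> 4 again.
back = [4, 4, 3, 3, 4]
print(observed_difference(back), mutation_events(back))  # 0 2

# Parallel Mutation: two branches from the same ancestor (value 4)
# each change to 3 independently.
branch_one = [4, 4, 3]
branch_two = [4, 3, 3]
apparent_distance = abs(branch_one[-1] - branch_two[-1])
real_events = mutation_events(branch_one) + mutation_events(branch_two)
print(apparent_distance, real_events)  # 0 2
```

In both cases the test results show no difference at all, even though two separate mutations took place.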

Y-DNA, Population Migration, & the Human Evolutionary Tree

Because the Y chromosome is passed on virtually unchanged from father to son, and because mutations in the DNA markers along the Y chromosome happen relatively infrequently, it is also an extremely useful tool for studying the last great human migration out of the African Motherland (about 50,000 years ago) that ultimately led to the populating of the entire planet. There is an excellent interactive animation of human migration here, including the various ice ages and the catastrophic eruption of the Mount Toba volcano that almost destroyed Mankind.

Population geneticists have been studying the evolution of mutations on the human Y chromosome (and on mitochondrial DNA) for many years and have developed an evolutionary tree based on these mutations (called the Haplotree).  They refer to each of the major branches of the tree as Haplogroups and have named them after the letters of the alphabet (e.g. Haplogroup R, or its subgroup Haplogroup R1b). You can think of a Haplogroup as a group of people with a broadly similar genetic signature.

As modern humans moved around Africa and then moved out of Africa and spread to different places around the world, the humans who moved to Europe developed a totally different set of mutations to those humans who moved to India or Australia (for example). Thus certain haplogroups are found more commonly in Europe (e.g. R1b, I2b) than in India (e.g. H, L) or Australia (e.g. C, T).

Furthermore, genetic genealogy is a very young science, and more markers are being discovered all the time (thanks to novel tests like the Big Y test from FTDNA). As a result, scientists are still discovering finer and finer sub-branches of the human evolutionary tree, and we are approaching the point where we will discover the finer branching patterns associated with individual surnames (such as those in the Gleason/Gleeson DNA Project).

The old nomenclature for the various branches of the tree used a long string of letters (e.g. R1b1a2a1a2c1e) but this has been superseded by a system that simply puts the main Haplogroup letter followed by the "terminal SNP" (e.g. R-Z255). You can still see both terminologies in use on the ISOGG tree.

The terminal SNP refers to the SNP marker that currently occurs at the end of a branch. The word "currently" is important because as new SNP markers are discovered the current terminal SNP marker is likely to be replaced with a new one, and we will continue to move further and further down the finer branches of the tree until we identify SNP markers that are specific for your own family branch and even single individuals.
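
Here is a sketch of why the terminal SNP is a moving target. The branch is written as a list of SNPs from oldest to most recent; P312 and Z255 come from the text, but "NEW-1" and "NEW-2" are made-up placeholders for SNPs that might be discovered in future, and the function names are hypothetical.

```python
def terminal_snp(branch_path, positive_snps):
    """Deepest SNP on the known branch for which the person has tested positive."""
    terminal = None
    for snp in branch_path:  # ordered from oldest to most recent
        if snp in positive_snps:
            terminal = snp
    return terminal


def haplogroup_label(haplogroup_letter, snp):
    """Short-form name: main Haplogroup letter followed by the terminal SNP."""
    return f"{haplogroup_letter}-{snp}"


# Placeholder branch: "NEW-1" and "NEW-2" are invented names, not real SNPs.
full_path = ["P312", "Z255", "NEW-1", "NEW-2"]

# With the tree as currently known, the branch stops at Z255.
today = terminal_snp(full_path[:2], {"P312", "Z255"})
# After further discoveries and testing, the same person moves further down.
later = terminal_snp(full_path, {"P312", "Z255", "NEW-1"})

print(haplogroup_label("R", today))  # R-Z255
print(haplogroup_label("R", later))  # R-NEW-1
```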

This will eventually allow us to reconstruct family trees based on DNA marker mutations. These are sometimes called phylogenetic trees, sometimes cladograms or phylograms, but my favourite is Mutation History Trees because it sounds similar to Family History Trees. The difference between the two is that Family History Trees are constructed using named individuals, whereas Mutation History Trees use DNA markers. It should be possible to superimpose one upon the other and in this way we can look "beyond the Brick Wall" of individual pedigrees and see where different family branches are likely to connect. This in turn will help focus further documentary research.

There are various groups working on the human evolutionary tree and each has produced its own version of the haplotree:

  • The YCC Haplotree is produced by the Y-Chromosome Consortium. This is an academic effort and it is frequently out of date, being surpassed by the ISOGG tree which is updated much more frequently and harnesses the continuous output of genetic genealogists working on Haplogroup Projects (such as the R-Z255 & Subclades Project to which all members of Lineage II in the Gleason/Gleeson DNA Project belong). The most recent update of the YCC tree is from March 2015 but the tree itself is not user-friendly.
  • The ISOGG tree is the result of the efforts of ISOGG (the International Society of Genetic Genealogy), which co-ordinates the analysis and interpretation of the findings from various Haplogroup Projects and as a result has developed a much larger tree than the YCC Tree. It too is quickly out-dated as the pace of new SNP marker discovery advances and further sub-branches are discovered. Lineage II members can click here and search (Cmd+F or Ctrl+F) for Z255 to see where this particular sub-branch sits on the main Haplogroup R branch.
  • Several of the commercial companies have developed their own haplotrees which at times may be more advanced than the ISOGG tree, and at times less advanced:
    • FTDNA tree - this can be accessed from the Haplotree & SNPs page of your personal FTDNA webpage
    • YFULL Experimental Tree - YFULL is a company that offers SNP testing and will interpret the results of SNP testing carried out by other companies. This tree is relatively easy to navigate but again requires use of the Find function (Cmd+F or Ctrl+F).
    • FGC tree - like YFULL, FGC (Full Genomes Corporation) also offers SNP testing and interpretation. The visual presentation of the tree is not easy to navigate.
  • Haplogroup Project Administrators work at the coal face of scientific discovery in relation to the finer branches of their own particular haplogroup project. The R-Z255 & Subclades Haplogroup Project updates its draft tree periodically as new member results come in to the project. You have to sign up to the project to access these updates but here is the most recent update as of July 15th (for members only). It is important to appreciate the pivotal role that Haplogroup Project Administrators are playing in the ongoing discovery of the finer branches of the tree. Surname Project Admins will work closely with Haplogroup Project Admins to advise their project members regarding which tests to take next and why.
  • Alex Williamson's "Big Tree" is a tree that specifically focuses on the Haplogroup R-P312 branch of the human evolutionary tree (of which Z255 is a subgroup). Alex has done incredible work placing newly discovered SNP markers in their best estimated position on the tree, and most importantly for us, creating a visual representation that is easy to navigate and makes the current state of the tree so much more understandable. The members of Lineage II feature here too, in the Z255 subsection. There are two interesting features to Alex's tree:
    • if you click on the name of any individual, an analysis of their unique genetic signature comes up. Here is the analysis for member N74958 showing his position on the tree, his unique mutations, and his putative haplotype progression (i.e. the estimated progression of his mutations from previous ancestors).
    • the Overlay STR Feature allows you to compare the results for all STR markers (one by one) across the whole group. Here it is for DYS439.
  • Nigel McCarthy runs the McCarthy DNA Project and has pioneered the development of phylogenetic trees based on a combination of SNP and STR markers. Luckily for us in Lineage II, one particular area of his research is also focussed on the Z255 subclade to which we belong (Group E in his project). We'll be talking a lot about Nigel's work in due course as it is particularly relevant to the next steps in the DNA Project for Lineage II members.

The portion of Alex Williamson's "Big Tree" that deals specifically with members of Lineage II

You may have to read this several times before a lot of the information sinks in but stick with it - it's worth it! Knowing the basics behind the science of Y-DNA and how it can be applied will help you understand a lot of the discussion about SNP testing and Big Y results that will follow in subsequent posts.

Maurice Gleeson
30 July 2015