The Gleason / Gleeson DNA Project: 2015

Thursday, 3 December 2015

Mutation History Tree for Lineage II

What are Mutation History Trees?

These trees are like family history trees but the "family" part has been replaced with "mutations", or rather, has been augmented with mutations. In other words, when the known ancestors run out, DNA mutations still allow us to keep on going back in time and thus identify the branching pattern within the overall group for each Lineage. They also allow us to estimate when the different branching points between group members occurred. This helps you see where you sit in the overall group and who is most closely related to you. And this in turn helps foster collaboration between project members, which will hopefully lead to breakthroughs in your genealogical research.

Copies of Older Versions will be archived so that we can keep track of the evolution of the trees over time as more people join the project and upgrade their tests.

Lineage II (North Tipperary group)

Below is the Mutation History Tree for Lineage II (version 1). You can view it or download it as a high-quality pdf document via this Dropbox link. The tree will be constantly updated as more project members join. You can click on the image to enlarge it and then right click again and "Open in a New Window" (Mac) to make it even larger. Alternatively, you may find it easier to download the image (simply right click on the image and choose download). Or a pdf version of the tree is available in our Facebook group here - this has the added advantage that it is searchable.

But first, you will probably need some explanation of what you are looking at.

Starting at the top, it follows the mutations in STR markers from subclade Z255 (characterised by the Z255 mutation which formed about 4300 years ago [1]) down in time through various subclades (Z16437/Z16439 and Z16438, both formed about 1500 years ago) to the Gleeson MRCA (Most Recent Common Ancestor of Lineage II) and then down to the modern day Gleeson males who have tested in Lineage II.
The tree is based on both SNP and STR data. SNPs are cut & pasted from Alex Williamson's Big Tree. STRs are extracted from our project's DNA Results page.
Fluxus software was used to create an initial STR-based branching pattern. SNPs were added and helped to anchor the upper reaches of the tree.
Branching points have time estimates in generations back from the present. Allowing 30 years per generation gives a good indication of the number of years back to a particular branching point from the tester's date of birth (usually between 1930 and 1950).
STR mutations are written as: marker number, old value - new value
Back Mutations are highlighted in yellow. Parallel Mutations are written in red text.
Branches have been numbered (in bold royal blue).
The number of Markers Tested by each participant is indicated as 25, 37, 67 or 111. BY stands for the Big Y test.
FTDNA Kit numbers, Project G-numbers, and initials represent each of the participants.
Country of present residence indicates the extent of the Gleeson diaspora.
Gleeson Ancestral Lines are indicated for each participant by the green boxes with a crude timeline to the side.
MDKA Profiles list possible MPRs (Markers of Potential Relatedness) which may be particularly useful information for collaboration between project members (especially birth location, family nickname or agnomen, & occasional other major distinguishing features).

Lineage II Mutation History Tree Nov 2015 (version 1)
Download a pdf version here

You can watch a video of the Journey of Discovery I took in creating this tree below.

What does it tell us?

The current starting point for Lineage II is estimated to be more than 25 generations ago. That's about 750 years before the average tester's date of birth. If we assume that to be 1950, and allowing 30 years per generation, then we are looking at an origin for Lineage II about 1200 AD. This estimate may change as more people join the project.

The Glisson branch (G-68, 411177) is an ancient branch, going back about 21 generations (about 630 years, or roughly 1300 AD).
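The date arithmetic used above can be sketched in a few lines. This is only an illustration of the rule of thumb described in this post (30 years per generation, counted back from an assumed tester birth year of 1950); the function and constant names are my own and the results are rough estimates, not precise dates.

```python
YEARS_PER_GENERATION = 30      # allowance used in this post
REFERENCE_BIRTH_YEAR = 1950    # assumed average tester birth year


def branch_year(generations_back,
                birth_year=REFERENCE_BIRTH_YEAR,
                years_per_generation=YEARS_PER_GENERATION):
    """Return the approximate calendar year (AD) of a branching point."""
    return birth_year - generations_back * years_per_generation


lineage_origin = branch_year(25)   # 1950 - 25 * 30
glisson_branch = branch_year(21)   # 1950 - 21 * 30

print(lineage_origin, glisson_branch)
```

With these assumptions the Lineage II origin comes out at 1200 AD and the Glisson branch at 1320 AD, which the post rounds to about 1300 AD. Changing the assumed birth year or generation length shifts every estimate accordingly.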

The most closely related branches are 8, 5, 4, & 12. These members are actively exploring their connection using traditional genealogical records.

As regards the process of building this MHT ...

The 37 marker test (Y-DNA-37) is sufficient to allocate new participants to Lineage II, but is not adequate for giving accurate estimates of TMRCAs (and hence branching points)
67 or 111 STR markers are needed for more accurate dating of the branching points
STRs better define the branching pattern than SNPs. Thus far, SNP markers appear to be more useful for defining the more upstream branches of the tree and STR markers the more downstream branches. This may change over time.
In order to improve the accuracy of this MHT, we need a) more people to test; b) more people to upgrade to 111 markers; and c) more people to SNP test
In due course, it is hoped that a Lineage II-Specific SNP Marker Panel will be developed (for about $120). This will hopefully save new members money on SNP testing.

A series of blog posts will follow in due course exploring the various steps involved in this creation process in greater depth.

Maurice Gleeson

Dec 2015

Sources:

[1] These estimates carry a wide margin of error. Current estimates can be found here - http://www.yfull.com/tree/R-Z255/

Tuesday, 1 December 2015

Making Yourself Visible

Some people have been surprised to find that they don't appear on the DNA Results pages even when they have joined the project. Why is this? The explanation is quite simple.

FamilyTreeDNA changed their privacy settings during the past year and the result has been that customers have to set the Privacy settings themselves. However, not everyone knows about this so some people are inadvertently hiding themselves from view when they don't mean to do so.

Here is an example of the DNA Results page for Lineage II a) when I am signed out of my FTDNA account and b) when I am signed in.

In the first view, 5 people are missing. These are indicated by the red arrows in the second view. These participants are probably unaware that their Privacy settings are such that they cannot be seen in the project by anyone from outside the project (and they can only see themselves if they are signed in to their account).

The problem with this is that it may inadvertently stop new potential recruits from testing. Would you be less inclined to sign up to a project if there were only 11 members instead of 16? I think so. Bigger projects look more impressive. There is more of a chance that you might find a match. Also, supposing the new recruit was called Glisson, they would not see that there is already a Glisson in the project. Or if their ancestor came from Boherlahan - they would not see that there was already someone there with ancestry from exactly the same area. In both these situations, the prospective tester might have been encouraged to take the test if he had seen this information.

The good news is that it is easy to fix. Just follow these simple steps:

Sign in to your FTDNA account
Hover over your Name in the top right
Click on Account Settings, then the Privacy & Sharing tab at the end of the menu bar above
Then simply change the settings under My DNA Results by clicking on the words "Project Members" at the end, and on the next screen checking the box beside "Make my mtDNA & Y-DNA data public". Then press Save.

Before the change

After the change

For comparison, this is what my own Privacy Settings look like:

If you need any help with this, please don't hesitate to email me and I can do it for you in 2 minutes.

Here's hoping someone will see your details on the DNA Results page and will decide to join the project simply because they like something they see there.

Every little helps.

Maurice Gleeson

Dec 2015

Monday, 30 November 2015

New Pages make life easier

Now that this project website has been up and running for almost a year, it is clear that some posts are visited very frequently. To help newcomers navigate the website and find the information they require, I have created a new set of Pages to the right.

The new pages include a revised version of the original Welcome blog (Feb 2015) as well as separate pages for people who want to join the project and who want to upgrade their test. There is also a new page which has lots of useful tips on how to get the most out of your DNA Test and everyone should check this out, even if you have been a member of the project for a long time. There may be some useful hints and tips there that you may not be aware of.

Here is the list of new pages. Please feel free to copy and paste this pages list into any email you might send to prospective DNA testers.

Maurice Gleeson

November 2015

Tuesday, 17 November 2015

Exciting times at the 11th Annual FTDNA Conference

I have just returned from the 11th Annual FamilyTreeDNA Conference in Houston (Nov 11-13) where I had the pleasure of meeting my wonderful co-administrator and founder of our DNA Project, Judy Claassen. Even though we have talked previously on Skype and are in constant email communication, it was great to meet Judy in person for the first time and we had a fabulous weekend discussing various aspects of the project. We also networked with some of the best brains within the genetic genealogy community which has helped advance our ideas about next steps for the project.

Maurice Gleeson & Judy Claassen

I also had the privilege of presenting and sharing some of the results of our Gleason/Gleeson DNA project with the audience (made up entirely of FTDNA Project Administrators like Judy & myself). The title of my talk was "Combining SNPs, STRs, & Genealogy to build a Surname Origins Tree" and it will be available on my YouTube Channel here. In this talk, I developed some of the themes that I had touched on in an earlier presentation at Genetic Genealogy Ireland 2015 (Oct 9-11), which is also available on YouTube.

Networking with David Dunbar, James Irvine & Cynthia Wells

In brief, I described the fascinating journey of discovery I have taken these last few months to arrive at the Mutation History Tree for Lineage II of the DNA Project. I have previously posted about the generation of this tree based on 12 & 37 STR marker data but I have held off on publishing the more advanced tree (combining 111 STR marker data and SNP data) pending the arrival of additional DNA results. Well, those results are now in, I have incorporated them into both of my recent presentations, and I will be posting a summary of these results shortly in a subsequent blog post.

The resulting Lineage II tree uses the DNA mutations present among project members to define the branching pattern of the tree and to estimate when each of those branching points occurred. Every member can see where they sit on the tree in relation to every other member and will also have an idea of how closely or distantly they are related to everyone else within the group. And of course as more people join, they will be added to the tree, it will get bigger, and the branching pattern and time estimates will become more accurate.

An early version of the Mutation History Tree for Lineage II
with time estimates for each branching point

After the conference, a few of us stayed an extra day to tour the FTDNA labs. They really have quite a set-up and the resources they have available are impressive. We can expect some great things from FTDNA in 2016.

Bennett Greenspan, CEO, demonstrating a machine in the lab

Bennett Greenspan & Max Blankfield
have a quick chat

I was also honoured and privileged to receive an award (organised by Brad Larkin of the SurnameDNA Journal) in recognition of my work in genetic genealogy and was voted Genetic Genealogist of the Year 2015, a very humbling experience given that there are so many of my peers equally deserving of such an accolade. The award itself is made of fine wood and is beautifully inscribed with gold lettering. It is going up on the wall of my study as soon as I can lay my hands on a hammer and a nail.

Maurice Gleeson & Brad Larkin

A big thank you to all of the people who have helped me over the years, especially my colleagues within the genetic genealogy community whose conversations and discussions have served to formulate my own ideas about genetic genealogy. I am humbled by the award and honoured to be part of this wonderful community.

Maurice Gleeson

17 Nov 2015

Tuesday, 10 November 2015

MDKA Profile – John Gleeson 1832-1885 (G-21, N74958)

Summary

Name of MDKA: John Gleeson

Family nickname/agnomen: ?McEvaddy or ?Rabbit

Date & location of birth: ?1832, Lackagh

Date & location of death: 1885, Shallee

Residence(s): Shallee (probably Shallee Upper & Lower, near to Longstone & Killoscully)

Occupation: Labourer

Other family occupations: Teacher, Miner

Religion: Catholic

Wife's name: Anne Gleeson

Date & location of birth: 1848, Shallee

Date & location of death: 1926, Ballyhourigan

When & where married: 1865, Ballynahinch

Sponsors at wedding: not recorded

Order of children's birth: Martin, Catherine (?Ruby Kathleen), Bridget, Timothy, Winifred, Hanora, Patrick, Mary

Possible father & mother:* Martin & Bridget

Sponsors at baptisms:

1) Pat Gleeson, Wenefrid Finey; 2) Patrick Gleeson, Anne Gleeson; 3) Michael Greene, Mary Gleeson; 4) ??; 5) ?John Gleeson; 6) Thomas McDonnell, Mgt Bevan; 7) ??; 8) John Ryan, Catherine Gleeson

DNA kit no: N74958

Project ID no: G21

DNA Tests taken: Y-DNA-111, FF, FMS, Big Y, Z255 Panel

Terminal Y-SNP sequence: Z255 > Z16439/7 > BY2852 > A5629 > A5628 > Y16880

Closest match(es) ID no: G64 & G57

Place on the Lineage II tree: Branch 6

Link(s) to online tree(s):

Ancestry … click to view (needs Ancestry account)
MyHeritage
Rootsweb … free to view
FamilySearch … under construction
GenesReunited … by invitation only

* based on naming convention

Supporting evidence & documentation

Name of MDKA: John Gleeson

See birth & baptism records of his children below. [1] They record the parents as John Gleeson & Anne Gleeson

Gleeson nickname/agnomen: ?McEvaddy or ?Rabbit

Dad’s 1st cousin Liz thinks that the family nickname was MacEvaddy. This is a name from Northern Ireland. Apparently miners came to the area from Northern Ireland so maybe one of them was a McEvaddy. However, I have never encountered any such name in the records for the area.
When Dad and I visited the ancestral home of Anne Gleeson (back in 2012), the present owner of the house said that she thought she remembered seeing the nickname “Rabbit” on some of the deeds or other documentation relating to the purchase of her house. See blog post "What happened to the widow Anne Gleeson?"

Date & location of birth: ?1832, Lackagh

Based on the fact that his parents may have been named Martin & Bridget, I found a possible baptism record in Lackagh, a neighbouring townland to Shallee. However, this may not be the correct one [2]

Date & location of death: 1885, Shallee

Death certificate found for a John Gleeson who died in Shallee aged about 50. The informant was ??? [3]

Residence(s): Shallee

It is not certain where in Shallee the family lived. Several children were born in Shallee, others in Longstone, and one in Killoscully, so it is assumed that the family lived in the townland of Shallee Upper & Lower (rather than in the neighbouring townlands of Shallee Coghlan or Shallee White). It has not been possible to find the family in Griffith’s Valuation or in any of the subsequent Cancelled Books, so John may not have been the head of the household, but rather the son or son-in-law of the head of the household.
See blog post "What happened to the widow Anne Gleeson?"

Occupation: Labourer

See birth & baptism records. [1]

Other family occupations: Teacher, Miner

His eldest son Martin was a Teacher in Co. Wicklow (Stratford-on-Slaney, & Baltinglass).
His son Paddy was a Miner in Butte, Montana. He may have worked in the local mines in Silvermines.

Religion: Catholic

See birth & baptism records. [1]

Wife's name: Anne Gleeson

See birth & baptism records. [1] Also church marriage records from 1865 & 1895 - she remarried after his death to a Patrick Ryan (soldier) and lived in Maunsell’s Cottage in Ballyhourigan. [4,5]

Date & location of birth: 1848, Shallee

From Baptism record. [6] Her parents were Timothy Gleeson & Catherine Ryan, consistent with the naming convention predictions (see below) based on her children’s names (lending support to the notion that John’s parents were Martin & Bridget). Also, supportive data for her parents’ names and her siblings was obtained from US records & letters from her daughter (written from 1938-1942).

Date & location of death: 1926, Ballyhourigan

From death certificate. [7] No headstone found.

When & where married: 1865, Ballynahinch

From church marriage record. [4] No civil marriage record was ever found for either her 1865 or her 1895 marriage.

Sponsors at wedding: not recorded

See church marriage record.

Order of children's birth: Martin, Catherine, Bridget, Timothy, Winifred, Hanora, Patrick, Mary

See birth & baptism records. [1]

Possible father & mother:* Martin & Bridget

See 1st son and 2nd daughter. His wife Anne’s parents are predicted to be Timothy & Catherine (2nd son & 1st daughter’s names) and subsequent records confirmed this.

Sponsors at baptisms:

See birth & baptism records of his children. [1]

Sources & References

[1] birth & baptism records of his children

No birth records found for Patrick but he appears in the 1901 & 1911 census.

[2] possible baptism record for John Gleeson

[3] death certificate for John Gleeson

[4] church marriage record of Anne Gleeson to John Gleeson (1865)

[5] church marriage record of Anne Gleeson to Paddy Ryan Soldier (1895)

[6] baptism record for Anne Gleeson 1848

[7] civil death certificate for Anne Gleeson 1926 (she remarried to Patrick Ryan Soldier)

Maurice Gleeson

Nov 2015

Thursday, 13 August 2015

Building a Mutation History Tree with STR data

In the previous blog post we explored Genetic Distance and how it can be used to group people together into the same Genetic Family (or Lineage). We also saw how a lot of your relevant non-matches only show up in surname projects because they are outside the FTDNA matching threshold. Lastly, we introduced the concept of the Mutation History Tree (MHT) and how traditional genealogies could be hung on its branches to give a combined tree that uses mutations when named individuals run out.

In this blog post, we will look at how to build a Mutation History Tree using STR data.

Building a simple Mutation History Tree

A huge thank you is due to project member Lisa Little who generated a lot of the data discussed below, and to Nigel McCarthy (Admin of the Munster Irish Project) who has offered invaluable advice.

It is clear from the Results Page of the Gleason DNA Project that each Lineage has a distinctive coloured pattern by virtue of the values for its STR markers. This distinctive coloured pattern reflects the unique marker values for each Lineage, and distinguishes one Lineage from another. The Modal Haplotype for each Lineage represents the "signature tune" for that particular Lineage.

The distinctive coloured patterns of Lineages I, II & III
(from the Results Page on the World Families.Net site)

Furthermore, within each Lineage, there are subtle differences among members. In other words, most people are not an exact match to the Modal Haplotype - everyone in the group sings the "signature tune" just ever so slightly differently. They're all in the same choir, but some hit a bum note! These differences between project members in their STR marker values allow us to construct a diagram that suggests how the various project members might be related to each other.
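To make the idea of the "signature tune" concrete, here is a minimal sketch of how a Modal Haplotype and each member's departures from it could be worked out. The member labels and marker values below are hypothetical, invented purely for illustration (they are not real project data), and the function names are my own.

```python
from collections import Counter

# Hypothetical haplotypes (marker -> repeat count); NOT real project data.
haplotypes = {
    "A": {"385b": 14, "390": 24, "392": 13},
    "B": {"385b": 14, "390": 24, "392": 13},
    "C": {"385b": 15, "390": 24, "392": 13},
    "D": {"385b": 14, "390": 25, "392": 13},
    "E": {"385b": 14, "390": 25, "392": 14},
}


def modal_haplotype(haplotypes):
    """Return the most common value at each marker across all members."""
    markers = next(iter(haplotypes.values())).keys()
    return {
        marker: Counter(h[marker] for h in haplotypes.values()).most_common(1)[0][0]
        for marker in markers
    }


def mutations_from_modal(haplotype, modal):
    """Return {marker: (modal value, member value)} where the member differs."""
    return {
        marker: (modal[marker], value)
        for marker, value in haplotype.items()
        if value != modal[marker]
    }


modal = modal_haplotype(haplotypes)
for member, haplotype in haplotypes.items():
    print(member, mutations_from_modal(haplotype, modal))
```

In this made-up example, members A and B sing the tune perfectly, C differs at one marker, and E differs at two. Members sharing the same departure from the modal (here D and E at marker 390) are candidates for sitting on the same branch.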

So, for example, if we take the first 12 markers in Lineage II we could construct a Mutation History Tree (MHT) that separates out the members of the group into different branches. Reading from left to right across the tree in the diagram below, the first 4 members (G22, G05, G57, and G64) all match the Modal Haplotype at 12 markers (Branch 1). The fifth member (G42) has a single mutation (in red) at marker 385b which has mutated from the modal value of 14 to a new value of 15 (Branch 2). This may have happened in the previous generation or many generations ago. The 6th member (G21) apparently has 2 mutations* from the modal (on markers 389i1 and 389i2), and the remaining members all have a mutation at marker 390 with 3 of them having an additional mutation on marker 392 (Branch 5).

12-marker MHT for Lineage II
click to enlarge

This diagram gives a pictorial representation of how the different members of Lineage II may be related. The branching pattern derived from the DNA mutations may very well correspond to the branching pattern that one might see in the traditional Family History Tree if we were able to trace it all the way back with documentary evidence to the MRCA (Most Recent Common Ancestor). Thus the Mutation History Tree can give us important clues regarding which individuals are likely to be on the same branch of the overall tree, and who is more closely related to whom. This in turn can help focus further documentary research.

In the example above, the project members in the last branch on the right (Branch 5) are more closely related to each other than to anyone else in the project - they should get together and try to figure out how they are related. Their MRCA is a lot closer in time than the MRCA they share with (for example) the first group in the tree (on the far left, Branch 1). Similarly, because Branch 5 is in fact an off-shoot of Branch 4, the MRCA for these two groups is going to be closer in time than the MRCA either shares with any of the other groups. Thus, for example, Branch 5 members may share an MRCA born in 1750, Branches 4 & 5 share an MRCA born in 1610, Branches 4 & 3 share an MRCA born in 1390, and the MRCA for the entire group was born in 1125.

But this is the pattern based on the values of just 12 STR markers. What happens when we use 37 markers? or 67? or 111? The more markers that are used to generate the Mutation History Tree, the more accurate the picture is likely to become and the more likely we are to approximate the branching pattern in the actual Family History Tree.

However, not everyone in the group has tested to the same level of STR markers. Six people have tested to 111 markers, 3 people have tested to 67 markers, and 2 people have tested to 37 markers. And of the 4 people in "Possible Lineage II", one has tested to 25 markers and 3 have tested to only 12 markers. This obviously creates difficulties in accurately allocating people to the correct branch of the Mutation History Tree and such allocations are likely to change as these members upgrade their results and more data become available.

These are very important points that need to be borne in mind and are worth repeating:

Mutation History Trees are only as good as the data available
They are liable to change as more data becomes available
The more data used to generate the tree, the more likely it is to approximate reality

Building a more complex Mutation History Tree

To generate a more advanced Mutation History Tree, using up to 111 markers, we can use programmes such as Fluxus to generate a phylogenetic tree / cladogram that produces a "best fit" model of the tree (i.e. with the fewest number of branches possible for the known mutations). This is often called a "maximum parsimony" approach and the principle is akin to that of Occam's razor which simply states that - all else being equal - the simplest hypothesis that explains the data should be the one that is selected. It may not be the correct one (there are usually other possible alternatives with varying degrees of plausibility), but it has the highest probability of being the correct one.

Using the Fluxus software is not easy - the process is multi-step and complicated, it is not user-friendly, and it takes time. Nevertheless, we will look at the output of this software in a separate blog post.

Below is a summary of the various STR mutations that occur within Lineage II (courtesy of Lisa Little). Mutations from the Modal Haplotype are highlighted in yellow and beige. The markers are divided into their various "Panels" by bold dark lines. Mutations among the first 12 markers are in the first 5 rows (Panel 1); the next 3 rows are mutations among markers 13-25 (Panel 2); and the following 4 rows have the mutations among markers 26 to 37 (Panel 3).

From these it is possible to build a more evolved Mutation History Tree than the one generated using only 12 marker data. As all 11 members have tested to 37 markers, let's construct a tree based on 37 markers and compare it to the one generated from the 12-marker data.

Lineage II's STR mutations (from Lisa Little)

37-marker Mutation History Tree
click to enlarge

The previous 12-marker Mutation History Tree (for comparison)
click to enlarge

In the new 37-marker Tree, additional branches and additional mutations are indicated in pink. The branches have doubled and are now numbered 1 through 10. Branch 3 is an exact match to the Modal Haplotype (but no members sit on this particular branch). There are several important points to note when comparing this new 37-marker Tree to the previous 12-marker Tree:

The tree with more data (the 37-marker Tree) has more branches, more definition, more granularity, more fine detail
The new branches allow us to revise our estimates for when the MRCA for the various branches may have been born. For example, what was Branch 5 has now split into two branches (9 & 10), and the members in Branch 9 (G39, G51) share an MRCA who was possibly born several generations after the MRCA for Branches 9 & 10.
There are several Parallel Mutations in the 37-marker Tree (i.e. identical mutations that occur in several different branches)

464b (16>17) occurs on 2 branches (Branches 4 & 8)
464c (17>16) occurs on 2 branches (Branches 1 & 10)
CDYa (39>38) occurs on 5 branches (Branches 2, 4, 6, 7/8, & 10)
CDYb (40>39) occurs on 2 branches (Branches 1 & 9)
456 (16>15) occurs on 3 branches (Branches 2, 6 & 7)

The apparently large number of Parallel Mutations may be because this is not a "maximum parsimony" tree, and there may be another way of arranging the data that would produce a better "fit" with fewer branches. The Fluxus software could help clarify this.
Alternatively, this may be a very accurate reflection of the real Family History Tree (i.e. how people are actually related ancestrally) and the large number of Parallel Mutations is due to the rapid mutation rate of the STR markers in question. This is entirely plausible as several markers are known to mutate back and forth fairly frequently from one generation to the next (e.g. the CDYa and CDYb markers). FTDNA identifies such rapidly mutating markers by colouring them in dark red on the DNA Results page.
Back Mutations may be present in the tree, but are hidden ... and thus the tree may be wrong

For illustrative purposes, in Branch 2, theoretically there may have been a mutation in one of the members' ancestral lines such as CDYb (40>39) in (say) 1470, and a few generations after that a Back Mutation CDYb (39>40) in (say) 1610. This would place these members on 2 different branches rather than on the same branch where they currently sit.
And even though in reality they share a Common Ancestor in 1470, the Back Mutation masks this, and makes them look much more closely related than they actually are, with a Common Ancestor that appears to be sometime in the 1700s perhaps, 300 years later than it actually is!
This latter point is a common experience when working with any type of DNA - the Common Ancestor is further back than he/she looks.
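The Parallel Mutations listed above can be tallied to see which markers recur most often across branches. The sketch below simply re-enters the list from this post (keeping "7/8" as a single label, exactly as written) and ranks the mutations by the number of branches on which they appear; the function name is my own.

```python
# Parallel Mutations as listed in this post: mutation -> branches on which it occurs.
# "7/8" is kept as one label, as written in the post.
parallel_mutations = {
    "464b (16>17)": ["4", "8"],
    "464c (17>16)": ["1", "10"],
    "CDYa (39>38)": ["2", "4", "6", "7/8", "10"],
    "CDYb (40>39)": ["1", "9"],
    "456 (16>15)": ["2", "6", "7"],
}


def rank_by_recurrence(mutations):
    """Return (mutation, branch_count) pairs, most recurrent first."""
    counts = [(name, len(branches)) for name, branches in mutations.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_recurrence(parallel_mutations)
for name, count in ranking:
    print(f"{name}: {count} branches")
```

CDYa comes out on top, which fits the observation that the CDY markers are among the fastest mutating. A marker that recurs on many branches is a weak guide to the branching pattern, so it deserves less weight when deciding who belongs together.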

Fast-mutating markers (in dark red) among the first 37 markers of FTDNA's Y-DNA test

Examples of Mutation History Trees generated using 37-marker results can be found in the Allen Patrilineage Project for their Patrilineage I and Patrilineage II.

When building a Mutation History Tree with larger numbers of markers (67 or 111), software programmes such as Fluxus become indispensable because doing it by hand is much more difficult. And as stated previously, the tree is likely to change as more data is used to generate the branching pattern. If we generated a tree based on 67 marker data it would become even more detailed, and more so too with a 111-marker based tree. Furthermore, adding more data is likely to change the branching pattern within the tree as a new "best fit" model is identified.

And this essential point will be perfectly illustrated when we add SNP marker data into the mix - it throws our 37-marker "best fit" Mutation History Tree into a completely new configuration.

I don't know about you but I can hardly contain myself.

Maurice Gleeson

13 Aug 2015

* the 2 apparent mutations are in fact only 1. The marker DYS389 is a single STR marker that has four parts: m, n, p, and q. At FamilyTreeDNA they have two tests for DYS389. The first test looks at the first two parts of marker DYS389 (m and n). This is what they call DYS389I. The second test looks at all four parts of DYS389 (m, n, p, and q). This is what they call DYS389II. There are, by scientific convention, two ways to display the result of DYS389II. The way FTDNA display the result is by showing the total for the entire DYS389 marker (m+n+p+q). This is described in their Learning Centre FAQ here. Member G-21 has a mutation in 389i - which is indicated by his value of 13 rather than the modal 14. For marker 389ii his value of 29 rather than 30 indicates that in the entire marker (parts m+n+p+q) he still only has a single mutation - that same single mutation which is reported as 389i. In reality, since 389ii includes all four parts of the marker, we should just drop 389i off our table. Any mutation in 389i will be included in 389ii. My thanks to Lisa Little for pointing this out.
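The double counting described in this footnote can be avoided by subtracting the 389i value from the 389ii value before comparing with the modal. The sketch below uses G-21's values and the modal values quoted above; the function name and the way the marker is split into a "first part" (m+n) and a "remainder" (p+q) are my own illustration, not an FTDNA procedure.

```python
def dys389_steps(member_i, member_ii, modal_i, modal_ii):
    """Split the DYS389 difference from the modal into independent parts.

    member_i / modal_i   : reported 389i value (parts m+n)
    member_ii / modal_ii : reported 389ii value (parts m+n+p+q)
    Returns (difference in m+n, difference in p+q).
    """
    first_part = member_i - modal_i
    remainder = (member_ii - member_i) - (modal_ii - modal_i)
    return first_part, remainder


# G-21: 389i = 13 (modal 14), 389ii = 29 (modal 30)
first_part, remainder = dys389_steps(13, 29, 14, 30)
print(first_part, remainder)
```

For G-21 the first part differs by one step and the remainder does not differ at all, so there is only one mutation rather than the two that a marker-by-marker comparison of the reported values would suggest.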

Friday, 7 August 2015

Genetic Distance, Genetic Families, & Mutation History Trees

In this blog post, we examine how people are grouped together into Lineages (sometimes called Genetic Families) and how the relationship between people within a Lineage can be mapped out and represented in a Mutation History Tree.

Grouping People Together

The members of each lineage within the project have been grouped together because their genetic signatures (aka haplotypes) are similar, suggesting a common ancestor some time in the past several hundred years. The degree of similarity between any two individuals can be assessed by the Genetic Distance between them, as discussed in a previous blog post and reproduced again below:

Who qualifies as a match to you? Anyone whose marker values are sufficiently similar that they meet the criteria set by FTDNA to be declared "a match". And here are those criteria:

a Genetic Distance (GD) of 0 at 12 markers

a GD of 2 at 25 markers

a GD of 4 at 37 markers

a GD of 7 at 67 markers

and a GD of 10 at 111 markers

The ISOGG Wiki has a very nice summary of Genetic Distance and the criteria for matching.
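The matching criteria listed above amount to a simple lookup: a maximum Genetic Distance for each testing level. Here is a minimal sketch of that rule, using only the thresholds quoted in this post; the names are my own and this is not FTDNA's code.

```python
# Maximum Genetic Distance allowed for a "match" at each testing level,
# as quoted in this post.
MATCH_THRESHOLDS = {12: 0, 25: 2, 37: 4, 67: 7, 111: 10}


def is_match(genetic_distance, markers_tested):
    """Return True if the Genetic Distance is within the threshold for that level."""
    return genetic_distance <= MATCH_THRESHOLDS[markers_tested]


print(is_match(4, 37))   # within the 37-marker threshold
print(is_match(5, 37))   # just outside it
```

A Genetic Distance of 5/37 falls just outside the threshold, which is why some members of the same Lineage never appear in each other's match lists.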

However, Genetic Distance is not the only possible criterion for grouping people into the same "Genetic Family" or "Lineage". Other considerations include traditional genealogical indicators such as having the same surname (the main criterion for surname studies), or having an ancestor who came from the same location as the ancestors of other group members. Additional genetic criteria may include having the same rare marker values, or having the same terminal SNP. These considerations can also serve as indicators that Lineage members have been grouped correctly i.e. members may be grouped together on the basis of one criterion (e.g. Genetic Distance) and subsequently are found to share a second criterion (e.g. the same rare marker value, ancestors from the same location, or even the same MDKA). You can read more about some of the possible criteria for grouping people together into the same genetic family here.

Why do I match some Project Members but not others?

You will probably notice that not everyone in your Lineage turns up in your list of matches on your Matches Page. The reason for this is that they do not meet the FTDNA criteria for a "match" to you, but they do meet the criteria as a match either to the Modal Haplotype* for the Lineage or to other members of the project. This is one of the key benefits of joining a surname project (such as the Gleason/Gleeson DNA Project) - it can connect you to people within the FTDNA database who do not show up in your list of matches but to whom you are still likely to be related.

For example, in Lineage II, my Dad (G21; N74958; yellow dot in the diagram above) is a match (at 37 markers) to only 5 of the other 10 members: G22, G57, G64, G55, and G66 (green dots). He does not match the members with red dots. In other words, his genetic distance to the green dot members is 4/37 or less, whereas his GD to the red dot members is 5/37 or greater (in fact, reading down, his GD to each of the red dot members is 5/37, 6/37, 5/37, 6/37, & 6/37 respectively).

But let's look at the member closest to the Modal Haplotype* for Lineage II. This is the person with the fewest mutations compared to the Modal Haplotype, or (in other words) the smallest Genetic Distance from the Modal Haplotype. There are no exact matches (i.e. 0/37) to the Modal Haplotype in Lineage II, but there are two members who are the closest (GD = 1/37) and these are members G57 and G64 (kits 60393 & 365763). They are uncle/nephew, both carry the surname Little, and they are believed to have a genetic Gleason ancestor somewhere in the last several hundred years. As Project Admin, I have access to their pages and I can look at their matches. This tells me that each of them matches 7 of the 10 other members in Lineage II. I can also see that their Genetic Distance to the remaining 3 members of Lineage II is only 5/37, just outside the FTDNA threshold for declaring them a match. Of these 3 remaining members, one matches two other people within the project and the other two match 5 project members.
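
To make the idea of "closest to the Modal Haplotype" concrete, here is a small Python sketch. It follows the definition given in the footnote at the end of this post (the most frequent value at each marker in turn) and again assumes a simple step-count Genetic Distance. The member labels M1 to M4 and their four-marker haplotypes are hypothetical, chosen only so that, as in Lineage II, nobody matches the Modal Haplotype exactly.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most frequent value at each marker position.

    Ties are resolved in favour of the value seen first (an arbitrary choice).
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def genetic_distance(a, b):
    """Sum of absolute step differences across markers (a simplification)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def closest_to_modal(members):
    """Return the smallest distance to the modal and the members at that distance."""
    modal = modal_haplotype(list(members.values()))
    distances = {name: genetic_distance(h, modal) for name, h in members.items()}
    best = min(distances.values())
    return best, sorted(name for name, d in distances.items() if d == best)


# Hypothetical members with made-up values at four markers.
members = {
    "M1": [13, 24, 14, 12],
    "M2": [13, 24, 15, 11],
    "M3": [13, 25, 14, 11],
    "M4": [12, 24, 14, 10],
}

print(modal_haplotype(list(members.values())))  # [13, 24, 14, 11]
print(closest_to_modal(members))                # (1, ['M1', 'M2', 'M3'])
```

Note that the Modal Haplotype here is a composite: it is built from the group as a whole, so it is quite possible (as in this toy example) that no living member carries it exactly.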

So everyone within the project is a match to at least one other person within the project, but the distance between any two project members can vary considerably - some are very close, others are very distant.

And this creates a wonderful possibility - by analysing the degree of Genetic Distance between members we can construct a diagram of the branching pattern within the group which shows how closely or how distantly each member may be related to each other member. Such a diagram is variously called a phylogenetic tree, a phylogram, a cladogram, or a Mutation History Tree. The same process is used to generate evolutionary diagrams showing where all living creatures sit on the Tree of Life.
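
As a rough illustration of how a branching pattern can fall out of pairwise distances, here is a short Python sketch using average-linkage clustering (often called UPGMA), a standard textbook technique. It is a stand-in, not the method used to draw the trees in this project, which also take account of which individual markers have mutated. The member labels and distances are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Repeatedly merge the two closest clusters; return a nested-tuple tree.

    dist maps frozenset({a, b}) to the Genetic Distance between members a and b.
    """
    # Each key is a (sub)tree; each value is the list of members under it.
    clusters = {name: [name] for name in names}

    def average_distance(leaves_a, leaves_b):
        total = sum(dist[frozenset((x, y))] for x in leaves_a for y in leaves_b)
        return total / (len(leaves_a) * len(leaves_b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged_leaves = clusters.pop(a) + clusters.pop(b)
        clusters[(a, b)] = merged_leaves

    return next(iter(clusters))


# Hypothetical pairwise Genetic Distances between four members.
names = ["M1", "M2", "M3", "M4"]
pairwise = {
    frozenset(("M1", "M2")): 1,
    frozenset(("M1", "M3")): 4,
    frozenset(("M1", "M4")): 5,
    frozenset(("M2", "M3")): 4,
    frozenset(("M2", "M4")): 5,
    frozenset(("M3", "M4")): 2,
}

print(upgma(names, pairwise))  # (('M1', 'M2'), ('M3', 'M4'))
```

In this toy example M1 and M2 pair off first, then M3 and M4, and finally the two pairs join at the root, which is the position the common ancestor would occupy.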

Interactive version of the Tree of life

Mutation History Trees

This concept has important potential implications for genealogy. Theoretically, it should be possible to take the known genealogies of each member within a Lineage and hang them onto the appropriate branch within the Mutation History Tree. In this way we will have a combined tree that starts off in modern times with named individuals, goes back along each ancestral line until the named individuals run out (at each branch's Brick Wall or MDKA, Most Distant Known Ancestor), and then the tree continues back in time using genetic marker mutations instead of people, culminating at the Modal Haplotype for that particular Lineage.

A combined Family History Tree & Mutation History Tree

In the combined tree above, named individuals appear in the blue boxes, starting with living individuals (born about 1960) and going back in time to the MDKA for each branch. Most branches have an MDKA born about 1810 to 1840 (which is typical for Irish research), and some have a Brick Wall at a later point (Branches 5, 6, & 7 have an MDKA about 1870). One lucky branch can trace their line back to 1690, which is highly unusual but something we all hope for. You never know when a new member will join your particular Lineage who happens to be in possession of the Family Bible! And because you are genetically related to him, his Family Bible pertains to your family too. In this way, all members within a particular Genetic Family can "piggyback" onto the pedigree of the member with the longest pedigree.

Where the named individuals end, DNA marker mutations take over. STR markers are in yellow, SNP markers are in pink, but this is just a crude representation of what the tree might look like. In reality it would be much more complicated than this.

The person sitting at the intersecting point of all the branches is the MRCA (Most Recent Common Ancestor) and he is likely to have the Modal Haplotype for that particular Lineage. And as we go back in time, we would identify NPEs (Non-Paternity Events), and we would identify other surnames to which we are directly related but with whom our common ancestor is prior to the time of surnames. And the tree would continue even further back than that, based on the SNP discoveries that are continuously being made in the ongoing Haplogroup Projects.

Eventually, by superimposing known genealogies on top of the Mutation History Tree, we could build a comprehensive evolutionary tree of all Mankind that travels back in time to "Genetic Adam" and forward in time to each living person today.

Maurice Gleeson

7 August 2015

* Modal Haplotype - this is the haplotype (i.e. your genetic signature, your sequence of STR marker values) that is derived from the most frequent value for each of the STR markers in turn among members of the same Lineage. It is likely that the Modal Haplotype is identical or almost identical to the haplotype that the Common Ancestor of the Lineage would have had. In other words, he would have passed on the identical marker values to most descendants and only some of them would have developed the occasional mutation along the way.