FAQ - Hidden Lineage

What is Hidden Lineage?

Hidden Lineage is a research-grade genetic archaeology project that analyzes modern DNA against 45,000 years of human history. Unlike commercial DNA tests that focus on recent ancestry, we use ancient genome databases to explore deep evolutionary connections.

Think of it as population archaeology applied to personal genetics - we're asking "where do your genes come from?" across archaeological time scales rather than genealogical ones.

How is this different from 23andMe or AncestryDNA?

Commercial DNA tests and Hidden Lineage serve different purposes:

Commercial Tests: Find recent relatives, health insights, genealogy (~500 years)
Hidden Lineage: Explore ancient population connections, archaeological ancestry (~45,000 years)

We use different methods (ADMIXTURE clustering rather than IBD, or identity-by-descent, segment matching), different reference data (ancient plus modern genomes rather than modern only), and answer different questions (population structure rather than family trees).

Both approaches provide valuable, complementary insights about your ancestry.

Is this scientifically valid?

Yes. Hidden Lineage uses the same computational methods found in peer-reviewed population genetics research:

ADMIXTURE algorithm (widely used in academic studies)
Allen Ancient DNA Resource (AADR) - standard reference database
Cross-validation for model selection
Transparent methodology with clear limitations

However, this is preliminary research based on chromosome 22 only. Full genome analysis will provide more robust results.

Validation: Our methods follow established protocols from Reich Lab (Harvard), Pickrell Lab, and other leading population genetics groups. All software versions, parameters, and data sources are documented for reproducibility.

What does "Chr22 only" mean?

Currently, our analysis focuses on chromosome 22 for computational efficiency and rapid prototyping. This means:

Advantages: Faster analysis, proof of concept, real data insights
Limitations: Reduced statistical power, potential chromosome-specific biases
Future: Full genome analysis (all 22 autosomes) planned for Phase 5

Think of current results as a "preview": directionally accurate, but subject to refinement with complete genomic data.

Can I upload my own DNA data?

Not currently. Hidden Lineage is in research preview mode, analyzing a single genome to develop and validate the methodology.

Future plans include:

Public upload platform (Phase 6)
Automated analysis pipeline
Interactive result exploration
Community features for sharing discoveries

What is ADMIXTURE analysis?

ADMIXTURE is a maximum likelihood algorithm that models individual ancestry as a mixture of K ancestral populations. It's like asking: "If human genetic diversity came from K source populations, what percentage of each would best explain this person's genome?"

Widely used in population genetics research
Unsupervised clustering - no prior population labels needed
Cross-validation determines optimal number of components (K)
Provides quantitative ancestry proportions

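Technical Details: ADMIXTURE estimates ancestry fractions and ancestral allele frequencies simultaneously by maximizing the likelihood of the observed genotypes. By default it uses a block relaxation algorithm with quasi-Newton acceleration; an expectation-maximization (EM) algorithm is available as an alternative. The method assumes Hardy-Weinberg equilibrium within ancestral populations and linkage equilibrium between markers.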
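
To make the model concrete, the sketch below evaluates the binomial log-likelihood that ADMIXTURE-style methods maximize, for a given set of ancestry fractions and allele frequencies. It is a plain-Python illustration, not ADMIXTURE's code or the Hidden Lineage pipeline; the function name, variable names, and toy numbers are all assumptions made for the example.

```python
import math

def admixture_log_likelihood(genotypes, q, f):
    """Log-likelihood of genotypes under an admixture model (constant terms dropped).

    genotypes[i][j]: copies (0, 1, or 2) of the counted allele carried by
                     individual i at SNP j, or None if the genotype is missing.
    q[i][k]: fraction of individual i's ancestry from population k (rows sum to 1).
    f[k][j]: frequency of the counted allele at SNP j in population k.
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is None:
                continue  # missing genotypes contribute nothing
            # Expected allele frequency for this individual at this SNP.
            p = sum(q[i][k] * f[k][j] for k in range(len(f)))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

# Toy data invented for illustration: 2 individuals, 3 SNPs, K=2.
toy_genotypes = [[2, 0, 1], [0, 1, None]]
toy_f = [[0.8, 0.1, 0.5], [0.2, 0.6, 0.4]]
q_good = [[0.9, 0.1], [0.1, 0.9]]     # ancestry fractions that fit the genotypes
q_swapped = [[0.1, 0.9], [0.9, 0.1]]  # the same fractions assigned the wrong way round

ll_good = admixture_log_likelihood(toy_genotypes, q_good, toy_f)
ll_swapped = admixture_log_likelihood(toy_genotypes, q_swapped, toy_f)
print(ll_good > ll_swapped)  # True for this toy data
```

The fitting step searches for the fractions and frequencies that make this quantity as large as possible; the sketch only scores candidate values and does not perform that search.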

How do you determine the optimal K value?

We use 5-fold cross-validation to test K values from 3 to 15:

Data split into 5 random subsets
Model trained on 4 subsets, tested on the 5th
Process repeated so that each subset serves once as the test set
K with lowest average prediction error selected
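
The partitioning logic in the steps above can be sketched as follows. This is a generic k-fold split in plain Python, offered as an illustration only: the function name and the item count are hypothetical, and ADMIXTURE's own cross-validation holds out individual genotype entries rather than whole rows, so this is not its implementation.

```python
import random

def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k roughly equal subsets
    for held_out in range(k):
        test = folds[held_out]
        # Train on every fold except the one currently held out.
        train = [idx for n, fold in enumerate(folds) if n != held_out for idx in fold]
        yield train, test

# Hypothetical example: 20 items split into 5 folds.
splits = list(k_fold_splits(n_items=20, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each item lands in exactly one test set, so the average error over the five rounds uses every observation once as unseen data.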

For our analysis, K=8 showed the lowest cross-validation error (0.42847), indicating the best balance between model complexity and predictive accuracy among the K values tested.

Statistical Rationale: Cross-validation guards against overfitting by testing model performance on held-out data. The minimum of the CV error curve at K=8 suggests this model captures real population structure rather than noise.
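
As a rough sketch of the selection step, the snippet below reads cross-validation error lines and picks the K with the lowest error. It assumes log lines of the form `CV error (K=8): 0.42847`, which is the format ADMIXTURE reports when run with its `--cv` option; adjust the pattern if your version prints something different. Only the K=8 value comes from our analysis. The K=7 and K=9 values, and the function names, are placeholders invented for the example.

```python
import re

# Assumed log format: "CV error (K=8): 0.42847"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def parse_cv_errors(lines):
    """Map each K to its cross-validation error, ignoring unrelated lines."""
    errors = {}
    for line in lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors

def best_k(errors):
    """Return the K with the lowest cross-validation error."""
    return min(errors, key=errors.get)

example_lines = [
    "CV error (K=7): 0.43100",  # placeholder value, not a real result
    "CV error (K=8): 0.42847",  # value reported for this analysis
    "CV error (K=9): 0.43050",  # placeholder value, not a real result
    "Writing output files.",    # unrelated line, skipped by the parser
]

cv_errors = parse_cv_errors(example_lines)
print(best_k(cv_errors))  # 8
```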

What is the Allen Ancient DNA Resource?

The AADR is the world's largest curated database of ancient human genomes, maintained by Harvard Medical School:

15,000+ ancient individuals from 45,000 years of history
Quality-controlled, contamination-screened samples
Standardized genetic coordinates and metadata
Regular updates with new archaeological discoveries

This database enables direct comparison between modern DNA and specific ancient individuals, revealing connections invisible to commercial tests.

Data Quality: AADR samples undergo rigorous authentication including amino acid racemization, radiocarbon dating, and contamination assessment. Only high-quality samples with >10,000 SNPs are included in our analysis.

How accurate are ancient DNA connections?

Ancient DNA connections represent population-level genetic similarity, not direct genealogical relationships:

Population Structure: Shared ancestry at the population level
Time Depth: Connections span thousands of years
Geographic Patterns: Reflect ancient migration and settlement
Statistical Confidence: Based on thousands of genetic markers

The I1877 connection, for example, indicates shared Middle Eastern ancestry; it does not mean that this individual was your direct ancestor.

Interpretation: Ancient DNA matches reveal preserved genetic signatures from ancestral populations. These connections are statistically robust but represent deep population history rather than recent genealogy.

What does 86.7% Middle Eastern component mean?

This represents the proportion of the analyzed genome (currently chromosome 22 only) that clusters with ancient Middle Eastern populations in the K=8 model:

Population Genetics: Shared genetic variants with ancient Levantine groups
Time Depth: Reflects ancestry from Neolithic and Bronze Age periods
Geographic Origin: Ancestral populations from the region of modern-day Turkey, Syria, and Lebanon
Archaeological Context: Early farming communities and urban civilizations

This is fundamentally different from commercial "ethnicity estimates," which are reported against modern political boundaries.

Who is I1877 and why is this connection significant?

I1877 is a 6,500-year-old Neolithic farmer from Turkey, representing one of the earliest agricultural populations:

Cultural Period: Neolithic Revolution - transition from hunting to farming
Historical Significance: Part of populations that spread agriculture into Europe
Genetic Preservation: Represents "source" Middle Eastern ancestry before major migrations
Personal Connection: Your genome retains genetic signatures from this ancient population

This connection demonstrates preserved ancient ancestry that commercial tests are not designed to detect.

Why don't commercial tests show this ancestry?

Commercial tests use different methods optimized for different purposes:

Reference Panels: Modern populations only, no ancient DNA
Time Scale: Focus on recent centuries, not archaeological periods
Methodology: IBD-based matching vs population structure analysis
Resolution: Continental categories vs fine-scale population genetics

Your ancient Middle Eastern ancestry gets lumped into broad "Middle Eastern" or "European" categories, losing the specific archaeological connections.

How reliable are these results with only Chr22?

Chr22-only results provide directionally accurate insights with some limitations:

Statistical Power: 7,401 SNPs are enough to detect major population structure, though with less power than a genome-wide panel
Validation: Results consistent with known Middle Eastern ancestry
Limitations: Reduced precision, potential chromosome-specific effects
Confidence: Major ancestry components (>10%) are the most reliable; smaller components should be read with caution
Refinement: Full genome analysis will improve precision and add detail

Think of these as high-confidence preliminary results that will be refined with complete genomic data.

Should I still use commercial DNA tests?

Yes! Commercial tests and Hidden Lineage provide complementary information:

Commercial Tests: Recent relatives, health insights, genealogy research
Hidden Lineage: Deep ancestry, archaeological connections, population history

Use commercial tests for family history and health information, and Hidden Lineage for understanding your place in human evolutionary history.

Why do my commercial results look so different?

Different methods reveal different aspects of ancestry:

Time Scales: Commercial (recent centuries) vs Hidden Lineage (millennia)
Reference Data: Modern populations vs ancient + modern
Categories: Political/ethnic labels vs population genetics clusters
Resolution: Broad continental groups vs fine-scale structure

Your 86.7% Middle Eastern component might appear as "European," "Middle Eastern," or "Broadly West Eurasian" in commercial tests.

Which approach is more accurate?

Both are accurate for their intended purposes:

Commercial Tests: Excellent for recent ancestry and relative matching
Hidden Lineage: Superior for deep evolutionary history and population structure
Complementary: Together they provide a more complete ancestry picture
Scientific Validity: Both use established, peer-reviewed methods

The "accuracy" depends on what questions you're asking about your ancestry.

What file formats do you accept?

Hidden Lineage is currently in research mode and analyzes a single genome. The future platform will support:

23andMe: Raw data download (.txt format)
AncestryDNA: Raw data download (.txt format)
MyHeritage: Raw data download (.csv format)
FTDNA: Raw data download (.csv format)

All major consumer DNA testing platforms will be supported in the public release.
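
As a hedged illustration of what reading one of these files involves, the sketch below parses 23andMe-style raw data (tab-separated rsid, chromosome, position, and genotype columns, with `#` comment lines) and keeps only chromosome 22. It is not the planned upload pipeline: the function name is hypothetical, the rsids and positions are made up, and other vendors' files use different column layouts that this sketch does not handle.

```python
def read_chromosome(lines, chromosome="22", skip_no_calls=True):
    """Return (rsid, position, genotype) tuples for one chromosome.

    Assumes 23andMe-style rows: rsid<TAB>chromosome<TAB>position<TAB>genotype.
    Lines starting with '#' are comments; '--' marks a no-call genotype.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, position, genotype = line.split("\t")[:4]
        if chrom != chromosome:
            continue
        if skip_no_calls and genotype == "--":
            continue
        records.append((rsid, int(position), genotype))
    return records

# Made-up rows for illustration; these rsids and positions are not real markers.
sample_lines = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAA",
    "rs0000002\t22\t2000\tCT",
    "rs0000003\t22\t3000\t--",
    "rs0000004\tX\t4000\tG",
]

chr22_records = read_chromosome(sample_lines)
print(chr22_records)  # [('rs0000002', 2000, 'CT')]
```

With a real export, the same function could be called on an open file handle, for example `read_chromosome(open(path))`, where `path` points to the downloaded raw data file.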

How long does analysis take?

Current computational requirements:

Chr22 Analysis: ~8 hours total pipeline
Full Genome: Estimated 48-72 hours
Future Platform: Optimized for <2 hour turnaround
Queue System: Batch processing for efficiency

Analysis time will decrease significantly with an optimized pipeline and dedicated computational resources.

Is my genetic data secure?

Data security and privacy are fundamental priorities:

No Data Storage: Analysis performed locally, no cloud storage
Anonymization: All identifiers removed from genetic data
Open Source: Analysis code to be published for public audit
No Sharing: Genetic data never shared with third parties

Technical Security: All analysis is performed on isolated systems with no network access during processing. Genetic data is encrypted at rest and in transit, and is deleted completely once the analysis is finished.

Can I reproduce these results myself?

Yes! Complete methodology and code will be made available:

GitHub Repository: All analysis scripts and parameters
Docker Container: Reproducible computational environment
Documentation: Step-by-step analysis guide
Data Sources: Links to all reference datasets

The goal is complete reproducibility and transparency in genetic ancestry analysis.

What computational resources are needed?

Requirements for reproducing the analysis:

RAM: 32GB minimum, 64GB recommended
Storage: 500GB for reference datasets
CPU: Multi-core processor, 8+ cores recommended
OS: Linux (Ubuntu 20.04+) or macOS
Time: 8-72 hours depending on analysis scope

Cloud computing options will be documented for users without local resources.
