data.scientist() + analytics.engineer()
Cynthia Mutua. I make data useful.

I build pipelines, models, and dashboards that turn messy real-world data into decisions that actually matter. MS in Data Science & Analytics at GVSU, graduating Dec 2026. I've worked across education, curriculum policy, and public health research, and I'm ready to bring the same rigor to any industry where data drives outcomes.

Python · R · SQL | ETL Pipelines | BigQuery · GCP | Statistical Modeling | Power BI · Looker Studio | Machine Learning | dbt ✦ | Data Storytelling | EFA · CFA · GLMs | A/B Testing
Data Engineering | Statistical Modeling | ETL Pipelines | Machine Learning | Analytics Engineering | Data Visualization | Graduate Research | Community Leadership
About

A builder at heart,
curious about everything.

I came to data science from mathematics, which means I actually like the hard parts. I've analyzed national curriculum data in Kenya, built ETL pipelines for thousands of students, run latent profile analyses on teacher burnout, and written enough DAX measures that they appear in my dreams.

What drives me isn't the field, it's the problem. Give me a messy dataset, a real question, and enough coffee, and I will figure it out. That's true whether the dataset is about schools, patients, weather, consumers, athletes, or anything else.

Outside of work: yoga keeps me sane, anime keeps me inspired, and adrenaline activities keep me honest about what I'm actually capable of. I'm also slowly working through a bucket list that includes competing on The Amazing Race and visiting all 7 continents (currently 2 down).

🌍
Nairobi → The World
JKUAT · KICD · GVSU. Two continents, one consistent obsession with making data useful.
Adrenaline junkie (reluctantly)
Scared of heights. Has done it anyway. The data says this will continue.
🎌
Yoga + Anime + Ambition
The combo that shouldn't work but absolutely does. Balance is a lifestyle.
🌐
Bucket list: 7 continents
2/7 down. Competing on The Amazing Race is a dream of mine.
Journey
2018–22
BSc Mathematics & Computer Science
Jomo Kenyatta University of Agriculture & Technology · Nairobi · Volunteer Tutor
2022–23
Data Analyst & Researcher
Kenya Institute of Curriculum Development · National-scale student data · Policy impact
2024–now
MS Data Science & Analytics · Dec 2026
Grand Valley State University · Graduate Assistant · Blueprint Labs Grant
2025–now
Graduate Assistant Data Analyst
GVSU Charter Schools Office · 78 schools · 34K+ students · Power BI · ETL · Azure Maps
Leadership
GSA Administrative Officer · WiC Treasurer · Resident Advisor
Graduate Student Association · Women in STEM advocacy
Skills

What I bring to the table

🐍
Python
Python · pandas · NumPy · scikit-learn · matplotlib · plotly · mlxtend
📐
R & Statistics
R · tidyverse · tidymodels · ggplot2 · lavaan · Quarto · SPSS
🔧
Data Engineering
ETL Pipelines · dbt ✦ · SQL · PostgreSQL · Power Query · GitHub
☁️
Cloud & Storage
GCP · BigQuery · Google Cloud Storage · Google Colab · SQLite
📊
Analytics & BI
Power BI · DAX · Azure Maps · Tableau · Looker Studio
🧮
Statistical Methods
GLM · EFA / CFA · LPA · Permutation Tests · Bootstrap · Clustering
💬
Communication & Soft Skills
Stakeholder Communication · Data Storytelling · Research Writing · LaTeX · Collaboration
🌟
Leadership
GSA Admin Officer · WiC Treasurer · Resident Advisor · Women in STEM

✦ dbt Fundamentals — in progress

Projects

Things I've actually built

01
Data Engineering · ETL · Cloud
Weather Data ETL Pipeline

Production-pattern pipeline: 5 modular phases, dual-target loading (SQLite + BigQuery), regex schema sanitization, structured logging, row-count validation. The kind of engineering that holds up under scrutiny.

96K+ records · GCP / BigQuery · Live dashboard
Python · BigQuery · pandas · Looker Studio
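To make two of the patterns named above concrete, here is a minimal sketch of regex schema sanitization and row-count validation. It is an illustration under stated assumptions, not the project's actual code: the function names `sanitize_column` and `load_with_validation`, the `weather` table, and the sample rows are all hypothetical, and only the SQLite target is shown.

```python
import re
import sqlite3


def sanitize_column(name: str) -> str:
    """Normalize a raw header into a lowercase snake_case identifier (hypothetical helper)."""
    # Collapse every run of non-alphanumeric characters into a single underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    # Many warehouses reject identifiers that start with a digit, so prefix those.
    if not cleaned or cleaned[0].isdigit():
        cleaned = f"col_{cleaned}"
    return cleaned


def load_with_validation(conn, table, columns, rows):
    """Insert rows into SQLite and confirm the table grew by exactly len(rows)."""
    safe_cols = [sanitize_column(c) for c in columns]
    col_defs = ", ".join(f'"{c}" TEXT' for c in safe_cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')

    before = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    placeholders = ", ".join("?" for _ in safe_cols)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    after = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    loaded = after - before
    if loaded != len(rows):
        raise ValueError(f"expected {len(rows)} new rows, found {loaded}")
    return loaded


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up sample rows, for illustration only.
    sample = [("A1", "21.5"), ("B2", "19.0")]
    print(load_with_validation(conn, "weather", ["Station ID", "Temp (°C)"], sample))
```

Sanitizing headers once, before any load, means both targets would see the same column names, and the count check turns a silent partial load into an explicit failure.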
02
Machine Learning · Data Mining · Clinical
Pattern Discovery in Alzheimer's Disease

Full KDD pipeline on 2,149 patients. 93.8% accuracy with a fully interpretable decision tree. Reframed clustering failure as a scientific finding about disease continuity, the kind of insight that only comes from rigorous analysis.

93.8% accuracy · 39 rules · 2,149 patients
Python · scikit-learn · mlxtend
03
Statistical Analysis · Public Policy
Gun Violence & Poverty in the U.S.

Rigorous statistical investigation using permutation tests, 10K bootstrap samples, and Welch's t-test, showing that poverty is significantly associated with gun violence rates while population is not. Not a hot take. A finding.

100K+ incidents · p < 0.05 · 10 years · 50 states
R · Quarto · ggplot2 · plotly
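For readers unfamiliar with the resampling methods named above, here is a minimal sketch of a two-sided permutation test and a percentile bootstrap interval for a difference in means. The project itself was done in R; this Python version uses only the standard library, and the function names, the group labels, and the numbers are made up for illustration, not taken from the analysis.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = _mean(group_a) - _mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values simulates "group labels do not matter".
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (extreme + 1) / (n_permutations + 1)


def bootstrap_ci(group_a, group_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the difference in means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        diffs.append(_mean(a) - _mean(b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up rates for two hypothetical groups of states.
    higher_poverty = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7]
    lower_poverty = [8.4, 9.1, 7.8, 10.2, 8.9, 9.5]
    print(permutation_test(higher_poverty, lower_poverty))
    print(bootstrap_ci(higher_poverty, lower_poverty))
```

Neither method assumes normality. Note that both describe an association between groups; a causal reading needs further argument.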
04
GLM · Educational Research
Family Engagement & the Achievement Gap

Does family engagement close socioeconomic achievement gaps? Tested across 25K+ students with four statistical models, and the findings converged: yes, with an 18% stronger protective effect for low-income students.

25K+ students · 4 models · Policy implications
R · tidymodels · GLM · Poisson
By the numbers
100K+
Students in Kenya national curriculum dataset at KICD
34K+
Students tracked across 78 U.S. charter schools
93.8%
ML accuracy — Alzheimer's interpretable decision tree
96K+
Weather records processed through production ETL
Opportunities

Available Fall 2026.
Open to almost anything.

I'm not looking for a job in a specific industry; I'm looking for a role where the data is hard, the impact is real, and the team is serious. I've worked in education, public health, and curriculum policy. I'm equally curious about tech, sports analytics, fintech, consumer products, AI, and wherever else good data problems live.

Open to relocation — Florida, Atlanta, Austin, Dallas, D.C., North Carolina, California, and beyond.

🤖 AI / Tech
👟 Sports & Consumer
🏥 Health Tech
💰 Finance & Fintech
📱 Product Analytics
🎬 Media & Entertainment
🏫 EdTech
🔬 Research & Policy
🛒 Retail & E-commerce
⚡ + Anywhere data matters
Data Analyst
Dashboards · reporting · business insight
Open
Data Scientist
Modeling · ML · statistical analysis
Open
Analytics Engineer
dbt · pipelines · data modeling
Open
Data Engineer
ETL · cloud · infrastructure
Open

Let's work
together.

Whether you're hiring, collaborating, or just want to talk data, I'm always up for a good conversation.

✉ cmm.mutua@gmail.com · LinkedIn ↗ · GitHub ↗