MakeAIVisible - Implementation Guide

01 / The Crisis

The Black Box No One Is Opening

The data to understand AI's impact on teenagers already exists, but it remains inaccessible to families, educators, and independent researchers. Without transparent datasets and validated scoring, harmful influence patterns remain invisible.

⚠️

The Problem

Most evidence is trapped in closed platform logs. Public safety discussions rely on anecdotes rather than reproducible data. Teens and parents cannot independently evaluate long-term influence patterns because those patterns are not surfaced in an open, research-ready system.

⚡

The Solution

Run a strict four-layer pipeline: remove PII before storage, score every conversation across six behavioral dimensions, validate the scores with human experts, and publish only anonymized aggregate outcomes protected by differential privacy.

4

Pipeline Layers

6

Behavioral Dimensions

12

Open Challenges

99.9%+

PII Detection Target

02 / Milestones

Phased Delivery Plan

The roadmap moves from foundational governance to anonymization, NLP scoring, dashboard publication, and academic validation. Each phase maps directly to open GitHub challenges.

PHASE 0

Governance Baseline

Set up repositories, contribution templates, code of conduct, issue labels, and CI skeleton so community work can scale safely.

PHASE 1

Collection + Anonymization

Ship the mobile-first submission portal and a production anonymization pipeline that strips PII before any persistence.

PHASE 2

NLP Scoring v1

Deliver six-dimension scoring with a human-in-the-loop review queue whose reviewers can access anonymized data only.

PHASE 3

Public Dashboard + DP

Release the AI Influence Map with differential privacy and anonymous token-based report delivery.

PHASE 4

Research + Publishing

Finalize IRB-compatible consent workflows, methodology documentation, and ODbL dataset publishing pipelines.

03 / The Pipeline

Four Layers. Privacy by Architecture.

Personally identifiable information is stripped before anything is stored. Raw logs do not survive anonymization. Controls are enforced in the architecture, not only in policy.

Collect

Mobile-friendly portal with no account and no identity collection. Accept imports from ChatGPT, Claude, Gemini, and Copilot exports.

Anonymize

Automated multilingual PII stripping on receipt with a 99.9%+ detection target; raw logs are destroyed immediately after anonymization.

Analyze

Score each conversation 0-100 across six behavioral dimensions, then route anonymized records into human expert review.

Reveal

Publish differential-privacy-protected aggregate insights on the AI Influence Map and deliver personal reports through anonymous tokens.
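
The ordering of the first three layers can be sketched in a few lines. This is a minimal illustration, not project code: `AnonymizedRecord`, `run_pipeline`, and the toy anonymizer and scorer are hypothetical names, and the in-memory list stands in for real storage. The point is only that the stored type has no field that could hold raw text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AnonymizedRecord:
    """The only shape allowed into storage; it has no field for raw text."""
    text: str


Scores = Dict[str, int]


def run_pipeline(
    raw_log: str,
    anonymizer: Callable[[str], str],
    scorer: Callable[[str], Scores],
    store: List[Tuple[AnonymizedRecord, Scores]],
) -> None:
    """Collect -> Anonymize -> Analyze, persisting anonymized output only."""
    record = AnonymizedRecord(text=anonymizer(raw_log))  # held in memory only
    scores = scorer(record.text)    # the scorer never sees the raw log
    store.append((record, scores))  # raw_log itself is never written


# Toy stand-ins so the sketch runs; real components would replace these.
def toy_anonymizer(text: str) -> str:
    return text.replace("ana@example.com", "[EMAIL]")


def toy_scorer(text: str) -> Scores:
    return {"example_dimension": 0}


store: List[Tuple[AnonymizedRecord, Scores]] = []
run_pipeline("Mail me at ana@example.com", toy_anonymizer, toy_scorer, store)
print(store[0][0].text)  # Mail me at [EMAIL]
```

Passing the anonymizer and scorer in as callables keeps the ordering testable on its own, independent of how good either component is.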

04 / Open Challenges

12 GitHub Challenges

Every challenge is published as an issue with acceptance criteria and ownership labels. Comment to claim, open a draft PR early, and reference the issue in your PR.

challenge: pii

#1 PII Detection (99.9%+)

Build multilingual PII detection and removal with strict false-negative controls before any data persistence.

Acceptance: automated anonymization blocks names, emails, phone numbers, and locations with adversarial test coverage.

NLP, Python, Privacy Engineering, Regex + ML Hybrid, Adversarial Testing

Anonymization-Service →
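
As a starting point for discussion, here is a small sketch of the regex half of a "Regex + ML Hybrid" approach, together with a false-negative check of the kind an adversarial test suite might run. The patterns, the `redact` function, and `false_negative_rate` are illustrative assumptions, not the project's implementation. They cover only emails and phone numbers; names and locations would need an NER model, and these patterns will miss many real-world formats.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only: deliberately simple, English-centric, and
# far from the 99.9%+ target on their own.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)"),
}


def redact(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each match with a typed placeholder and count replacements."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


def false_negative_rate(samples: List[Tuple[str, List[str]]]) -> float:
    """Share of known PII strings that survive redaction in a labeled set."""
    leaked = total = 0
    for text, pii_strings in samples:
        cleaned, _ = redact(text)
        total += len(pii_strings)
        leaked += sum(1 for pii in pii_strings if pii in cleaned)
    return leaked / total if total else 0.0


cleaned, counts = redact("Call +1 (555) 010-9999 or mail ana@example.com.")
print(cleaned)  # Call [PHONE] or mail [EMAIL].
```

An obfuscated address such as `ana [at] example [dot] com` passes straight through these patterns, which is exactly the kind of case an adversarial test set should contain.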

challenge: privacy

#2 Anonymous Token Architecture

Design correlation-resistant anonymous tokens with no identity lookup table and strong replay/abuse protections.

Acceptance: report retrieval and contribution handles remain unlinkable across contexts.

Cryptography, Security Architecture, Zero-Knowledge Proofs, Formal Verification

Anonymization-Service →
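
One possible building block for handles that stay unlinkable across contexts is a keyed derivation from a secret that never leaves the submitter's device. The sketch below assumes that design; `new_client_secret`, `context_handle`, and the context names are hypothetical. It relies on the standard assumption that HMAC-SHA256 behaves as a pseudorandom function. It does not cover replay or abuse protection, network-level correlation, or zero-knowledge proofs.

```python
import hashlib
import hmac
import secrets


def new_client_secret() -> bytes:
    """Generated and kept on the submitter's device; never sent to a server."""
    return secrets.token_bytes(32)


def context_handle(client_secret: bytes, context: str) -> str:
    """Derive a per-context handle from the device-held secret.

    Without the secret, handles for different contexts look unrelated,
    so the server needs no identity lookup table to tell them apart.
    """
    digest = hmac.new(client_secret, context.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


secret = new_client_secret()
report_handle = context_handle(secret, "report-retrieval")
contribution_handle = context_handle(secret, "contribution")
```

The same secret always yields the same handle for a given context, so a submitter can return to fetch a report while the two handles above share no visible relationship.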

challenge: nlp

#3 NLP Scoring Engine (Core)

Score every conversation from 0-100 across six dimensions: Dependency/Reliance, Emotional Influence, Opinion Shaping, Epistemic Autonomy, Age-Appropriate Engagement, Transparency & Honesty Signaling.

Acceptance: reproducible per-dimension scoring with evaluation against expert-labeled datasets.

Transformers, Model Training, Behavioral Science, Evaluation, MLOps

NLP-Scoring-Engine →
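
To make the acceptance criterion concrete, the sketch below shows one way a per-conversation score could be represented and compared with expert labels. The dimension keys, `validate_scores`, and `mean_absolute_error` are hypothetical names, and mean absolute error is only one candidate metric. The model that produces the scores is out of scope here.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical identifiers for the six dimensions named in the challenge.
DIMENSIONS = (
    "dependency_reliance",
    "emotional_influence",
    "opinion_shaping",
    "epistemic_autonomy",
    "age_appropriate_engagement",
    "transparency_honesty_signaling",
)


def validate_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Require exactly the six dimensions and clamp each score to 0-100."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("scores must cover exactly the six dimensions")
    return {d: max(0, min(100, round(scores[d]))) for d in DIMENSIONS}


def mean_absolute_error(
    predicted: List[Dict[str, int]], expert: List[Dict[str, int]]
) -> Dict[str, float]:
    """Per-dimension mean absolute error against expert labels."""
    if len(predicted) != len(expert) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return {
        d: mean(abs(p[d] - e[d]) for p, e in zip(predicted, expert))
        for d in DIMENSIONS
    }
```

Reporting the error per dimension rather than as one number shows which dimensions the model scores reliably and which still need expert calibration.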

challenge: pii

#4 Multilingual Support Pipeline

Expand anonymization and scoring support for multilingual logs (ES, ZH, AR, and additional languages).

Acceptance: measurable anonymization and scoring quality across non-English test suites.

Multilingual NLP, NER, Evaluation, Data Curation

Anonymization-Service →

challenge: dashboard

#5 Real-Time Dashboard + Differential Privacy

Build the AI Influence Map with privacy-preserving aggregates and clear cohort-level insights.

Acceptance: no individual-level leakage and transparent DP methodology for public metrics.

React, Data Viz, D3, Differential Privacy, FastAPI

Dashboard →
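
For readers new to differential privacy, a common mechanism for releasing counts is to add Laplace noise scaled to sensitivity divided by epsilon. The sketch below illustrates that idea only. `laplace_noise` and `dp_count` are hypothetical names, and the sensitivity of 1 assumes each contributor changes a count by at most one. Naive floating-point samplers like this one have known weaknesses, so a published dashboard would more plausibly rely on a vetted library such as OpenDP.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count, assuming each person adds at most 1 to it."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # assumption: one contribution per person per cohort
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Seeded only so the demo is repeatable; a real release should not use a
# predictable seed.
rng = random.Random(0)
noisy = dp_count(true_count=100, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy. Documenting the epsilon chosen for each public metric is one way to meet the "transparent DP methodology" criterion.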

challenge: research

#6 Human Review Queue + Expert Onboarding

Create a human-in-the-loop review workflow where experts see only anonymized data and can calibrate scoring quality.

Acceptance: reviewer onboarding, queue tooling, and quality control metrics are documented.

Review Ops, UX, Queue Systems, Quality Assurance

NLP-Scoring-Engine →

challenge: frontend

#7 Submission Portal UX (Mobile-First)

Improve the no-auth submission flow with clearer export guidance and lower drop-off on mobile devices.

Acceptance: optimized upload journey for ChatGPT/Claude/Gemini/Copilot exports on phones.

Frontend, UX Research, Accessibility, Performance

Portal-Frontend →

challenge: privacy

#8 Anonymous Token Delivery System

Deliver analysis reports to submitters without collecting identity and without correlation to raw submissions.

Acceptance: end-to-end report delivery proven unlinkable to user identity artifacts.

Security Architecture, Token Design, Backend, Threat Modeling

Portal-Frontend →
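
One way to deliver a report without an identity is to hand the submitter a random token once and store the report under a hash of that token. The sketch below assumes that approach. `ReportStore` and its methods are hypothetical, the dictionary stands in for a real datastore, and the one-time retrieval is a design choice made for this illustration, not a stated project requirement.

```python
import hashlib
import secrets
from typing import Dict, Optional


class ReportStore:
    """Maps a hash of the retrieval token to a report; holds no identity."""

    def __init__(self) -> None:
        self._reports: Dict[str, str] = {}

    @staticmethod
    def _key(token: str) -> str:
        # The token is high-entropy, so a plain SHA-256 lookup key suffices.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue_token(self) -> str:
        """Return a fresh random token to show the submitter once."""
        return secrets.token_urlsafe(32)

    def save_report(self, token: str, report: str) -> None:
        self._reports[self._key(token)] = report

    def fetch_report(self, token: str) -> Optional[str]:
        """One-time retrieval: the report is removed once fetched."""
        return self._reports.pop(self._key(token), None)


reports = ReportStore()
token = reports.issue_token()
reports.save_report(token, "anonymized report text")
```

Because only the hash is stored, a leaked datastore does not reveal usable tokens, and the token itself carries no information about who submitted what.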

challenge: research

#9 IRB-Compatible Consent Framework

Design ethics and consent workflows suitable for minors while preserving anonymous data submission requirements.

Acceptance: legally and ethically reviewable consent model documented for implementation.

Ethics, IRB, Policy, Legal Collaboration

Governance-Documentation →

challenge: research

#10 Dataset Publishing Pipeline

Publish anonymized datasets under ODbL and mirror releases with DOI-linked publication flows.

Acceptance: reproducible release process to Hugging Face Datasets and Zenodo-style archives.

Data Engineering, Open Data, Licensing, Automation

Dataset-Publishing-Pipeline →
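
A reproducible release usually starts with a deterministic manifest, so that two people building the same release get byte-identical metadata. The sketch below is one hedged way to do that. `build_manifest` and its fields are assumptions rather than a project format, and uploading to an archive or minting a DOI is left out entirely.

```python
import hashlib
import json
from typing import Dict


def build_manifest(version: str, files: Dict[str, bytes]) -> str:
    """Return a deterministic JSON manifest with a SHA-256 per file."""
    entries = [
        {
            "path": path,
            "sha256": hashlib.sha256(data).hexdigest(),
            "bytes": len(data),
        }
        for path, data in sorted(files.items())  # stable order across runs
    ]
    manifest = {
        "version": version,
        "license": "ODbL-1.0",  # SPDX identifier for the ODbL
        "files": entries,
    }
    return json.dumps(manifest, indent=2, sort_keys=True)


manifest_json = build_manifest(
    "v0.1.0",  # hypothetical version string
    {"scores.csv": b"id,score\n", "README.txt": b"Anonymized aggregates.\n"},
)
```

Sorting both the file list and the JSON keys is what makes the output independent of insertion order, so the manifest can be checked with a plain diff.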

challenge: community

#11 Community Governance Framework

Define contribution governance, issue ownership norms, and transparent decision-making for a distributed open project.

Acceptance: governance model, role definitions, and review process approved in community discussions.

Open Source Governance, Community Ops, Documentation, Facilitation

Governance-Documentation →

challenge: research

#12 Academic Validation + Methodology Docs

Establish validation protocols for scoring and publish defensible methodology so findings can be independently audited.

Acceptance: methodological documentation supports replication and academic peer review.

Statistics, Methodology, Peer Review, Scientific Writing

Governance-Documentation →

05 / Contribution Workflow

How to Contribute Effectively

Follow a consistent open-source flow so work is reviewable, non-duplicative, and aligned with challenge milestones.

1️⃣

Claim an Issue First

Pick one challenge issue and leave a comment before coding so duplicate effort is avoided.

2️⃣

Fork + Feature Branch

Create your branch from your fork and keep each PR focused on one challenge or one acceptance criterion.

3️⃣

Open Draft PR Early

Open a draft PR early, link the issue in the description, and coordinate dependencies in GitHub Discussions.

4️⃣

Pass CI Before Review

Mark PR ready only after tests and checks pass, with clear notes on validation and edge cases.

5️⃣

Use Project Labels

Tag work with challenge labels (`challenge: pii`, `challenge: nlp`, `challenge: dashboard`, `challenge: privacy`, `challenge: frontend`, `challenge: research`, `challenge: community`).

6️⃣

Keep It Open & Reusable

Code, methods, and datasets should remain auditable and reusable under project licenses and governance rules.

We're teenagers. We're making AI visible.

Six Open-Source Repositories

Portal-Frontend

Anonymization-Service

NLP-Scoring-Engine

Dashboard

Dataset-Publishing-Pipeline

Governance-Documentation

Privacy Protected. Data Open.

🔒 Restricted Access

🌐 Open Access

Make AI visible. Before it's too late.
