Implementation Guide v0.1

We're teenagers.
We're making AI visible.

This implementation follows the MakeAIVisible strategy: a privacy-first four-layer pipeline, six validated behavioral dimensions, and open collaboration through public GitHub challenges.

01 / The Crisis

The Black Box No One Is Opening

The data needed to understand AI's impact on teenagers already exists, but it is inaccessible to families, educators, and independent researchers. Without transparent datasets and validated scoring, harmful influence patterns stay invisible.

The Problem

Most evidence is trapped in closed platform logs. Public safety discussions rely on anecdotes rather than reproducible data. Teens and parents cannot independently evaluate long-term influence patterns because those patterns are not surfaced in an open, research-ready system.

The Solution

Run a strict four-layer pipeline: remove PII before anything is stored, score every conversation across six behavioral dimensions, validate the scores with human experts, and publish only anonymized data, with differential privacy applied to aggregate outcomes.

4 Pipeline Layers
6 Behavioral Dimensions
12 Open Challenges
99.9%+ PII Detection Target

02 / Milestones

Phased Delivery Plan

The roadmap moves from foundational governance to anonymization, NLP scoring, dashboard publication, and academic validation. Each phase maps directly to open GitHub challenges.

PHASE 0

Governance Baseline

Set up repositories, contribution templates, code of conduct, issue labels, and CI skeleton so community work can scale safely.

PHASE 1

Collection + Anonymization

Ship the mobile-first submission portal and a production anonymization pipeline that strips PII before any persistence.

PHASE 2

NLP Scoring v1

Deliver six-dimension scoring with a human-in-the-loop review queue using anonymized-only data access.

PHASE 3

Public Dashboard + DP

Release the AI Influence Map with differential privacy and anonymous token-based report delivery.

PHASE 4

Research + Publishing

Finalize IRB-compatible consent workflows, methodology documentation, and ODbL dataset publishing pipelines.

03 / The Pipeline

Four Layers. Privacy by Architecture.

Personally identifiable information is stripped before anything is stored. Raw logs do not survive anonymization. Controls are enforced in the architecture, not only in policy.

L1

Collect

Mobile-friendly portal with no account and no identity collection. Accepts conversation exports from ChatGPT, Claude, Gemini, and Copilot.

L2

Anonymize

Automated multilingual PII stripping on receipt with a 99.9%+ detection target; raw logs are destroyed immediately after anonymization.

L3

Analyze

Score each conversation from 0 to 100 on each of six behavioral dimensions, then route anonymized records into human expert review.

L4

Reveal

Publish differential-privacy-protected aggregate insights on the AI Influence Map and deliver personal reports through anonymous tokens.
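
To make "privacy by architecture" concrete, here is a minimal Python sketch of the L1 to L2 hand-off, under stated assumptions. The names `AnonymizedRecord`, `Store`, `anonymize`, and `ingest` are hypothetical and do not come from the project repositories, and the email-only regex is a toy stand-in for the real anonymizer. The point it illustrates is that storage only accepts the anonymized type, so a raw log has no path to persistence.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny stand-in for the L2 anonymizer: it only
# masks email addresses. A real service would cover far more PII types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass(frozen=True)
class AnonymizedRecord:
    """The only shape that is allowed to reach storage."""
    text: str


def anonymize(raw_log: str) -> AnonymizedRecord:
    """L2: strip PII in memory and return an anonymized record."""
    return AnonymizedRecord(text=EMAIL.sub("[EMAIL]", raw_log))


class Store:
    """Storage boundary: accepts anonymized records, rejects anything else."""

    def __init__(self) -> None:
        self.records: list[AnonymizedRecord] = []

    def save(self, record: AnonymizedRecord) -> None:
        if not isinstance(record, AnonymizedRecord):
            raise TypeError("only anonymized records may be persisted")
        self.records.append(record)


def ingest(raw_log: str, store: Store) -> None:
    """L1 -> L2: the raw log is only ever held in this function's scope."""
    store.save(anonymize(raw_log))
    # raw_log is never written anywhere by this function.


store = Store()
ingest("Hi, I'm at sam@example.com", store)
print(store.records[0].text)  # Hi, I'm at [EMAIL]
```

Passing a plain string to `store.save` raises `TypeError`, which is the architectural control in miniature: the check lives in the code path, not in a policy document.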

03b / Repositories

Six Open-Source Repositories

Each repository maps to a layer of the pipeline or supporting infrastructure. Pick one, claim an issue, and open a draft PR.

L1

Portal-Frontend

Collect — Submission Portal

Mobile-first, no-auth submission portal for ChatGPT, Claude, Gemini, and Copilot exports.

L2

Anonymization-Service

Anonymize — PII Stripping Pipeline

Python service for multilingual PII detection and immediate raw-log destruction after anonymization.

L3

NLP-Scoring-Engine

Analyze — Six-Dimension Scoring

NLP engine that scores conversations 0–100 across all six behavioral dimensions with expert validation support.

L4

Dashboard

Reveal — AI Influence Map

Public dashboard with differential-privacy-protected aggregates and cohort-level insights.

DATA

Dataset-Publishing-Pipeline

Publish — ODbL Dataset Releases

Automated publishing pipeline for anonymized datasets to HuggingFace Datasets and DOI-linked archives.

GOV

Governance-Documentation

Govern — Community & Methodology

Contribution guidelines, governance model, methodology docs, and IRB-compatible consent frameworks.

04 / Open Challenges

12 GitHub Challenges

Every challenge is published as an issue with acceptance criteria and ownership labels. Comment to claim, open a draft PR early, and reference the issue in your PR.

challenge: pii

#1 PII Detection (99.9%+)

Build multilingual PII detection and removal with strict false-negative controls before any data persistence.

Acceptance: automated anonymization removes names, email addresses, phone numbers, and locations, and is covered by adversarial tests.
NLP, Python, Privacy Engineering, Regex + ML Hybrid, Adversarial Testing
Anonymization-Service →
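
As a starting point for the regex half of a regex + ML hybrid, here is a small Python sketch. The patterns, the `redact` and `false_negative_rate` helpers, and the test cases are all illustrative assumptions, not the project's implementation. It shows how an adversarial case (an obfuscated email) can be tracked as a false negative rather than silently missed.

```python
import re

# Illustrative patterns only: they cover common email and phone shapes and
# are NOT sufficient to reach a 99.9%+ detection target on their own.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace every pattern match with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def false_negative_rate(cases: list[tuple[str, str]]) -> float:
    """Share of cases whose known PII string survives redaction."""
    leaked = sum(1 for text, pii in cases if pii in redact(text))
    return leaked / len(cases)


# Each case pairs an input with a PII fragment that must not survive.
ADVERSARIAL_CASES = [
    ("mail me: sam@example.com", "sam@example.com"),
    ("call +1 (555) 010-9999 tonight", "010-9999"),
    ("sam [at] example [dot] com", "sam [at] example"),  # obfuscated email
]

print(redact("call +1 (555) 010-9999 tonight"))  # call [PHONE] tonight
print(f"{false_negative_rate(ADVERSARIAL_CASES):.2f}")  # 0.33
```

The obfuscated email slips through, which is exactly the kind of gap an ML-based detector and a growing adversarial suite would need to close.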
challenge: privacy

#2 Anonymous Token Architecture

Design correlation-resistant anonymous tokens with no identity lookup table and strong replay/abuse protections.

Acceptance: report retrieval and contribution handles remain unlinkable across contexts.
Cryptography, Security Architecture, Zero-Knowledge Proofs, Formal Verification
Anonymization-Service →
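
One possible direction, sketched in Python with only the standard library: the submitter holds a random token, and each context gets its own handle derived with HMAC-SHA256. This is an assumption about how unlinkable handles could work, not the project's design; `new_token`, `handle_for`, and the context names are hypothetical. The sketch assumes handles are derived on the submitter's device, and it does not cover replay protection or zero-knowledge proofs.

```python
import hashlib
import hmac
import secrets


def new_token() -> str:
    """Random secret held by the submitter; never stored server-side."""
    return secrets.token_urlsafe(32)


def handle_for(token: str, context: str) -> str:
    """Derive a per-context handle with HMAC-SHA256 keyed by the token."""
    return hmac.new(token.encode(), context.encode(), hashlib.sha256).hexdigest()


token = new_token()
report_handle = handle_for(token, "report-retrieval")
contrib_handle = handle_for(token, "contribution")

# The server keeps reports keyed by handle only: no names, no accounts,
# and no table that maps a handle back to a person.
reports = {report_handle: "anonymized report body"}

# The submitter re-derives the same handle later to fetch the report.
print(reports[handle_for(token, "report-retrieval")])
print(report_handle != contrib_handle)  # True
```

Without the token, the two handles look like unrelated random strings, so a server that sees both cannot tell they belong to the same submitter.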
challenge: nlp

#3 NLP Scoring Engine (Core)

Score every conversation from 0 to 100 on each of six dimensions: Dependency/Reliance, Emotional Influence, Opinion Shaping, Epistemic Autonomy, Age-Appropriate Engagement, and Transparency & Honesty Signaling.

Acceptance: reproducible per-dimension scoring with evaluation against expert-labeled datasets.
Transformers, Model Training, Behavioral Science, Evaluation, MLOps
NLP-Scoring-Engine →
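
A score record and a simple evaluation step might look like the Python sketch below. The field names are our own snake_case renderings of the six dimensions, and mean absolute error is just one reasonable choice of metric; the project has not fixed either, so treat both as assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConversationScore:
    """One 0-100 score per behavioral dimension (field names are assumed)."""
    dependency_reliance: int
    emotional_influence: int
    opinion_shaping: int
    epistemic_autonomy: int
    age_appropriate_engagement: int
    transparency_honesty: int

    def __post_init__(self) -> None:
        # Reject any score outside the documented 0-100 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 100:
                raise ValueError(f"{f.name} must be in 0..100, got {value}")


def mean_absolute_error(
    model: list[ConversationScore], expert: list[ConversationScore]
) -> dict[str, float]:
    """Per-dimension MAE between model scores and expert labels."""
    if not model or len(model) != len(expert):
        raise ValueError("need two non-empty lists of equal length")
    return {
        f.name: sum(
            abs(getattr(m, f.name) - getattr(e, f.name))
            for m, e in zip(model, expert)
        ) / len(model)
        for f in fields(ConversationScore)
    }


model_scores = [
    ConversationScore(10, 20, 30, 40, 50, 60),
    ConversationScore(20, 20, 30, 40, 50, 70),
]
expert_labels = [
    ConversationScore(20, 20, 30, 40, 50, 60),
    ConversationScore(20, 20, 30, 40, 50, 60),
]
errors = mean_absolute_error(model_scores, expert_labels)
print(errors["dependency_reliance"])  # 5.0
```

Reporting the error per dimension, rather than one overall number, makes it easier to see which dimension the model and the experts disagree on.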
challenge: pii

#4 Multilingual Support Pipeline

Expand anonymization and scoring support for multilingual logs (ES, ZH, AR, and additional languages).

Acceptance: measurable anonymization and scoring quality across non-English test suites.
Multilingual NLP, NER, Evaluation, Data Curation
Anonymization-Service →
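
"Measurable quality across non-English test suites" could start with something as simple as recall broken down by language. The Python sketch below assumes a detector is any function that returns the set of PII strings it found; `recall_by_language`, `toy_detector`, and the sample cases are hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable

# (language code, text, known PII strings in that text)
Case = tuple[str, str, list[str]]


def recall_by_language(
    cases: list[Case], detect: Callable[[str], set[str]]
) -> dict[str, float]:
    """Fraction of known PII items the detector finds, per language."""
    found: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for lang, text, pii_items in cases:
        detected = detect(text)
        for item in pii_items:
            total[lang] += 1
            found[lang] += item in detected
    return {lang: found[lang] / total[lang] for lang in total}


EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def toy_detector(text: str) -> set[str]:
    """Toy stand-in: finds email addresses and nothing else."""
    return set(EMAIL.findall(text))


CASES = [
    ("en", "write to sam@example.com", ["sam@example.com"]),
    ("es", "mi correo es ana@example.org y mi número es 600 123 456",
     ["ana@example.org", "600 123 456"]),
]

print(recall_by_language(CASES, toy_detector))  # {'en': 1.0, 'es': 0.5}
```

A per-language table like this makes regressions visible when a new language is added, instead of hiding them inside one global average.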
challenge: dashboard

#5 Real-Time Dashboard + Differential Privacy

Build the AI Influence Map with privacy-preserving aggregates and clear cohort-level insights.

Acceptance: no individual-level leakage and transparent DP methodology for public metrics.
React, Data Viz, D3, Differential Privacy, FastAPI
Dashboard →
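
For intuition, here is the classic Laplace mechanism applied to a cohort count, sketched in Python. It assumes each contributor changes the count by at most one (sensitivity 1), and it uses the standard-library `random` module, which is fine for a demo but not for production; a real release would rely on a vetted DP library and a documented privacy budget. `dp_count` is a hypothetical helper, not project code.

```python
import random


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query with sensitivity 1.

    Adds Laplace(0, 1/epsilon) noise, sampled as the difference of two
    independent exponential draws with rate epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


rng = random.Random(42)  # seeded only so the demo is repeatable
noisy = dp_count(true_count=120, epsilon=0.5, rng=rng)
print(round(noisy, 1))
```

A smaller epsilon means more noise and stronger privacy, so the dashboard would publish the chosen epsilon alongside every metric to keep the methodology transparent.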
challenge: research

#6 Human Review Queue + Expert Onboarding

Create a human-in-the-loop review workflow where experts see only anonymized data and can calibrate scoring quality.

Acceptance: reviewer onboarding, queue tooling, and quality control metrics are documented.
Review Ops, UX, Queue Systems, Quality Assurance
NLP-Scoring-Engine →
challenge: frontend

#7 Submission Portal UX (Mobile-First)

Improve no-auth submission flow with clearer export guidance and lower drop-off on mobile devices.

Acceptance: optimized upload journey for ChatGPT/Claude/Gemini/Copilot exports on phones.
Frontend, UX Research, Accessibility, Performance
Portal-Frontend →
challenge: privacy

#8 Anonymous Token Delivery System

Deliver analysis reports to submitters without collecting identity and without correlation to raw submissions.

Acceptance: end-to-end report delivery proven unlinkable to user identity artifacts.
Security Architecture, Token Design, Backend, Threat Modeling
Portal-Frontend →
challenge: research

#9 IRB-Compatible Consent Framework

Design ethics and consent workflows suitable for minors while preserving anonymous data submission requirements.

Acceptance: legally and ethically reviewable consent model documented for implementation.
Ethics, IRB, Policy, Legal Collaboration
Governance-Documentation →
challenge: research

#10 Dataset Publishing Pipeline

Publish anonymized datasets under ODbL and mirror releases with DOI-linked publication flows.

Acceptance: reproducible release process to HuggingFace Datasets and Zenodo-style archives.
Data Engineering, Open Data, Licensing, Automation
Dataset-Publishing-Pipeline →
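
One building block of a reproducible release is a manifest that pins every file by checksum. The Python sketch below is an assumption about what such a manifest could contain; `build_manifest`, the file names, and the version string are hypothetical, and uploading to a dataset host or archive is deliberately left out.

```python
import hashlib
import json


def build_manifest(version: str, files: dict[str, bytes]) -> str:
    """Return a deterministic JSON manifest with a SHA-256 per file."""
    entries = [
        {
            "path": path,
            "sha256": hashlib.sha256(data).hexdigest(),
            "bytes": len(data),
        }
        for path, data in sorted(files.items())  # sorted for stable output
    ]
    manifest = {"version": version, "license": "ODbL-1.0", "files": entries}
    return json.dumps(manifest, indent=2, sort_keys=True)


release_files = {
    "conversations.jsonl": b'{"text": "Hi, I am at [EMAIL]"}\n',
    "README.txt": b"Anonymized dataset release.\n",
}
manifest_json = build_manifest("0.1.0", release_files)
print(manifest_json)
```

Because the entries are sorted and the JSON keys are ordered, the same files always produce the same manifest, so anyone can re-run the step and compare checksums.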
challenge: community

#11 Community Governance Framework

Define contribution governance, issue ownership norms, and transparent decision-making for a distributed open project.

Acceptance: governance model, role definitions, and review process approved in community discussions.
Open Source Governance, Community Ops, Documentation, Facilitation
Governance-Documentation →
challenge: research

#12 Academic Validation + Methodology Docs

Establish validation protocols for scoring and publish defensible methodology so findings can be independently audited.

Acceptance: methodological documentation supports replication and academic peer review.
Statistics, Methodology, Peer Review, Scientific Writing
Governance-Documentation →

05 / Contribution Workflow

How to Contribute Effectively

Follow a consistent open-source flow so work is reviewable, non-duplicative, and aligned with challenge milestones.

1️⃣

Claim an Issue First

Pick one challenge issue and leave a comment before coding so duplicate effort is avoided.

2️⃣

Fork + Feature Branch

Create your branch from your fork and keep each PR focused on one challenge or one acceptance criterion.

3️⃣

Open Draft PR Early

Open a draft PR early, link the issue in the description, and coordinate dependencies in GitHub Discussions.

4️⃣

Pass CI Before Review

Mark PR ready only after tests and checks pass, with clear notes on validation and edge cases.

5️⃣

Use Project Labels

Tag work with challenge labels (`challenge: pii`, `challenge: nlp`, `challenge: dashboard`, `challenge: privacy`, `challenge: frontend`, `challenge: research`, `challenge: community`).

6️⃣

Keep It Open & Reusable

Code, methods, and datasets should remain auditable and reusable under project licenses and governance rules.

06 / Data Access

Privacy Protected. Data Open.

Access tiers are enforced architecturally. Raw identifiers are never retained, while anonymized aggregates and methodology remain public and permanent.

🔒 Restricted Access

Limited to project managers, and only while a submission is being processed. Nothing in this tier is kept afterward.

  • Raw submitted logs are destroyed immediately after anonymization.
  • IP addresses and session tokens are never persisted.
  • Any individual identifiers are removed before storage.
  • Experts review anonymized-only records by design.

🌐 Open Access

Public, permanent, and reusable by researchers and communities.

  • Anonymized conversation datasets (ODbL).
  • Aggregate metrics and scoring distributions.
  • All source code and scoring algorithms (MIT).
  • Research methodology documentation (CC BY-SA).

Make AI visible.
Before it's too late.

Pick a challenge, join Discussions, and help ship the next phase of the open pipeline.

No spam. No tracking pixels. No third-party analytics. We practice what we preach.