Research Program

Production infrastructure, original research.

This program runs on an autonomous orchestration system that processes thousands of jobs across multiple model families. One of its outputs is original computational linguistics research conducted by Penelope Lawrence.

The engineering enables the research. The research validates the engineering. Both are drawn from the same live system.


The Pipeline

[Figure: Collection (7 RSS feeds, cron every 6 hrs, 143K+ headlines) → Embedding (text-embedding-3-small, 1536-dim vectors, batch processing) → Analysis (semantic axis projection, cosine similarity, cross-outlet comparison) → Output (papers on SSRN, datasets on HuggingFace, visualizations) → Publish (penelopelawrence.com, dnakhla.com, HuggingFace Hub). Underlying layer: orchestration engine, multi-model routing, automated error recovery, session management. Research tracks: Computational Linguistics, Architecture Studies, Open Datasets, Benchmark Studies.]
Fig. 2 — Automated research pipeline. Headlines collected via RSS, embedded, analyzed through semantic axis projection, published as papers and datasets.
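
The analysis stage's semantic axis projection can be sketched as follows. This is a minimal illustration, not the pipeline's code: it assumes an axis is built as the difference between the mean embeddings of two sets of "pole" texts, and that each headline is scored by its cosine similarity to that axis. The toy 3-dimensional vectors stand in for 1536-dimensional text-embedding-3-small output; the function names and pole values are hypothetical.

```python
import math

Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_axis(positive_pole: list[Vector], negative_pole: list[Vector]) -> Vector:
    """Direction from the negative pole's centroid toward the positive pole's centroid."""
    pos, neg = mean_vector(positive_pole), mean_vector(negative_pole)
    return [p - n for p, n in zip(pos, neg)]


def project(headline: Vector, axis: Vector) -> float:
    """Score in [-1, 1]; positive means the headline leans toward the positive pole."""
    return cosine(headline, axis)


# Hypothetical pole embeddings (toy 3-dim stand-ins for 1536-dim vectors).
emotional = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
neutral = [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]]
axis = semantic_axis(emotional, neutral)

print(round(project([0.7, 0.3, 0.0], axis), 3))  # leans emotional: positive score
print(round(project([0.1, 0.9, 0.0], axis), 3))  # leans neutral: negative score
```

Because every headline lands on the same one-dimensional scale, scores from different outlets can be averaged and compared directly.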

Computational Linguistics

Penny conducts the NLP and computational linguistics research. I built the automated pipeline that collects headlines, runs semantic analysis, and generates the datasets. Her papers are hosted on penelopelawrence.com and SSRN.

Published · Media Framing · 86K Headlines

Measuring the Gap: How 7 News Outlets Frame the Same Stories Differently

Penelope Lawrence

Semantic axis projection applied to 86,000 headlines from 7 outlets. In political coverage, NYT headlines carry twice the emotional valence of Fox headlines. Published on SSRN with a companion HuggingFace dataset.
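
A comparison such as NYT versus Fox reduces to averaging projection scores per outlet and comparing the averages. The sketch below assumes headlines have already been scored on a valence axis and filtered to political coverage upstream; the scores are invented for illustration and are not the paper's data, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_outlet(scored: list[tuple[str, float]]) -> dict[str, float]:
    """Average axis-projection score for each outlet."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for outlet, score in scored:
        buckets[outlet].append(score)
    return {outlet: mean(scores) for outlet, scores in buckets.items()}


def valence_ratio(means: dict[str, float], a: str, b: str) -> float:
    """How many times outlet a's mean exceeds outlet b's.

    Only meaningful when both means are positive.
    """
    return means[a] / means[b]


# Invented scores for political headlines (not the paper's data).
scored = [("NYT", 0.30), ("NYT", 0.26), ("Fox", 0.15), ("Fox", 0.13)]
means = mean_score_by_outlet(scored)
print(round(valence_ratio(means, "NYT", "Fox"), 2))  # 2.0 for these toy values
```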

Benchmark · 346 Runs · 16 Configurations

OpenClaude Benchmark Study

Penelope Lawrence

Model selection drives 37x more variance than harness choice (7.16 points vs. 0.19 points). The harness enables cost routing: DeepSeek V3.2 delivers 94% of Opus's quality at 1/66 of the cost.
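
One way to read "model selection drives more variance than harness choice" is to group runs by each factor and compare the spread (max minus min) of the group means. That interpretation, the run records, and the helper name are assumptions for illustration; the study's own method may differ. The last line applies the same division to the reported figures.

```python
from collections import defaultdict
from statistics import mean

# (model, harness, score) for each benchmark run; names and scores are hypothetical.
Run = tuple[str, str, float]

MODEL, HARNESS = 0, 1


def factor_spread(runs: list[Run], factor: int) -> float:
    """Max minus min of the mean score when runs are grouped by one factor."""
    groups: dict[str, list[float]] = defaultdict(list)
    for run in runs:
        groups[run[factor]].append(run[2])
    group_means = [mean(scores) for scores in groups.values()]
    return max(group_means) - min(group_means)


runs: list[Run] = [
    ("model-a", "harness-x", 80.0), ("model-a", "harness-y", 80.2),
    ("model-b", "harness-x", 73.0), ("model-b", "harness-y", 73.2),
]
print(round(factor_spread(runs, MODEL), 2))    # 7.0
print(round(factor_spread(runs, HARNESS), 2))  # 0.2

# Applied to the study's reported spreads:
print(round(7.16 / 0.19, 1))  # 37.7, reported as 37x
```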

Architecture Studies

Study · Orchestration · 6,400+ Jobs

Multi-Model Orchestration at Personal Scale

Architecture documentation for a hub-and-spoke orchestration system processing 6,400+ autonomous jobs. Covers model routing, fallback chains, three-tier memory, and failure taxonomy from production operations.
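
A fallback chain, as named in the study, might look like the following: try models in priority order, record each failure for later classification, and raise only when every option is exhausted. The class, exception, and model stubs here are hypothetical; they show the general pattern, not the system's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


class ModelError(Exception):
    """Raised by a model call that failed (timeout, rate limit, bad output)."""


@dataclass
class FallbackChain:
    """Try each model in priority order; fall through to the next on failure."""
    models: list[tuple[str, Callable[[str], str]]]
    failures: list[tuple[str, str]] = field(default_factory=list)

    def run(self, prompt: str) -> tuple[str, str]:
        for name, call in self.models:
            try:
                return name, call(prompt)
            except ModelError as exc:
                # Keep a record of each failure so it can be classified later.
                self.failures.append((name, str(exc)))
        raise ModelError("all models in the chain failed")


def flaky_primary(prompt: str) -> str:
    raise ModelError("rate limited")


def cheap_backup(prompt: str) -> str:
    return f"summary of: {prompt}"


chain = FallbackChain([("primary", flaky_primary), ("backup", cheap_backup)])
name, output = chain.run("job 42")
print(name, output)
```

Logging failures instead of swallowing them is what makes a failure taxonomy possible: the recorded reasons can be counted and grouped after the fact.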


Open Datasets

Dataset

US News Headlines Corpus

143K+ headlines from 7 major US outlets, collected via an automated RSS pipeline. Deduplicated, timestamped, and outlet-tagged.

Available on HuggingFace
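
Deduplication matters because a feed polled every 6 hours can return the same item more than once. A minimal sketch, assuming duplicates are identified by outlet plus a normalized headline string and that the earliest timestamp wins; the `Headline` record and the normalization rule are hypothetical choices, not the corpus's documented procedure.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Headline:
    outlet: str
    text: str
    published: datetime


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def deduplicate(headlines: list[Headline]) -> list[Headline]:
    """Keep the earliest copy of each (outlet, normalized text) pair, sorted by time."""
    earliest: dict[tuple[str, str], Headline] = {}
    for h in headlines:
        key = (h.outlet, normalize(h.text))
        if key not in earliest or h.published < earliest[key].published:
            earliest[key] = h
    return sorted(earliest.values(), key=lambda h: h.published)


# Hypothetical items from two polls of the feeds.
polls = [
    Headline("NYT", "Senate Passes Budget Bill", datetime(2024, 5, 1, 6)),
    Headline("NYT", "Senate passes budget bill.", datetime(2024, 5, 1, 12)),
    Headline("Fox", "Senate Passes Budget Bill", datetime(2024, 5, 1, 6)),
]
unique = deduplicate(polls)
print(len(unique))  # 2: the NYT repeat is dropped, Fox's copy is kept
```

Keying on outlet as well as text keeps identical wire-service headlines from different outlets, which cross-outlet comparison needs.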
Dataset

Headline Embeddings

Pre-computed 1536-dimensional embeddings for the full headline corpus. Ready for semantic similarity, clustering, and axis projection analysis.

Coming Soon
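
Semantic similarity search over pre-computed embeddings can be as simple as a brute-force cosine scan. The sketch uses toy 3-dimensional vectors and hypothetical headline keys in place of the real 1536-dimensional embeddings and the dataset's actual schema.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(
    query: list[float], corpus: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Brute-force scan returning the k headlines most similar to the query vector."""
    scored = ((headline, cosine(query, vector)) for headline, vector in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical headlines with toy 3-dim embeddings.
corpus = {
    "Fed raises rates": [0.9, 0.1, 0.0],
    "Central bank hikes interest rates": [0.85, 0.15, 0.05],
    "Team wins championship": [0.0, 0.2, 0.95],
}
for headline, score in most_similar([1.0, 0.1, 0.0], corpus):
    print(f"{score:.3f}  {headline}")
```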
Dataset

Framing Analysis Results

Computed framing scores, semantic axis positions, and cross-outlet divergence metrics. The quantitative backbone of the published papers.

Coming Soon
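
The page does not define its divergence metrics, so the following is only one plausible construction, labeled as an assumption: average each outlet's headline embeddings for a story into a centroid, then take the cosine distance (1 minus cosine similarity) between every pair of outlets. "Outlet C" and all vectors are hypothetical.

```python
import math
from itertools import combinations


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of an outlet's headline embeddings."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for identical direction, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def pairwise_divergence(by_outlet: dict[str, list[list[float]]]) -> dict[tuple[str, str], float]:
    """Cosine distance between the centroids of every pair of outlets."""
    centroids = {outlet: centroid(vectors) for outlet, vectors in by_outlet.items()}
    return {
        (a, b): cosine_distance(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }


# Hypothetical 2-dim embeddings of one story's headlines, grouped by outlet.
story = {
    "NYT": [[0.0, 1.0], [0.0, 0.8]],
    "Fox": [[1.0, 0.0]],
    "Outlet C": [[1.0, 1.0]],
}
for pair, distance in pairwise_divergence(story).items():
    print(pair, round(distance, 3))
```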