Research Program

Production infrastructure, original research.

An autonomous orchestration system that processes thousands of jobs across multiple model families. One of the things it produces is original computational linguistics research, conducted by Penelope Lawrence.

The engineering enables the research. The research validates the engineering. Both are drawn from the same live system.

By the Numbers

143K Headlines
7 Outlets
5 Papers
3 Datasets

The Pipeline

Automated Research Infrastructure

Fig. 2 — Automated research pipeline. Headlines collected via RSS, embedded, analyzed through semantic axis projection, published as papers and datasets.
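The axis projection step named in the caption can be illustrated roughly as follows. This is a minimal sketch rather than the pipeline's actual code. It assumes an axis is built as the difference between the mean embeddings of two hand-picked pole sets, which is one common construction. The function names (`mean_vector`, `build_axis`, `project`) and the toy 3-dimensional vectors are hypothetical stand-ins for real headline embeddings.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_axis(positive_pole, negative_pole):
    """Unit vector pointing from the negative pole centroid to the positive one."""
    pos = mean_vector(positive_pole)
    neg = mean_vector(negative_pole)
    diff = [p - q for p, q in zip(pos, neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("pole centroids coincide, so the axis is undefined")
    return [d / norm for d in diff]


def project(embedding, axis):
    """Cosine similarity between an embedding and a unit-length axis."""
    norm = math.sqrt(sum(e * e for e in embedding))
    if norm == 0:
        return 0.0
    return sum(e * a for e, a in zip(embedding, axis)) / norm


# Toy pole sets (assumption): embeddings of "emotional" vs "neutral" anchor phrases.
emotional = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
neutral = [[-1.0, 0.0, 0.0], [-0.8, 0.2, 0.0]]
axis = build_axis(emotional, neutral)

# A positive score means the headline sits closer to the emotional pole.
score = project([0.6, 0.8, 0.0], axis)
```

Scores computed this way are comparable across outlets only if every headline is embedded with the same model and projected onto the same axis.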

Computational Linguistics

Research by Penelope Lawrence

Penny conducts the NLP and computational linguistics research. I built the automated pipeline that collects headlines, runs semantic analysis, and generates the datasets. Her papers are hosted on penelopelawrence.com and SSRN.

Published | Media Framing | 86K Headlines

Measuring the Gap: How 7 News Outlets Frame the Same Stories Differently

Penelope Lawrence

Applies semantic axis projection to 86,000 headlines from 7 outlets. In political coverage, NYT headlines carry 2x higher emotional valence than Fox headlines. Published on SSRN with a companion HuggingFace dataset.

SSRN Full Paper

Benchmark | 346 Runs | 16 Configurations

OpenClaude Benchmark Study

Penelope Lawrence

Model selection drives 37x more variance than harness choice (7.16 points vs 0.19 points). The harness enables cost routing: DeepSeek V3.2 delivers 94% of Opus quality at 1/66 of the cost.

Download PDF
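The cost routing idea can be sketched as picking the cheapest model that clears a quality floor. The sketch below is illustrative only. The relative figures come from the summary above (Opus normalized to quality 1.00 and cost 66, DeepSeek V3.2 at 0.94 and 1). The `ModelOption` type, the `route` function, and the quality thresholds are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    relative_quality: float  # fraction of Opus quality
    relative_cost: float     # cost relative to the cheapest option


# Figures normalized from the benchmark summary; not raw prices.
OPTIONS = [
    ModelOption("Opus", 1.00, 66.0),
    ModelOption("DeepSeek V3.2", 0.94, 1.0),
]


def route(options, min_quality):
    """Return the cheapest option whose quality meets the floor."""
    eligible = [o for o in options if o.relative_quality >= min_quality]
    if not eligible:
        raise ValueError(f"no model reaches quality {min_quality}")
    return min(eligible, key=lambda o: o.relative_cost)


# Ratio of the two reported spreads: model choice vs harness choice.
spread_ratio = 7.16 / 0.19  # about 37.7

routine_job = route(OPTIONS, min_quality=0.90)   # the cheaper model suffices
critical_job = route(OPTIONS, min_quality=0.97)  # only the top model qualifies
```

Under these assumptions, any job that tolerates a 6% quality drop routes to the cheaper model, which is where the cost savings would come from.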

Architecture Studies

Production Systems / D. Nakhla

Study | Orchestration | 6,400+ Jobs

Multi-Model Orchestration at Personal Scale

Architecture documentation for a hub-and-spoke orchestration system processing 6,400+ autonomous jobs. Covers model routing, fallback chains, three-tier memory, and failure taxonomy from production operations.

Read
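A fallback chain of the kind mentioned above might look roughly like this. It is a minimal sketch under stated assumptions, not the production implementation: each model is represented by a plain callable, and the names `run_with_fallback`, `AllModelsFailed`, `flaky_primary`, and `stable_backup` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has failed."""


def run_with_fallback(
    prompt: str,
    chain: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return the first success."""
    errors: List[Tuple[str, str]] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:
            # Keep the error type so failures can be categorized later.
            errors.append((name, type(exc).__name__))
    raise AllModelsFailed(errors)


def flaky_primary(prompt: str) -> str:
    """Stand-in for a model call that times out."""
    raise TimeoutError("simulated timeout")


def stable_backup(prompt: str) -> str:
    """Stand-in for a cheaper model that answers reliably."""
    return f"echo: {prompt}"


used, reply = run_with_fallback(
    "summarize",
    [("primary", flaky_primary), ("backup", stable_backup)],
)
```

Recording the exception type per model is one simple way to feed a failure taxonomy. A real system would likely also distinguish retryable errors from permanent ones.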

Open Datasets

HuggingFace Hub

Dataset

US News Headlines Corpus

143K+ headlines from 7 major US outlets, collected via automated RSS pipeline. Deduplicated, timestamped, outlet-tagged.

HuggingFace
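Deduplication of outlet-tagged headlines could be done along these lines. This is a sketch only: the normalization rule (collapse whitespace, ignore case) and the choice to deduplicate within an outlet rather than across outlets are assumptions, and the field names and sample records are made up for illustration.

```python
import hashlib
import re


def normalize(headline: str) -> str:
    """Collapse runs of whitespace and fold case so trivial variants match."""
    return re.sub(r"\s+", " ", headline).strip().casefold()


def dedupe(records):
    """Keep the first record for each (outlet, normalized headline) pair."""
    seen = set()
    unique = []
    for record in records:
        raw_key = f"{record['outlet']}|{normalize(record['headline'])}"
        key = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


# Made-up sample records; the field names are assumptions.
sample = [
    {"outlet": "outlet_a", "headline": "Senate Passes Budget Bill"},
    {"outlet": "outlet_a", "headline": "  senate passes  budget bill "},
    {"outlet": "outlet_b", "headline": "Senate Passes Budget Bill"},
]
unique = dedupe(sample)
```

Keeping identical headlines from different outlets matters for framing analysis, since the comparison of interest is how outlets word the same story.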

Dataset

Headline Embeddings

Pre-computed 1536-dimensional embeddings for the full headline corpus. Ready for semantic similarity, clustering, and axis projection analysis.

Coming Soon

Dataset

Framing Analysis Results

Computed framing scores, semantic axis positions, and cross-outlet divergence metrics. The quantitative backbone of the published papers.

Coming Soon
