Driftline
Independent research lab building evaluation tools for structural reliability in generative images.
Driftline studies structural correctness in generative image systems and builds tools to make visual failures easier to measure, compare, and understand.
Current work focuses on the gap between visual plausibility and structural correctness in diffusion-based image models, using controlled probes, scored image subsets, and evaluation workflows designed to support repeatable review.
Driftline Evaluator
Driftline Evaluator measures structural correctness in generative images, classifying recurring failure patterns and comparing how prompts and workflows perform.
Driftline Evaluator is a visual evaluation system for generative images. Rather than judging only style or realism, it asks whether an image holds together as a believable structure, scoring prompts and workflows under repeatable conditions so their failures can be compared directly.
- Measures structural correctness in generated images
- Classifies recurring visual failure patterns
- Compares how different prompts or workflows perform
- Organizes image review with a repeatable scoring rubric (sketched below)
- Helps separate believable outputs from subtle or obvious failures
- Supports research, benchmarking, and workflow evaluation
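To make the rubric idea concrete, here is a minimal sketch of what a rubric-driven scoring record and prompt-level comparison could look like. All names here (FailureClass, StructuralResult, summarize) and the failure taxonomy are illustrative assumptions, not the Driftline Evaluator’s actual API or rubric.

```python
# Minimal sketch of a rubric-driven scoring record and a prompt-level
# comparison. All names and the failure taxonomy below are illustrative
# assumptions, not the Driftline Evaluator's actual API or rubric.
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureClass(Enum):
    NONE = "none"                # structurally acceptable
    GEOMETRY = "geometry"        # impossible joins, floating parts
    TOPOLOGY = "topology"        # wrong part counts or connectivity
    PERSPECTIVE = "perspective"  # inconsistent vanishing structure


@dataclass
class StructuralResult:
    image_id: str
    prompt: str
    score: float                 # rubric score in [0, 1]
    failure: FailureClass


def summarize(results: list[StructuralResult], pass_threshold: float = 0.5) -> dict:
    """Group scored images by prompt; report pass rate and failure mix."""
    by_prompt: dict[str, list[StructuralResult]] = {}
    for r in results:
        by_prompt.setdefault(r.prompt, []).append(r)
    return {
        prompt: {
            "n": len(rs),
            "pass_rate": sum(r.score >= pass_threshold for r in rs) / len(rs),
            "failures": Counter(
                r.failure.name for r in rs if r.failure is not FailureClass.NONE
            ),
        }
        for prompt, rs in by_prompt.items()
    }
```

Keeping reviews as structured records rather than free-form notes is what makes comparison repeatable: the same threshold and taxonomy can be re-applied to any new batch of outputs.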
Who it’s for
- Researchers studying failure patterns in generative images
- Builders comparing prompts, models, or workflows
- Creative teams who need a clearer way to review outputs
- Anyone trying to measure whether generated images are structurally believable
Research
A scored observational note examining chair-generation failures under minimal prompt conditions.
In a broader baseline run of approximately 7,000 generated images, 280 chair-class outputs were identified. A manually reviewed illustrative subset of 58 chairs was then scored using Driftline’s Structural Validity Score (SVS), yielding an even split between structurally acceptable and structurally failed outputs.
Key finding: even in a curated subset, half of generated chairs fail a basic structural plausibility test.
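As a rough illustration of how such a split is computed, the sketch below assumes each reviewed image carries a numeric SVS and that scores at or above a pass threshold count as acceptable; the 0.5 threshold and record format are assumptions, not the published rubric.

```python
# Illustrative only: recovers the reported split, assuming each reviewed
# chair carries a numeric SVS and that scores at or above a threshold
# count as acceptable. The 0.5 threshold is an assumption, not the
# published rubric.
def svs_split(scores: list[float], threshold: float = 0.5) -> tuple[int, int]:
    """Return (acceptable, failed) counts for an SVS-scored subset."""
    acceptable = sum(s >= threshold for s in scores)
    return acceptable, len(scores) - acceptable

# For the 58-chair subset with an even split, this yields (29, 29):
# a 50% structural failure rate even after curation.
```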
A structured hand-family comparison testing whether constrained prompt variations improve structural correctness in diffusion-generated hands. Prompt branches including pose cues, numeric wording, and semantic-styling language were reviewed under a locked rubric designed to separate visual plausibility from anatomical correctness.
Key finding: constrained prompting improved outcomes unevenly; pose cues reduced outright failures most effectively, but no tested condition reliably solved hand anatomy.
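A minimal sketch of how branch-level outcomes might be compared under a locked rubric is shown below. The branch labels and sample outcomes are hypothetical placeholders for the prompt conditions named above; they do not reproduce the study’s data.

```python
# Hypothetical sketch of a branch-level comparison under a locked rubric.
# Branch labels and the sample outcomes below are placeholders for the
# prompt conditions named above; they do not reproduce the study's data.
SAMPLE_REVIEWS = [
    # (branch, rubric outcome) pairs from manual review
    ("pose_cue", "acceptable"),
    ("pose_cue", "subtle_failure"),
    ("numeric_wording", "outright_failure"),
    ("semantic_styling", "subtle_failure"),
]


def outright_failure_rate(reviews: list[tuple[str, str]], branch: str) -> float:
    """Share of a branch's reviewed images scored as outright failures."""
    outcomes = [o for b, o in reviews if b == branch]
    return outcomes.count("outright_failure") / len(outcomes) if outcomes else float("nan")


for branch in sorted({b for b, _ in SAMPLE_REVIEWS}):
    print(branch, f"{outright_failure_rate(SAMPLE_REVIEWS, branch):.2f}")
```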
Contact
research@driftline-us.ai