A 3D Body from Eight Questions — No Photo, No GPU

8 questions in, 58 Anny body params out. A small MLP trained with a physics-aware loss, runs in milliseconds on CPU. Height accuracy 0.3 cm, mass 0.3 kg, BWH 3-4 cm — better than our photo pipeline on circumferences, without needing a photo. That’s the questionnaire path I promised in the previous post.

The whole story begins with one observation: that height and weight can estimate body measurements quite accurately (Bartol’s regression). The original idea isn’t as accurate as it claims, but after a bit of tuning the results are quite promising.

The questionnaire addresses privacy, speed and cost concerns. Plus we skip the phase where the user spends 5 minutes scrolling for perfect-light, tight-clothes photos. Additionally, it helped us find and address a mass calculation inconsistency in the Anny model, and model the “muscle weighs more” problem.

Backstory

When we want to create a digital twin, we naturally think of HMR photo reconstruction. This route has a lot of ups and downs. During one “down”, the research agent brought up this:

The most striking finding is from Bartol et al. (2022): a simple linear regression from just height + weight (no photo!) predicts 15 body measurements at 1.2-1.6 cm MAE. Many deep learning methods with photos don’t even beat this.

At first I quickly calculated the number of combinations and the number of people, and thought it didn’t make sense. But then, after comparing friends, I thought there might be something to it.

It’s not just height and weight

Intuitively we all know that you can be a man with 178cm and 80kg with a belly, or from the gym. So it wasn’t a surprise that we came up with these two bodies:

Extreme bodies for the same height and weight

They are a bit cartoonish and pushed to extremes, but clearly show the problem.

Next obvious thing to do: the weights from the original regression are public, so we downloaded them and ran them on our validation set. Raw BWH MAE landed around 9-11 cm, up to ~25 cm at the worst. Some of that is measurement-convention mismatch — Bartol slices SMPL meshes at fixed landmark vertex indices (e.g. a “lower belly point” vertex for waist), while we follow ISO 8559-1 anatomical rules (waist at the narrowest point, bust at the breast prominence). Same plane-sweep math, different slice location — bust alone is off by ~10 cm systematically. After correcting for that bias, BWH MAE drops to ~7 cm. Still above Bartol’s own ~3 cm BWH MAE on their data (paper Table 6 on BODY-fit+W: chest 3.0, waist 4.0, hips 2.2 cm), but that’s not really the story — we’re evaluating on a different population. Anny has explicit body-shape variation at fixed h/w that an h+w regression fundamentally can’t see, regardless of how well it was trained. And I’m not saying this to undermine the original research, but rather the opposite — it’s a good spark for this project.

What else carries signal

As the previous example showed, the same height and weight can produce different results, but we can differentiate them via more params. Some obvious ones:

Build/belly — muscular athletic or soft with a belly. Common knowledge is that muscle weighs more than fat, so a fat-heavy body will have more volume (and thus different measurements) than an athletic one.
Shape — there are people with wider hips, while others have a bigger bust. This difference is part of body shape, which tells us how the weight is distributed. The problem I will describe later is that people don’t know their shape.
Cupsize — relevant for women, quite an obvious feature.

These are the features we naturally think of. To make sure they carry enough signal and aren’t too noisy, we ran the numbers against the dataset. The method is simple — bucket people by height (±1 cm), weight (±1 kg), and shape, then measure how much waist variation is left as each additional feature is locked in.

Features locked	Waist std inside bucket	Theoretical best MAE
h, w, shape, build	2.25 cm	~1.8 cm
+ belly	2.08 cm	~1.7 cm
+ cup, gender	1.30 cm	~1.0 cm

Smaller std inside a bucket means the features explain more of what’s going on. Build does most of the work — on its own, it moves the waist by about 1.8 cm at fixed h/w/shape. Belly adds another ~0.2 cm. Cup plus gender knocks 0.8 cm more off. Each feature earns its place.

Side-finding: build signal is strongest on inverted-triangle shapes — 8 of the top 10 high-signal buckets are inverted triangle. The narrow waist amplifies relative fat changes; shapes with wider baseline waists (apple, rectangle) show smaller absolute shifts.

At the extremes: same height, same weight, different body shape — bust can differ by 25 cm, hips by 30. Six clothing sizes at identical h/w. A height+weight regression simply can’t see this — the signal isn’t there in the input.

And there’s a floor. Even with every questionnaire input locked, about 1.3 cm of waist variation stays, coming from ~50 continuous blendshape params that don’t map to any multiple-choice question. So the theoretical best a form can ever do is ~1 cm waist MAE.

Model & dataset

The previous article describes the available body models. After the initial phase we operate solely on the Anny model, heavily leveraging its explainable features. Thanks to it, tasks like generating a huge dataset of people are easy.

The dataset we generate and use for distribution analysis, training and validation contains a couple of tens of thousands of synthetically generated bodies, validated against a broad population distribution. For each body in the dataset we determine the described features using the body measurements.

Anny is full of blendshapes, but for the virtual try-on, not all of them matter. We carefully selected 58 of them which matter here. The 8 questionnaire questions one-hot encode into 20 features, so the space is 20 input x 58 output params. We actually train two such models — one per gender. Male and female bodies differ enough that a shared network wastes capacity reconciling them.

Training a small MLP

The original paper used simple regression to predict the params, so that was the obvious starting point. On our synthetic dataset it gets around 2.5 cm BWH MAE — decent. The problem was mass: Ridge predicts each of the 58 params independently, but mass depends on many of them working together (torso width × depth × height, hip volume, limb fat…). L2 regularization shrinks them all toward zero, and the small errors compound. Result: 3.9 kg mean mass error, 9.7 kg at p95, up to 16 kg for heavy bodies — even after output standardization and tuned regularization (the best Ridge we could build on this dataset).

Histogram of absolute mass error for Ridge vs MLP on 100 test bodies. Ridge: mean 3.9 kg, p95 9.7 kg. MLP: mean 0.3 kg, p95 0.8 kg.

So we moved to an MLP. Two hidden layers, 256 units each, ReLU, a bit of dropout. Tiny — about 85 KB of weights, trains on a laptop in ~60 minutes per gender. Nothing fancy architecturally.

The loss is the interesting part. The user already gives us their exact height and weight — those need to match precisely in the generated body, not just be close on average. Standard MSE on the 58 params doesn’t care about that and treats every param equally. And mass isn’t a param at all, instead it’s a consequence of volume, which comes out of the body model’s forward pass.

So we include the forward pass in the loss. The MLP’s 58 outputs go through Anny — blendshapes, vertices, volume — and we compare the resulting mass and height against the user-provided targets. Gradients from a mass error flow back through all the volume-related params together. Ridge couldn’t do that because each output was solved independently; the MLP can, because the hidden layers couple them. This is what closes the mass gap.

graph LR
    Q[8 questionnaire inputs] --> MLP[MLP]
    MLP --> P[58 Anny params]
    P --> A[Anny forward]
    A --> MHW[mass, height, waist predicted]
    MHW --> L[loss vs targets]
    P --> L
    L -. gradients .-> MLP

The dotted arrow is the whole trick. Anny’s forward is surprisingly autograd-friendly — blendshapes are linear, volume is a sum of signed tetrahedra. No custom backward, standard PyTorch ops end to end. Measurements like waist are differentiable too, but that’s a whole story for the measurements tuning post.

On top of params, mass, and height, we added a waist term. That’s it — bust and hip looked tempting, but in practice they introduced more noise than signal, and waist carries the most body-shape signal anyway.

Honest results

Height is essentially solved — 0.3 cm mean MAE on both genders. Mass lands right there too, around 0.3 kg mean (p95 under 1 kg). Circumferences are harder; BWH sits at 3-4 cm, with waist the weakest.

Averages lie about the tails, and a person who gets a 15 cm bust error doesn’t care that the mean is 4 cm. So we tracked p95 (5% of predictions worse than this) and max alongside the mean, and actively optimized for them — barrier terms in the loss that specifically penalize outliers on height and mass.

	Male	Female
Height — mean / p95 / max	0.3 / 0.8 / 3.9 cm	0.3 / 0.8 / 4.6 cm
Mass — mean / p95 / max	0.5 / 1.2 / 3.3 kg	0.4 / 1.0 / 2.1 kg
Bust — mean / p95 / max	4.9 / 11.9 / 18.4 cm	2.7 / 6.6 / 11.0 cm
Waist — mean / p95 / max	4.3 / 10.0 / 20.7 cm	4.0 / 9.0 / 13.0 cm
Hips — mean / p95 / max	3.3 / 8.4 / 14.8 cm	3.3 / 8.0 / 13.3 cm

For comparison: on the same validation set, Bartol’s h+w regression sits at ~7 cm BWH MAE (bias-corrected, as above). Our photo-based pipeline from the previous post gets 5-8 cm BWH MAE on real people. The questionnaire beats both — without needing a photo.

The numbers above are from synthetic Anny bodies — same model we train against. We also validated on a small group of real people measured by hand with tape. First results there were ugly — mass off by several kg even when circumferences were close. That pushed us to fix how mass is calculated in the first place (next section). After those fixes landed, real-people numbers line up with the synthetic ones on the measurements we tested.

Worth remembering: it’s a statistical model, so what you get is the population-average body for your inputs, not your exact body. Everyone is different — but it’s a very good base for measurements tuning, which then gets <1 cm error. I’m planning the next article on that.

Lessons learned

The most striking was the real-world inconsistency in Anny’s anthropometry module. To calculate the mass, the approach is simple: calculate the volume of the body and multiply by body density. Primary school math. But Anny used 980 kg/m³ density, which is indeed the value you get after typing “average person density” into a web search. However, it’s more subtle than it initially seems.

The first thing is that the value is different for men and women. The second is that “body density” isn’t one number — it depends on the convention. Whole-body density (lungs included, ~985 kg/m³) is what you’d measure by submerging someone in a tank — just below water, which is why humans barely float. Tissue-only density (~1030–1080 kg/m³) is what hydrostatic weighing reports after subtracting residual lung air, and it’s what fat-vs-muscle composition actually gives you. The 980 kg/m³ figure sits between these two conventions — close to whole-body but not quite. The third is that “muscle weighs more”. The per-gender tissue-only medians we ended up using (male ~1059, female ~1031 kg/m³) live in clad-body, derived from body-fat percentage via the Siri two-component model. Empirically the correction works — lean bodies gain mass, soft bodies lose it — though the absolute scale still rests on the 980 calibration being roughly right for the “average” subject.

Density isn’t unique for all people, and muscle has a different density than fat. Not much, but it can change the mass by 2–3 kg. To respond to that, clad-body estimates body fat using the Navy formula.

The second finding (which will be described more in the measurements tuning post) is that each cm matters. A 2 cm shift across all torso circumferences (bust, waist, hips) moves the computed mass by ~2 kg!

All the above summed together had a big impact on predicting incorrect mass. Once we adjusted density via body-fat estimation, athletic bodies gained up to 1 kg and soft bodies lost up to 2. Small in absolute terms, but it’s the difference between matching the scale and being systematically off for anyone not shaped like the average.

Another thing that had a big impact on mass accuracy: the ancestry feature. For a while, mass MAE refused to drop below 3 kg no matter what we tried. The error distribution looked bimodal which seemed to be suspicious.

Turns out Anny has three race blendshapes (african, asian, caucasian) that subtly affect body proportions. In training we sampled them randomly, but at inference we hardcoded them to a uniform mix — the user hadn’t told us anything about ancestry. So the MLP was trained on one distribution and predicted under another: a 3 kg noise floor we’d built ourselves. The fix is simple: add ancestry to the questionnaire, four categories mapped to fixed blendshape values. Training and inference now use the same numbers for the same label. Mass MAE dropped from ~3 kg to under 0.5. Some odd height errors on a few bodies disappeared too.

The broader lesson (like last time!) is that spending time on the dataset and the evaluation harness paid off more than spending time on the model. A bigger network wouldn’t have caught the bugs. Running the pipeline on real people is what exposed the mass calculation issue, not synthetic eval. The MLP itself with 2 layers and 256 units is boring. The work that mattered was upstream and downstream of it.

Is it the final form?

Definitely not. As I mentioned, people are struggling more than anticipated when choosing the body shape. What’s missing in the current form but should be addressed are long/short arms/legs, which also impact how mass is distributed. SHAPY’s Attributes-to-Shape (A2S) goes further in the same direction — a body described with a whole set of attributes like “muscular”, “pear-shaped”, “long torso”, “broad shoulders”. Plenty of ideas to borrow from there.

A better idea that is going around us is to make it more interactive and exactly features based. Instead of asking “what’s your body shape?”, show a body the user can adjust directly — bust vs hips, arms vs legs, shoulder width — and let them tune what they see. Probably where we’re heading next.

Try it

SizeMe questionnaire page — form inputs on the left, generated 3D body with ISO measurement contours in the middle, measurements panel on the right

Live in the PWA at clad.you/size-aware/size-me — eight questions, 3D body in under a second. Also exposed as a REST API at api.clad.you — questionnaire or photo in, body params + measurements out. Free for now while we work out whether anyone actually wants this; key at clad.you/developers.

This is the second post in our body reconstruction series.