Ryzen 3900X
Grid Hardware
Published: 2019-07-12

The Firepear computing stack is currently all Ryzen. It’s simple cost-benefit analysis: I’m interested in crunching as much data as possible per unit time, at a reasonable cost (in both money and electricity). Right now, that’s Ryzen.

I’m upgrading all my machines to the new R9 3900X CPU. This provides an opportunity to compare all three generations of the Ryzen family.

## 3900X vs 3950X

I want to throw as many cores as I can at the problems that I volunteer compute time for, but I also have a finite budget. The 3950X will be beyond my reach at launch, so my strategy is to upgrade to 3900X starting in early July.

Extrapolating from the price history of the 1X00 and 2X00 processors, between 6 months and a year after release, prices on the 3X00 CPUs will start to slide downward. When the 3950X reaches a price that works for me, I’ll do a second round of upgrades – assuming, of course, that the pattern holds!

## Ryzen by the numbers

Here’s how every Ryzen I’ve ever had stacks up, by specs and metrics. Temperature data for 3X00 is not yet available, due to lack of initial system support.

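| Arch | Model | Threads | Base clock | LoadClk | IPCΔ% | Scale | TDP | LoadP | Tdie |
|---|---|---|---|---|---|---|---|---|---|
| Zen | 1600 | 12 | 3.2GHz | 3.40GHz | N/A | 1.00x | 65W | 74.0W | 57.9 |
| Zen | 1700 | 16 | 3.0GHz | 3.20GHz | N/A | 1.25x | 65W | No data | No data |
| Zen+ | 2700 | 16 | 3.2GHz | 3.30GHz | +3% | 1.33x | 65W | 80.0W | 64.2 |
| Zen+ | 2700L | 16 | 3.2GHz | 3.20GHz | +3% | 1.29x | 65W | 67.5W | 57.2 |
| Zen 2 | 3900X | 24 | 3.8GHz | 4.01GHz | +18% | 2.78x | 105W | 156.5W | |
| Zen 2 | 3900L | 24 | 3.8GHz | 3.55GHz | +18% | 2.46x | 105W | 78.0W | |
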
2. Scale is how one CPU compares to another, with the 1600 being defined as the baseline (1.00)
• Formula: 1 * LoadClkΔ% * IPCΔ% * ThreadΔ%
3. LoadP is the observed power usage under 100% load, minus the system idle load of 24.5W
4. Tdie temps were measured by lm-sensors, at 100% load, with stock coolers
• The notional 2700L model represents values obtained by clocking the 2700 to 3200MHz and setting Vcore offset to -100mV
• The notional 3900L model represents values obtained by underclocking the 3900X for lower power usage (details below)
• There are no metrics for the 1700 because I no longer have any of them
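
As a quick check on the Scale column, here is a minimal Python sketch of the formula in the notes. It assumes each Δ% term is a plain ratio against the 1600 (threads/12, load clock/3.40GHz, and 1 + IPC gain); the function and variable names are mine, not part of any tool used for the measurements.

```python
# Baseline values are the 1600's row in the table above.
BASE_THREADS = 12
BASE_LOAD_CLK_GHZ = 3.40

def scale(threads, load_clk_ghz, ipc_gain):
    """Theoretical throughput relative to the 1600 (defined as 1.00x)."""
    thread_ratio = threads / BASE_THREADS
    clock_ratio = load_clk_ghz / BASE_LOAD_CLK_GHZ
    return thread_ratio * clock_ratio * (1.0 + ipc_gain)

# (threads, observed load clock in GHz, IPC gain over Zen)
cpus = {
    "1600":  (12, 3.40, 0.00),
    "1700":  (16, 3.20, 0.00),
    "2700":  (16, 3.30, 0.03),
    "2700L": (16, 3.20, 0.03),
    "3900X": (24, 4.01, 0.18),
    "3900L": (24, 3.55, 0.18),
}

for name, spec in cpus.items():
    print(f"{name:6s} {scale(*spec):.2f}x")
```

Under that reading, the printed values reproduce the Scale column.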

All CPUs were tested using ASRock B450 mini-ITX boards, NVMe SSDs, and 450W STX PSUs. There are memory speed variances between the machines: initially there was a 2400MHz/2666MHz split, later the split was 3000MHz/3200MHz.

## Underclocking the 3900X

NB: I am leaving this section intact for historical purposes, but it was written before Zen 2’s clock stretching issues were known. For current advice and accurate numbers, please see this update.

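| Config | Clock | LoadP | vs. stock |
|---|---|---|---|
| Stock | 4.01GHz | 156.5W | |
| Test 1 | 3.80GHz | 96.5W | -38.3% |
| Test 2 | 3.60GHz | 91.5W | -41.5% |
| Test 3 | 3.40GHz | 87.9W | -43.8% |
| Test 4 | 3.20GHz | 83.5W | -46.6% |
| Test 5 | 3.00GHz | 80.5W | -48.6% |
| Test 6 | 2.72GHz | 76.0W | -51.4% |
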
1. LoadP is the observed power usage under 100% load, minus the system idle load of 24.5W
2. Tdie temps were measured by lm-sensors, at 100% load, with the stock cooler
• Test 1 (and all following tests) turns off AMD CBS and changes the Vcore offset to -100mV
• Test 4 is where the 3900X is running at the same clock speed as the 2700L. The clocks are equal, but the 3900X still has a 15% IPC uplift
• Test 6 is where each 3900X core should be doing the same amount of work per second as a 2700L core: the 15% lower clock cancels out the 15% IPC uplift.
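
The percentage column and the Test 6 clock can be rederived from the other figures. This is a small Python sketch using only the numbers above; the helper name is hypothetical.

```python
STOCK_W = 156.5  # LoadP of the stock 3900X

# (clock in GHz, LoadP in watts) for Tests 1 through 6
tests = [(3.80, 96.5), (3.60, 91.5), (3.40, 87.9),
         (3.20, 83.5), (3.00, 80.5), (2.72, 76.0)]

def power_delta_pct(watts, stock=STOCK_W):
    """Change in load power relative to stock, in percent."""
    return (watts - stock) / stock * 100

for clock, watts in tests:
    print(f"{clock:.2f}GHz {watts:5.1f}W {power_delta_pct(watts):+.1f}%")

# Test 6 clock: the 2700L's 3.20GHz, lowered by the 15% IPC uplift.
equal_work_clock = 3.20 * (1 - 0.15)
print(f"{equal_work_clock:.2f}GHz")
```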

## Undervolting the 3900X

NB: I am leaving this section intact for historical purposes, but it was written before Zen 2’s clock stretching issues were known. For current advice and accurate numbers, please see this update.

My goal is to find the intersection of high performance and low power. For power, the initial target was to meet or beat 80W (the draw of a 2700 at 3.3GHz). At stock voltages, that would require dropping below 3GHz, which throws away too much performance. Here’s the path I took to defining the 3900L configuration.

Vcore offset was -100mV. All tests were performed with a full load of WCG workunits running.

| Clock | Voltage | LoadP | Result |
|---|---|---|---|
| 3.20GHz | 1.10000V | 83.5W | Stable; > 80W |
| 3.40GHz | 1.05000V | 82.5W | Stable; > 80W |
| 3.60GHz | 1.00000V | | Crash on boot |
| 3.60GHz | 1.02500V | 82.0W | Stable; > 80W |
| 3.60GHz | 1.00625V | | Crash on boot |
| 3.60GHz | 1.01250V | 76.5W | OS stable; processes segfault |
| 3.55GHz | 1.01875V | 78.0W | OK; 3900L config |
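
The selection rule implied by this search (fastest clock that is fully stable and at or under the 80W target) can be written down as a short Python sketch. The `Trial` type and field names are illustrative only; the data is the table above.

```python
from typing import NamedTuple, Optional

class Trial(NamedTuple):
    clock_ghz: float
    volts: float
    load_w: Optional[float]  # None when the machine crashed on boot
    stable: bool             # True only if the OS and the WUs both ran cleanly

trials = [
    Trial(3.20, 1.10000, 83.5, True),
    Trial(3.40, 1.05000, 82.5, True),
    Trial(3.60, 1.00000, None, False),
    Trial(3.60, 1.02500, 82.0, True),
    Trial(3.60, 1.00625, None, False),
    Trial(3.60, 1.01250, 76.5, False),  # under budget, but processes segfault
    Trial(3.55, 1.01875, 78.0, True),
]

BUDGET_W = 80.0  # the draw of a 2700 at 3.3GHz

# Keep only trials that are stable and within the power budget,
# then prefer the highest clock among them.
candidates = [t for t in trials
              if t.stable and t.load_w is not None and t.load_w <= BUDGET_W]
best = max(candidates, key=lambda t: t.clock_ghz)
print(best)
```

Only the 3.55GHz run passes both filters, which is why it became the 3900L configuration.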

## World Community Grid: WUs in 24h

Now it’s time to look at how the CPUs compare when running actual scientific computing workloads.

These tables show the number of workunits crunched in 24 hours for each processor type, for several World Community Grid subprojects. During these periods, the CPUs were only working on one type of WU.

These are the WU counts for 24 hours, with SMT disabled. There are no non-SMT counts for the 3900X/L because – spoiler alert – there was no point in spending three days gathering that data. SMT is good for all these applications.

There is information on the performance of the 3900 with varying numbers of threads in the Stockfish and OpenFOAM sections of this document.

| Subproject | 1600:6 | 2700:8 | 2700L:8 |
|---|---|---|---|
| WCG OpenZika (AutoDock Vina) | 195 | 237 | 254 |
| WCG FAH2 (IMPACT/BEDAM/ASyncRe) | 46 | 58 | 56 |
| WCG Mapping Cancer Markers | 69 | 88 | 87 |

Note: I do not have an explanation for the 2700 underperforming the 2700L when crunching Zika workunits in this instance. It had a slight performance lead in all other head-to-head tests, including Zika with all threads enabled. I double-checked the logs: that machine crunched 237 Zika WUs and zero other WUs in 24 hours.

These are the counts for 24h in the stock configuration (SMT enabled; SMT uplift in parens). There is no OpenZika data for 3X00 CPUs yet because there are currently no WUs available. 3900X data will be available later.

| Subproject | 1600 | 2700 | 2700L | 3900X | 3900L |
|---|---|---|---|---|---|
| WCG OpenZika | 233 (1.19x) | 308 (1.30x) | 294 (1.16x) | | |
| WCG FAH2 | 60 (1.30x) | 80 (1.38x) | 80 (1.43x) | | 139 |
| WCG MCM1 | 86 (1.25x) | 110 (1.25x) | 110 (1.26x) | | 195 |
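
The SMT uplift figures in parentheses are simply the SMT-enabled count divided by the SMT-disabled count for the same CPU and subproject. A minimal Python sketch, with the counts copied from the two tables (the dictionary layout is my own):

```python
# WUs completed in 24h, keyed by subproject and then CPU.
smt_off = {
    "OpenZika": {"1600": 195, "2700": 237, "2700L": 254},
    "FAH2":     {"1600": 46,  "2700": 58,  "2700L": 56},
    "MCM1":     {"1600": 69,  "2700": 88,  "2700L": 87},
}
smt_on = {
    "OpenZika": {"1600": 233, "2700": 308, "2700L": 294},
    "FAH2":     {"1600": 60,  "2700": 80,  "2700L": 80},
    "MCM1":     {"1600": 86,  "2700": 110, "2700L": 110},
}

def smt_uplift(project, cpu):
    """All-threads throughput as a multiple of all-cores throughput."""
    return smt_on[project][cpu] / smt_off[project][cpu]

for project in smt_on:
    row = "  ".join(f"{cpu} {smt_uplift(project, cpu):.2f}x"
                    for cpu in smt_on[project])
    print(f"{project:9s} {row}")
```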

## World Community Grid: WU timings

There is no OpenZika data for 3X00 CPUs yet because there are currently no WUs available. 3900X testing will be done later.

### WCG OpenZika WU timings: min/max and by quintile

| CPU | Min | Max | Avg |
|---|---|---|---|
| 1600 | 00h 30min 22s | 01h 44min 27s | 01h 13min 36s |
| 2700 | 00h 32min 54s | 02h 40min 41s | 01h 14min 40s |
| 2700L | 00h 27min 07s | 03h 05min 24s | 01h 18min 37s |
| 3900X | | | |
| 3900L | | | |

OpenZika is problematic for benchmarking, as not all Zika WUs contain similar amounts of work. Therefore, in addition to min/max numbers, here is a table of WU runtimes, bucketed by quintile.

Quintiles are not equal across models, but split the span from minimum to maximum time for each CPU. This is bad for pure statistics, but 1:1 comparisons are impossible here due to the many variables in play. Treat these as simply more datapoints about each CPU’s performance under real-world conditions.

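| CPU | 1st | 2nd | 3rd | 4th | 5th |
|---|---|---|---|---|---|
| 1600 | 4 (01.7%) | 9 (03.8%) | 111 (46.8%) | 108 (45.6%) | 5 (02.1%) |
| 2700 | 7 (02.4%) | 289 (92.6%) | 15 (04.7%) | 0 (00.0%) | 1 (00.3%) |
| 2700L | 8 (02.7%) | 273 (92.9%) | 2 (00.7%) | 2 (00.7%) | 9 (03.1%) |
| 3900X | | | | | |
| 3900L | | | | | |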
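
For readers who want to reproduce this kind of bucketing, here is a minimal Python sketch of splitting one CPU's span from minimum to maximum runtime into five equal-width buckets. The exact edge handling used for the table is not documented, so the clamping of the maximum into the fifth bucket is an assumption, and the sample runtimes are made up for illustration.

```python
def quintile_counts(runtimes_s):
    """Count runtimes in five equal-width buckets between min and max."""
    lo, hi = min(runtimes_s), max(runtimes_s)
    width = (hi - lo) / 5
    counts = [0] * 5
    for t in runtimes_s:
        if width == 0:
            idx = 0  # all runtimes identical
        else:
            # The maximum sits exactly on the top edge, so clamp it to bucket 5.
            idx = min(int((t - lo) / width), 4)
        counts[idx] += 1
    return counts

# Hypothetical runtimes in seconds, not measured data.
sample = [1800, 2100, 4300, 4400, 4500, 4700, 5200, 6300]
counts = quintile_counts(sample)
for n, c in enumerate(counts, start=1):
    print(f"bucket {n}: {c} ({c / len(sample):.1%})")
```

Because the bucket edges depend on each CPU's own minimum and maximum, a single outlier stretches the span and pushes most WUs into the lower buckets, which is the pattern the 2700 and 2700L rows show.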

### WCG Fight Aids @ Home WU timings

| CPU | Min | Max | Avg |
|---|---|---|---|
| 1600 | 04h 21min 10s | 04h 41min 28s | 04h 33min 12s |
| 2700 | 04h 23min 45s | 04h 49min 35s | 04h 37min 00s |
| 2700L | 04h 33min 19s | 04h 54min 14s | 04h 45min 32s |
| 3900X | | | |
| 3900L | 03h 29min 35s | 05h 15min 49s | 04h 08min 00s |

Despite having an 8% higher maximum runtime, the 3900L is 10% faster on average.

### WCG Mapping Cancer Markers WU timings

| CPU | Min | Max | Avg |
|---|---|---|---|
| 1600 | 03h 02min 22s | 03h 29min 01s | 03h 21min 46s |
| 2700 | 03h 03min 25s | 03h 35min 13s | 03h 25min 36s |
| 2700L | 03h 01min 06s | 03h 35min 07s | 03h 26min 04s |
| 3900X | | | |
| 3900L | 02h 00min 59s | 04h 23min 36s | 02h 56min 30s |

Again we see the 3900L with a higher maximum runtime but a lower minimum and average. This time it’s 14% faster on average.
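
The average runtimes and the 24h counts can be cross-checked against each other. The Python sketch below assumes every thread runs WUs back to back for the full 24 hours, and it compares the 3900L against the 2700 (my choice of baseline; the comparison point for the percentages is not stated). Function names are hypothetical.

```python
import re

def to_seconds(stamp):
    """Parse a runtime such as '04h 08min 00s' into seconds."""
    h, m, s = map(int, re.fullmatch(r"(\d+)h (\d+)min (\d+)s", stamp).groups())
    return h * 3600 + m * 60 + s

def wus_per_day(avg_runtime, threads):
    """Expected completions in 24h if all threads stay busy."""
    return threads * 86400 / to_seconds(avg_runtime)

def pct_faster(baseline_avg, new_avg):
    """Reduction in average runtime relative to the baseline, in percent."""
    base, new = to_seconds(baseline_avg), to_seconds(new_avg)
    return (base - new) / base * 100

# 3900L, 24 threads
print(f"FAH2: {wus_per_day('04h 08min 00s', 24):.1f} WUs/day")
print(f"MCM1: {wus_per_day('02h 56min 30s', 24):.1f} WUs/day")

# 3900L versus the 2700
print(f"FAH2: {pct_faster('04h 37min 00s', '04h 08min 00s'):.1f}% faster")
print(f"MCM1: {pct_faster('03h 25min 36s', '02h 56min 30s'):.1f}% faster")
```

The estimates land within one WU of the measured 3900L counts (139 and 195), which suggests the timing data and the 24h counts describe the same runs.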

### Microbiome Immunity Project Workunit timings

MIP1 uses the Rosetta suite of molecular dynamics tools, which wants 4MB of L3 cache per running thread. Even the 3900, with its 64MB of L3 (70MB of combined L2 and L3 cache), would be oversubscribed if 24 instances of MIP1 were running. MIP1 performance suffers when L3 misses are common and the working data needs to be reloaded from system memory.

Therefore, instead of running only MIP WUs on each CPU, this chart shows the times for MIP1 WUs which ran concurrently with other types of workunits over a period of 144h. The low end was (probably) when only 1 or 2 MIP1 WUs were running concurrently. The upper end was (probably) when more were in-flight.

This is also why there is no count of MIP1 WUs completed in a 24h window; the CPUs were crunching an effectively random mix of subproject WUs during the sample period, so counts would be meaningless.
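
To put numbers on the cache pressure, here is a back-of-the-envelope Python sketch. The 4MB-per-thread figure comes from the text above; the L3 sizes (16MB for the 1600 and 2700, 64MB for the 3900X) are AMD's published specifications rather than anything measured here, and the helper names are mine.

```python
MB_PER_MIP1_THREAD = 4

# name: (hardware threads, L3 cache in MB per AMD's spec sheets)
cpus = {
    "1600":  (12, 16),
    "2700":  (16, 16),
    "3900X": (24, 64),
}

def max_instances(l3_mb):
    """How many MIP1 instances fit in L3 at 4MB each."""
    return l3_mb // MB_PER_MIP1_THREAD

def demand_mb(threads):
    """L3 wanted if every hardware thread ran MIP1."""
    return threads * MB_PER_MIP1_THREAD

for name, (threads, l3_mb) in cpus.items():
    print(f"{name:6s} fits {max_instances(l3_mb):2d} of {threads} threads; "
          f"full load wants {demand_mb(threads)}MB of {l3_mb}MB")
```

This treats L3 as one shared pool, which is a simplification: on these CPUs the L3 is split per CCX, so the real limit is likely tighter.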

| CPU | Min | Max | Avg |
|---|---|---|---|
| 1600 | 00h 26min 04s | 03h 48min 53s | 01h 43min 07s |
| 2700 | 00h 28min 47s | 04h 22min 55s | 02h 07min 37s |
| 2700L | 00h 28min 54s | 04h 06min 25s | 02h 07min 32s |
| 3900X | | | |
| 3900L | 00h 30min 08s | 01h 54min 43s | 01h 11min 46s |

| CPU | 1st | 2nd | 3rd | 4th | 5th |
|---|---|---|---|---|---|
| 1600 | 29 (13.2%) | 94 (42.7%) | 73 (33.2%) | 21 (09.5%) | 3 (01.4%) |
| 2700 | 43 (09.4%) | 172 (37.6%) | 179 (39.0%) | 47 (10.3%) | 16 (03.7%) |
| 2700L | 16 (06.5%) | 16 (06.5%) | 88 (35.9%) | 45 (18.4%) | 9 (03.7%) |
| 3900X | | | | | |
| 3900L | 24 (08.8%) | 66 (24.3%) | 101 (37.1%) | 57 (21.0%) | 24 (08.8%) |

Here the enormous L3 cache of the 3900 comes into play. Its average WU runtime is 30% to 44% shorter than that of the other CPUs, and its maximum runtime is roughly half of theirs or less (2.0x to 2.3x faster).

## Stockfish chess engine

These numbers were generated by the version of Stockfish used widely for benchmarking CPUs, which you can find here. The command used was:

./asmFishL_2017-05-22_popcnt bench 1024 [THREADS] 26

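| CPU:Threads | Nodes/second | Scale | SMT Uplift |
|---|---|---|---|
| 1600:6 | 12_522_525 | | |
| 1600:12 | 17_424_915 | 1.00x | 1.39x |
| 2700:8 | 16_675_725 | | |
| 2700:16 | 23_800_828 | 1.37x | 1.43x |
| 2700L:8 | 15_583_508 | | |
| 2700L:16 | 22_767_312 | 1.31x | 1.46x |
| 3900X:12 | 30_436_136 | | |
| 3900X:24 | 43_169_392 | 2.47x | 1.42x |
| 3900L:12 | 27_270_950 | | |
| 3900L:24 | 39_009_503 | 2.24x | 1.43x |
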
• Scale is performance in multiples of the 1600:12
• SMT Uplift is all-threads performance of a CPU in multiples of its own all-cores performance

During the 24 thread test, the 3900X was showing sustained boosts of 4.01GHz, with a power draw of 155.5W.

Note that the real-world differential between the 1600 and the 3900X, in this test, was 2.47X. The by-the-numbers theoretical differential was 2.78X. That’s pretty close to on-paper performance.
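
The Scale and SMT Uplift figures, and the comparison against the theoretical number, can be recomputed from the raw benchmark results. A minimal Python sketch, with the results copied from the list above and the 2.78x theoretical value taken from the specs section; the variable names are mine.

```python
# (all-cores result, all-threads result) for each CPU
results = {
    "1600":  (12_522_525, 17_424_915),
    "2700":  (16_675_725, 23_800_828),
    "2700L": (15_583_508, 22_767_312),
    "3900X": (30_436_136, 43_169_392),
    "3900L": (27_270_950, 39_009_503),
}

baseline = results["1600"][1]  # the 1600 with all 12 threads

for name, (cores_only, all_threads) in results.items():
    scale = all_threads / baseline
    uplift = all_threads / cores_only
    print(f"{name:6s} scale {scale:.2f}x  SMT uplift {uplift:.2f}x")

# How much of the on-paper 2.78x did the 3900X actually deliver?
THEORETICAL_3900X = 2.78
measured_3900x = results["3900X"][1] / baseline
print(f"{measured_3900x / THEORETICAL_3900X:.0%} of theoretical")
```

Rounding may differ from the published figures in the last digit, but the 3900X comes out at roughly 89% of its theoretical scaling.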

## OpenFOAM CFD

These numbers were generated with the current Docker image (of_v1906) of the OpenFOAM computational fluid dynamics package, using the motorBike simpleFoam tutorial, which is the standard for benchmarking. The decomposition method was set to scotch, the end time was set to 100, and the tutorial’s Allrun script was used for execution. The command used to extract the performance data was:

grep Execution log.simpleFoam | tail -n 1 | cut -d " " -f 3

All CPUs were tested with 2 and 4 processes, to provide a baseline for comparison. Then each was tested with processes equal to its actual number of cores and threads, to show maximum performance.

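| CPU | Time:2 | Time:4 | Time:6 | Time:8 | Time:12 | Time:16 | Time:24 |
|---|---|---|---|---|---|---|---|
| 1600 | 85.80 | 51.21 | 41.84 | | 40.42 | | |
| 2700L | 93.37 | 54.56 | | 42.08 | | 43.88 | |
| 3900X | 68.86 | 34.89 | | | 23.86 | | 24.00 |
| 3900L | 75.93 | 39.32 | | | 23.37 | | 24.10 |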

Incredible numbers from the 3900: 1.7X the performance of the 1600 at 12 processes. And again we see an isolated case where an underclocked/undervolted Ryzen (barely) outperforms one running at stock configuration.

There is very little ILU parallelism in this test, but I suppose that makes sense given the nature of fluid dynamics simulations. Jumping from all-cores to all-threads resulted in a tiny (3.5%) speedup on the 1600, and tiny regressions on the 2700L and 3900X (4.2% and 1%, respectively).
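
The comparisons in the last two paragraphs can be recomputed from the execution times. In this Python sketch the times are copied from the table, and the all-cores to all-threads change is expressed relative to the all-cores time (an assumption about the convention, so the results may differ slightly from the quoted percentages). The names are my own.

```python
# Execution time by process count, per CPU.
times = {
    "1600":  {2: 85.80, 4: 51.21, 6: 41.84, 12: 40.42},
    "2700L": {2: 93.37, 4: 54.56, 8: 42.08, 16: 43.88},
    "3900X": {2: 68.86, 4: 34.89, 12: 23.86, 24: 24.00},
    "3900L": {2: 75.93, 4: 39.32, 12: 23.37, 24: 24.10},
}

def smt_change_pct(cpu, cores, threads):
    """Positive: all-threads finished sooner than all-cores. Negative: slower."""
    t_cores, t_threads = times[cpu][cores], times[cpu][threads]
    return (t_cores - t_threads) / t_cores * 100

for cpu, cores, threads in [("1600", 6, 12), ("2700L", 8, 16),
                            ("3900X", 12, 24), ("3900L", 12, 24)]:
    print(f"{cpu:6s} {cores}->{threads} processes: "
          f"{smt_change_pct(cpu, cores, threads):+.1f}%")

# 3900X versus 1600, both at 12 processes
ratio_12 = times["1600"][12] / times["3900X"][12]
print(f"3900X is {ratio_12:.2f}x the 1600 at 12 processes")
```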

This test was also a good chance to get a look at frequency response and power usage in the 3900X under varying amounts of load.

| Processes | Freq | IdleFreq | Power |
|---|---|---|---|
| 2 | 4.50GHz | 2.2GHz | 49.5W |
| 4 | 4.20GHz | 2.1GHz | 55.5W |
| 12 | 4.15GHz | | 105.5W |
| 24 | 4.00GHz | | 117.5W |

• IdleFreq is the clock on unloaded cores
• Power was sampled during the actual simpleFoam run portion of the test, which was the most demanding. The 24.5W system idle power has been subtracted from this value.

## Conclusions

### The 3900X is…

• A monster that can churn through incredible amounts of work

### You can save power without sacrificing much performance

I was worried that the performance impacts would be severe, but you’ve seen the results with real-world testing. For the 2700/2700L:

• Performance delta between 0% and 5%
• Temperatures 11% cooler
• Using 16% less power

And for the 3900X/3900L it’s even better:

• Performance delta between 0% and 11% (about 9% on average)
• Temperatures ??% cooler
• Using 50% less power