Updates 2022/2023
Grid
Published: 2023-08-21

2023-08-21 400 years of WCG

Things finally seem to be going more smoothly at WCG, and as a result we have now passed 400 years of CPU time for the project. We’ve also just passed one billion points (not very important), and are just shy of 1.5 million workunits done for the project (much more meaningful).

Also in the past week we’ve passed 10k WUs for FAH, and are approaching 2.5B points over there.

DENIS has had a lot of WUs available lately as well, and we have slotted Einstein@Home as our fallback project (i.e. it’s active, but at priority zero, so it should only start running WUs when DENIS and WCG are idle).

2023-06-22 Pausing Milkyway

We’ve paused crunching on Milkyway@Home because it has a tendency to make chiplet CPUs run very hot (some WUs, anyway), and we are entering the hottest part of summer.

I’m sure we’ll resume later, but for now our CPUs are crunching WCG, Einstein, and DENIS.

2023-05-30 General updates

  • We have passed 390 CPU-years for WCG. Before the Krembil takeover, we were ticking over a decade of compute time every 25 days or so. This most recent decade has taken almost a year of calendar time to accrue. I hope things smooth out over there.
  • We are crunching for Einstein@Home again, as WUs from WCG and DENIS continue to be intermittent
  • FAH workrates have stabilized at roughly 10M PPD, with all GPUs running at 110W. The 4070 continues to dominate, and I’m looking forward to getting another one when finances allow
  • node02 has finally gotten its chassis and PSU upgrade
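To put the WCG slowdown from the first bullet in perspective, here is a rough sketch that converts "a decade of CPU time per N calendar days" into an average number of threads kept busy. It assumes WCG's CPU time can be read as one thread busy for that long, and it rounds "almost a year" up to 365 days; both are assumptions for illustration, and the function name is made up.

```python
DAYS_PER_YEAR = 365.25

def avg_concurrent_threads(cpu_years, calendar_days):
    """CPU-days accrued per calendar day, i.e. threads kept busy on average."""
    return cpu_years * DAYS_PER_YEAR / calendar_days

# Before the takeover: a decade of CPU time roughly every 25 days.
before = avg_concurrent_threads(10, 25)

# Most recent decade: "almost a year" (assumed here to be 365 days).
recent = avg_concurrent_threads(10, 365)

print(f"before: ~{before:.0f} threads busy on average")
print(f"recent: ~{recent:.0f} threads busy on average")
```

Under those assumptions the recent rate is roughly a fifteenth of what it used to be.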

2023-05-03 New GPU numbers

We recently put a new GPU into service, an RTX 4070. Here’s how it stacks up against other recent-generation cards we’ve crunched with.

Folding@Home PPD for the 1650 (non-Super) is a very long-term average. The 3060 Ti numbers are a ~7 day average. 4070 numbers are a 3 day average.

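| GPU | Power | Temp | PPD | Low power | Low-power temp | Low-power PPD |
|---|---|---|---|---|---|---|
| GTX 1650 | 75W | N/A | 450_000 | N/A | N/A | N/A |
| RTX 3060 Ti | 200W | 75-78C | 3_440_000 | 110W | 68-70C | 2_490_000 |
| RTX 4070 | 200W | 68-73C | 7_310_000 | 110W | 55-63C | 6_311_000 |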
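One way to read the table is as PPD per watt. The sketch below divides each PPD figure by the power limit listed next to it. Note that these are nominal power limits, not measured draw, so treat the results as a rough comparison only; the function and variable names are just for illustration.

```python
# (nominal power limit in watts, Folding@Home PPD), copied from the table above
stock = {
    "GTX 1650": (75, 450_000),
    "RTX 3060 Ti": (200, 3_440_000),
    "RTX 4070": (200, 7_310_000),
}
limited = {
    "RTX 3060 Ti": (110, 2_490_000),
    "RTX 4070": (110, 6_311_000),
}

def ppd_per_watt(watts, ppd):
    """Points per day delivered for each watt of the nominal power limit."""
    return ppd / watts

for label, rows in (("stock", stock), ("power-limited", limited)):
    for gpu, (watts, ppd) in rows.items():
        print(f"{gpu:12s} {label:14s} {ppd_per_watt(watts, ppd):>8,.0f} PPD/W")
```

By this measure the 4070 delivers a bit more than twice the 3060 Ti's PPD per nominal watt at either power setting.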

Ada looks to be incredibly efficient compared to Ampere. Here are some more numbers, comparing only the 4070 at 200W to itself when power-limited to 110W:

| Metric | 200W | 110W | % Work |
|---|---|---|---|
| Folding@Home (PPD) | 7_310_000 | 6_311_000 | 86% |
| Folding@Home (WU) | 45 | 35 | 77% |
| Geekbench (OCL) | 185_980 | 181_208 | 97% |

FAH is the best indicator, because it’s a mixed set of real-world scientific computing jobs with hours-long runtimes. But that also means that you really need long sample times to even out the “bumpiness” caused by those long jobs. If I had devoted 10 days to each power regime, I expect that the PPD% would have been slightly closer to (or perhaps even just above) 90%, and the WU% would have been in the 80s.

Geekbench, on the other hand, is a short-runtime synthetic benchmark which tries to be fair across a range of hardware, rather than attempting to be a high-fidelity representation of actual workloads. Still, I’m shocked at how close the numbers are: a 3% performance delta with a 45% power reduction.
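For anyone who wants to check the arithmetic, here is a small sketch that recomputes the share of work retained and the implied gain in work per nominal watt, using the PPD and Geekbench figures for the 4070. The helper names are made up for this example, and the wattages are the configured limits rather than measured draw.

```python
# RTX 4070 figures at the 200W and 110W power limits (from the comparison above).
measurements = {
    "Folding@Home (PPD)": (7_310_000, 6_311_000),
    "Geekbench (OCL)": (185_980, 181_208),
}
FULL_W, LIMITED_W = 200, 110  # nominal power limits, not measured draw

power_fraction = LIMITED_W / FULL_W  # 0.55, i.e. a 45% nominal reduction

def retained(full, limited):
    """Fraction of the 200W result still achieved at 110W."""
    return limited / full

def efficiency_gain(full, limited):
    """Work per nominal watt at 110W, relative to 200W."""
    return retained(full, limited) / power_fraction

for name, (full, limited) in measurements.items():
    print(f"{name}: {retained(full, limited):.0%} of the work, "
          f"{efficiency_gain(full, limited):.2f}x work per nominal watt")
```

Taken at face value, that is somewhere between 1.5x and 1.8x the work per nominal watt when power-limited.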

There’s a part of the story that isn’t told by these performance numbers. You might have gotten a hint of it by looking at the temperatures in the first table, but I’ll lay it out here: the 4070’s stock power limit is higher than what the card will actually draw in many circumstances.

The 3060 Ti and the 4070 both have a stock power limit of 200W. When crunching FAH workunits, my 3060 Tis actually draw 195-198W most of the time. The 4070, on the other hand, spends a lot of time in the 155-185W range.

This feels like a good explanation for why the 4070 performs so well under a (nominally) severe power restriction, as well as why many reviewers focusing on graphics have pointed out that OCing the 4070 gets you almost nothing. The card has more juice than the silicon can use up, straight from the factory.
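To make "nominally severe" concrete, the sketch below compares the on-paper power cut (200W to 110W) with the cut relative to what the 4070 was actually drawing at stock. It assumes the card draws the full 110W when limited, which wasn't measured here, so the effective numbers are an upper-bound estimate.

```python
LIMIT_W = 110
NOMINAL_STOCK_W = 200
observed_stock_draw_w = (155, 185)  # typical 4070 draw under FAH at stock settings

def reduction(from_w, to_w):
    """Fractional power cut when going from from_w to to_w."""
    return 1 - to_w / from_w

nominal = reduction(NOMINAL_STOCK_W, LIMIT_W)
# Assumption: the limited card actually draws the full 110W.
effective = [reduction(w, LIMIT_W) for w in observed_stock_draw_w]

print(f"nominal cut:   {nominal:.0%}")
print(f"effective cut: {min(effective):.0%} to {max(effective):.0%}")
```

So the real-world restriction is probably closer to 29-41% than the 45% the power limits suggest.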

2023-04-25 Upgrade update

  • node03 has gotten its chassis and PSU upgrade; it had become very flaky in the past 6 weeks or so and I strongly suspected it was the PSU. Those Corsair SFX PSUs have been excellent, but they’ve been running continuously since early 2019
  • Only node02 is left in need of this upgrade now. It will be taken care of soon
  • More GPU compute coming soon as well
  • We’ll be holding there for a while. I’m undecided on when the AM5 upgrades will begin, but it won’t be any sooner than autumn

2023-02-13 Hardware updates

Miscellaneous notes on recent hardware changes:

  • node01 has been upgraded to a Fractal Pop Air chassis and an EVGA SuperNOVA 650W 80+ Gold PSU
  • My personal desktop is now crunching, for FAH only
    • This is the only machine we have crunching FAH on CPU
    • By end-of-week it will be upgraded with a 3060 Ti
  • One of the GPUs acquired over the pandemic, which did many hours of crunching for OPNG and then FAH, is being put out to pasture. Its fans have developed an annoying rattle, even at idle, so it’s no longer fit as a spare for future donations. RIP Galax 1650 EX

2023-01-19 Upgrade cycle begun

We are down to three compute nodes now. But we are still planning to increase the number of modern GPUs from one to two, and then possibly to three.

Another change is that our nodes are going to be upgraded to be less like “cattle” and more like “pets”. This is because they’re going to also become local storage nodes in a Ceph cluster. That will happen after the AM4 to AM5 transition, but before that they are getting new chassis and PSUs.

The first of these upgrades has now happened. We’re standardizing on the Fractal Pop Air case. It’s a good airflow case, which is compact but supports full ATX boards. The switch to ATX is one more change that will be happening with the move to AM5. After starting on Mini-ITX (optimizing for space) and then moving to uATX (optimizing for cost), our nodes will now be built around ATX in order to support more NVMe devices.

This will be a slow process, without a deadline or exact timeline, but updates here will continue at least as major upgrades happen. And as always, we’ll be crunching.

2022-12-30 CPU drawdown

CPU-based grid computing is waning. We’re not going to be doomsayers and describe it as “dying”, but we think the combined forces of utility computing and GPGPU have brought us into the era where CPU-based grid compute projects are on the decline.

Certainly the projects operating in the spaces that we care about – medicine and the sciences – are fewer than they were when we started crunching in 2017. And some of the ones that are left have problems of their own.

As a result, Team Firepear is reducing its compute farm from six nodes to three, cutting our CPU compute resources in half.

On the other hand, we plan to add (at least) one more GPU of (at least) equivalent performance to our current 3060 Ti. Projects like Folding are still going strong, have good leadership, and have healthy communities.

We don’t plan to go anywhere; we’re gonna keep contributing for as long as someone needs us. But we feel that it’s time to contribute in a different way.

2022-11-02 GPU shuffle

Today we replaced four GTX 1650 GPUs and one GTX 750 Ti with a single RTX 3060 Ti. This new card does almost 2X the work of those five old cards, and does it for roughly 2/3 the energy.

This is a win not just for FAH, but also for the CPU-based BOINC projects that we’re attached to, because it let us bump up CPU wattage (and thus clocks) on 5 out of 6 nodes. It’s even a win for our electricity bill, because the total system usage is still about 50W lower than it was, even after turning up the CPUs a bit.

2022-10-30 WCG update

The WCG transition has not gone well, in several ways, and seemingly for many reasons. That’s why it’s worth noting that today, for the first time since the shutdown in February, we have updated our WCG stats.

That means another 10 CPU-years of WUs are down for Team Firepear. Actually, a good bit more than that, but WCG is only counting WUs from the past 3-ish weeks; none of the “testing period” WUs have been aggregated yet.

Still, we’re very happy to be seeing sustained movement on that front.

2022-08-22 Back at it

WCG is still… testing? in limbo? something.

In the meanwhile, we’ve added DENIS@Home and Milkyway@Home as permanent projects and are crunching on those. Which means that FAH is running again as well.

2022-06-01 Downtime

We’ve shut down crunching for most of the past month, while WCG has been down. We’ll be back when they are.

2022-04-02 More power and less power

Since the last update, all nodes have been upgraded to a 16-core configuration. The 3900Xs, after long and meritorious service, have all been retired. They are replaced with 5950Xs.

Also, summer is now approaching, and the machines are in an un-air-conditioned garage. It is fortunate that the garage is on the side of the house which receives the least amount of sun every day, but still it will get very warm in there. In an effort to continue 24/7 crunching through the summer I will be adjusting PPT limits downward to try to keep things acceptably cool.

My first adjustment was made today by reducing the PPT of all nodes to 65W – not the AMD Eco Mode “65W”, which is really 88W, but an actual hard cap of 65W package power use. I’ll let this ride for some days (at least until it warms up a bit later in the week) to see what sort of steady-state temperatures and average CPU clocks are reached.

For later comparison, with PPT set to roughly 85W, steady-state temps were around 70-72C most of the time, and pushing north of 75C in the warm afternoons.
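A quick sketch of where the Eco Mode "65W is really 88W" figure comes from, and how the new cap compares. It assumes the commonly cited AM4 ratio of PPT = 1.35 × TDP; that factor and the function name are assumptions for illustration, not something measured on these nodes.

```python
PPT_FACTOR = 1.35  # assumption: commonly cited AM4 ratio of PPT to TDP

def ppt_from_tdp(tdp_w):
    """Package Power Tracking limit implied by a given TDP rating."""
    return tdp_w * PPT_FACTOR

eco_mode_ppt = ppt_from_tdp(65)  # what Eco Mode "65W" allows in practice
hard_cap_w = 65                  # the new hard PPT cap
previous_ppt_w = 85              # the earlier, approximate PPT setting

print(f"Eco Mode PPT: {eco_mode_ppt:.0f}W")
print(f"Hard cap vs Eco Mode: {hard_cap_w / eco_mode_ppt:.0%}")
print(f"Cut from previous setting: {1 - hard_cap_w / previous_ppt_w:.0%}")
```

In other words, the 65W hard cap is roughly a quarter below both Eco Mode and the previous ~85W setting.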

Older updates

For older news, check out our 2021 updates.