SmartUPS hot parts: the resolution

About a year after posting my original page on the SmartUPS open-fuse repair and related observations on overheating parts, I happened to run into someone who works for Schneider Electric at a local event and got to chatting a little. I mentioned how I believed I'd found a design defect in most of the product line of the relevant vintage, and she seemed rather interested. We exchanged contact info and I pointed her at the webpage, and she said she'd forward it to the right people around the office to see if anyone wanted to pursue it further.

Where my prior attempts to raise anyone's attention within the company had been entirely futile, this seemed like a promising next step.

Effect of hot parts
A little while later I got email from none other than one of the original *founders* of APC, who was still with Schneider after the acquisition in a sort of advisory role -- helping with bits of research and keeping various group memory alive where needed. He seemed *very* curious about the UPS issue and about me personally, as he evidently waded through and enjoyed many of the other engineering-type webpages I maintain. He invited me to come up and visit their Billerica R&D office, which I didn't even know was there as I thought they were still based entirely in Providence RI. This sounded completely awesome, and after my previous efforts and snarking in APC/Schneider's general direction I was somewhat astounded he wasn't just mad at me for spewing crap on the internet.

It turned into a fun story, although it gets *very* circuit-geeky in parts -- as it indeed must to reach true understanding.

[Most small images are linked to larger versions.]


Visiting Schneider Electric, Billerica
Acting on his generosity had to wait a couple of months for our schedules to mesh, but finally one dreary February morning I made my way through the mixed ice/snow weather to the modest office park that Schneider calls one of its homes.

He welcomed me in and showed me his office, which houses an almost museum-like collection of parts and prototypes from almost every generation of product they've built. He had been otherwise occupying his morning testing magnetization curves on a bunch of current-transformers from a supplier in India or someplace, discovering unacceptable levels of inconsistency across the batch. We immediately got to commiserating about offshore quality control and the constant battle between engineering precision and beancounters trying to shave a few pennies here and there. I looked closely at one of his samples and made the observation that the physical winding layout over the core shape might lead to damaged enamel insulation between some laps. We nattered about our respective pasts and different eras of the MIT community and various other things, and then he grabbed his folder of schematics and we went to meet a couple of his engineering colleagues.

I had also brought along my own RM2200 in all its hundred-pound glory, just in case they didn't have any live examples handy of UPSes exhibiting the problem, and because its layout probably gave easier access to the control board than other models. We wheeled it from the parking lot into one of their lab bays, wiped the snow off, and pulled the cover off to start it charging before wandering off to find the infrared imager he knew they had around the place somewhere. Along the way he introduced me to several more of his associates who he thought might also be interested in the discussion, which became a major geekfest about MOSFET drivers and switching speeds and Prius motors and all kinds of fun inverter lore.


Taking IR imager to the 2200 board
We finally found the imager and popped back into the lab to shoot the control-board area in question, where the same problematic pair of resistors had clearly become nice and toasty.

IR image of hot parts
This was a higher-end IR imager than your typical energy-rater toys and had a real focusable lens, allowing a fairly sharp shot of the board area up close -- enough to discern the shapes of the TO-92 transistor cases on either side of R38 glowing away in the middle. While the temp reading shown at the crosshairs was probably not quite accurate because someone had monkeyed with the emissivity setting, the contrast along the scale clearly showed something amiss.

  The magic moment

Like any good tech company, the office had plenty of whiteboards scattered about so we availed ourselves of one and conceptually sketched up the high-side driver circuit. He confirmed that battery charging is done through the inverter H-bridge by a little clever pulsed switching of the rack's low side, using inductive kick from the transformer to produce enough voltage to push current into the battery. The power supply for that comes from the wall driving the transformer's high-voltage side, but at a 4:1 winding ratio or so the native low-voltage side peak falls a bit shy of the battery rail voltage so the system won't charge without using the flyback trick. Again, similar to the self-boosting topology used in the Prius, and it really does mean momentarily shorting the transformer winding between its two low-side connections just long enough to build a little current and then releasing that as a high-voltage pulse. Back in the day they were pretty proud of this topology as it was elegant and efficient with a low parts count, not to mention providing an easy gateway to producing pure sinewave output on battery. Nowadays almost every power supply we trip over uses high-frequency switching in some fashion, so it's much more commonplace.
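To put rough numbers on why the flyback trick is needed, here is a minimal sketch. The 4:1 ratio comes from the discussion above; the 120 V wall voltage and the 54 V charging rail for a nominal 48V battery string are assumptions for illustration, not figures from APC.

```python
import math

def low_side_peak_volts(mains_rms, turns_ratio):
    """Peak voltage on the transformer's low-voltage side for a sine input."""
    return mains_rms * math.sqrt(2) / turns_ratio

# Assumed values for illustration only (not from APC documentation):
MAINS_RMS = 120.0     # assumed US wall voltage, RMS
TURNS_RATIO = 4.0     # "4:1 or so" per the whiteboard discussion
BATTERY_RAIL = 54.0   # assumed charging voltage for a nominal 48V string

peak = low_side_peak_volts(MAINS_RMS, TURNS_RATIO)
shortfall = BATTERY_RAIL - peak

print(f"low-side peak {peak:.1f} V vs battery rail {BATTERY_RAIL:.1f} V "
      f"(short by {shortfall:.1f} V)")
```

Under those assumptions the un-boosted low-side peak lands in the low 40s of volts, several volts below the rail, so no current would flow into the battery through the body diodes without the pulsed boost.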

I had to temporarily adjust my view of current flow, as he uses conventional current notation ["positive" flow] and I think of it in terms of electron flow. But that didn't get in the way of us being totally on the same page about how any of this worked, and while I'll probably never have the commensurate engineering chops it felt awesome to be able to discuss things at this level with someone of his stature and experience.

Whiteboard doodle of design problem
Then he paused for a moment and stared at our little creation, and said the equivalent of "well whaddaya know, there *is* a current path back through here" as he drew in the swooping arrow I've highlighted a little in red. Whammo, right through the base-collector junction of that lower transistor.

So I was right all along. It was a design issue behind the overheating resistors.

And nobody had noticed for about sixteen years.

With an AC waveform peaking around 50 volts on one side and a solid ground connection on the other, it was clear that R38 and its buddy R43 a short distance away were not going to have a happy day. We ballparked through a little power math and he estimated that on average we're trying to dissipate at least a watt through a 1/4-watt part. While that doesn't make anything blow up right away, it's a heating situation so marginal that physical orientation of the board affects what will happen long-term, which is why so many units are still out there functioning despite this whereas some, like my 1400XL with the control board hung upside down, had failed.
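The ballpark can be bracketed with two idealized waveforms. This is only a sketch: it assumes R38 is 1K (inferred from the 1K replacement parts used later), ignores the junction drop in the transistor, and ignores the switching/flyback peaks riding on the real waveform.

```python
def half_sine_avg_power(v_peak, r_ohms):
    """Average power for a sine of peak v_peak applied on one half-cycle only."""
    # A full sine averages v_peak**2 / (2R); conducting half the time halves it.
    return v_peak ** 2 / (4.0 * r_ohms)

def half_square_avg_power(v_peak, r_ohms):
    """Average power if the voltage sat flat at v_peak for the whole half-cycle."""
    return v_peak ** 2 / (2.0 * r_ohms)

V_PEAK = 50.0        # "peaking around 50 volts"
R38_OHMS = 1000.0    # assumed 1K, inferred from the replacement parts
RATED_WATTS = 0.25   # original 1/4-watt resistor

low = half_sine_avg_power(V_PEAK, R38_OHMS)      # clean half-sine bound
high = half_square_avg_power(V_PEAK, R38_OHMS)   # flat-topped bound

print(f"estimated dissipation between {low:.3f} W and {high:.3f} W "
      f"on a {RATED_WATTS} W part")
```

A clean half-sine gives 0.625 W and a flat-topped half-cycle gives 1.25 W, so either way the result is on the order of a watt and several times the part's rating.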


Let's take a closer look at the official high-side driver circuit, one of two equivalent sections controlling the H-bridge. The bus labeled "transformers" is not a ground; it goes to the low-voltage side of the series-connected transformers and, on each AC half-cycle, rises above ground in something approximating a sinewave [with switching/flyback peaks superimposed] reaching close to 55 volts as the battery charges. The high-side legs are never turned on during charging mode, as all the boost magic happens in the low-side switching, so Q31 just stays on.

Also understand that each of the power MOSFETs, Q16, Q15, Q14, ... contains a built-in body diode pointing upward as the whiteboard doodle shows. Those let the devices conduct in both directions even though they only switch one way, and are what allow charging current to eventually reach the battery rail.

Driver circuit analysis

I already had the right suspicion when I wrote up my post on APC's forum a year prior:

	The path by which the high voltage reaches R38 in the first place seems
	to be via forward bias between the collector and base of Q29 -- I can't
	see any other way current would flow when the XFMR1 lead is high.
	You aren't supposed to do that to a transistor, last I thought...

although I overestimated the wattage by double since I forgot that it only happens on half the AC waveform on each high-side. The "bootstrapped" bit of circuitry downstream of D13 gets pushed strongly positive on each rising half-cycle by riding on top of C34, which incidentally puts a bit of a load on R39 as well but not really outside of what it can handle. The forward flow between pins 2 and 3 of Q29 is the resistor-killer.

The other interesting aspect is that the extreme heating only happens on the 48V units. With the very common 24V based models, P = V^2 / R assures us that if you halve the applied voltage, you get a quarter of the power, so a quarter-watt resistor would still run a little warm but have a much better time of it.
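The 24V versus 48V comparison is just the square-law scaling, sketched below. The 1 W starting figure is the rough whiteboard estimate for the 48V units, not a measurement, and the helper name is made up for illustration.

```python
def scaled_power(p_ref_watts, v_ref, v_new):
    """Scale a known dissipation to a new voltage across the same resistance."""
    # P = V^2 / R, so with R fixed the power goes as the voltage ratio squared.
    return p_ref_watts * (v_new / v_ref) ** 2

P_48V = 1.0   # rough "at least a watt" estimate for 48V units (assumption)
p_24v = scaled_power(P_48V, 48.0, 24.0)

print(f"24V unit: about {p_24v:.2f} W in the same resistor")
```

That puts a 24V unit at roughly a quarter watt, right around the original part's rating: warm, but nowhere near the overload seen at 48V.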

We discussed possible fixes. I hesitantly proposed adding a diode on the collector of Q29 to only let current flow the correct way, but we agreed that raising the MOSFET turn-off gate voltage an extra diode drop above their sources might carry risk of not turning them off fully or fast enough. How about a Schottky? Lower forward drop, but hard to find in sufficient PRV to withstand the needed voltage. Raising R38's resistance might start putting the pre-driver transistors into their linear region or make them switch more slowly, and the last thing you ever want to happen is have the top and bottom of your H-bridge pair on at the same time. I muttered something about peaker caps across a larger resistor or some fancier buffer circuit feeding Q29 but then we were getting into complex timing characteristics and stray capacitance concerns and it just wasn't really worth weenieing that hard over a circuit that very few UPS owners were likely to really dig in and fix on their own anyway.

The obvious and simplest remediation would be to accept the otherwise harmless leakage path and replace the tiny resistors with 2-watt or better parts. Besides, there's another fun subtlety in play: base-collector leakage through *both* of the emitter-coupled devices together acts like a reverse diode around C34, protecting it against discharging completely to ground at points in the cycle that could otherwise result in extreme reverse voltage applied.

I guess engineers don't always think about all possible paths through a transistor, but I do as it's one way I test salvaged parts to quickly determine if they're PNP or NPN without having to look them up. Transistors aren't just little switches, they're more like two diode junctions back to back and there's little to help them stop stray currents if they're biased wrong. In the later generations of UPS none of this would matter, as APC rapidly moved to purpose-designed IC high-side FET driver chips with their own charge pumps and signal isolation so these quirks of old discrete-designed circuits would never be an issue anymore.


  Nerd nirvana

We decided to give our aching brains a rest, and he led me around on a grand tour of the facilities where they're developing new versions of everything from the hundred-watt battery-backed plug strip box under an office desk to the latest 10 kVA commercial units. Their most expensive pride-n-joy piece of new test equipment is a box that can produce several kilowatts of AC power at arbitrary frequencies with completely programmable distortion and faults, the next step up from when they used to do that with motor/generator sets and triac banks. They still had a couple of the 50-Hz rotary converters sitting around in the back of the shipping room. And there were the obligatory test racks where they were beating the crap out of various inferior Chinese batteries to figure out which ones suck the least and might be worthy enough to rebrand as their own and install in production units.

He introduced me to yet more of his design-level cronies and seemed to take quite a bit of amusement describing to them what some random guy off the street had figured out, and in general it was one of the geekiest mornings I'd spent in a *long* time and a nice insight into some of APC's history. He survived the acquisition but while he clearly has a certain amount of that "old guard" cynicism about how things are run now, it hasn't been enough to make him jump ship and fully retire. That's real dedication to the true purpose of one's life work, which is rather nice to see.


Waveform across R38
I got back home and a while later decided to differential-scope directly across R38 to see just what it was enduring, finding that the waveform was almost exactly identical to what I'd seen in the original 1400 diagnosis but possibly with a small bit of ground hop above Q31. Or maybe it's drift in the scope amps, not sure. With 10x probes it's clear from the knobs that we see peaks on the order of 48V, so even though it's a fairly complex waveform there's clearly a lot of average power under that curve.
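For a waveform this lumpy, the average power is best taken numerically from evenly spaced samples rather than from a formula. A minimal sketch follows; the sample list is a synthetic 48V-peak half-sine standing in for a real capture, and the 1K resistance is assumed as before.

```python
import math

def average_power(samples_volts, r_ohms):
    """Mean of v^2 / R over evenly spaced samples spanning whole cycles."""
    return sum(v * v for v in samples_volts) / (len(samples_volts) * r_ohms)

# Stand-in data only: a synthetic half-sine, NOT the actual scope trace.
N = 1000
stand_in = [max(0.0, 48.0 * math.sin(2 * math.pi * i / N)) for i in range(N)]

p_avg = average_power(stand_in, 1000.0)   # 1K assumed for R38
print(f"average dissipation: {p_avg:.3f} W")
```

Feeding in points exported from a real scope capture, one full cycle or a whole number of them, would account for the flyback spikes and any flat-topping that the idealized shape leaves out.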

  Okay, now what...

Meanwhile my newfound correspondent was making some inquiries among his other colleagues, and they generally seemed to agree that the simplest fix would be to simply beef up the two resistors in question with higher-wattage parts. So I figured as time permitted, I would start doing that to my own units. This wouldn't be trivial as the original resistors go into rather small holes in the circuit board and of course getting access to both sides of these boards isn't exactly easy. The 2200 with its components-up board mounting position was in less need of remediation than other units, but I eventually decided to test the fix feasibility on it as it wasn't powering important downstream equipment at the time and has one of the more open physical layouts.


Extending leads on big resistor
Scratching around in the junkbox turned up fairly hefty 1K resistors with what felt like ceramic envelopes, whose leads would obviously need to be adapted down in size to fit through the board vias. They could stand up on short pieces of sleeving. Here I was counting on them to be able to dissipate enough heat at the body to not soften the solder connections.

Bigger 1K resistors in place
A bit of careful solder-wick work managed to extract the old resistors without destroying the surrounding traces, and my replacements went in pretty easily.

I did consider the possibility that these resistors might be wirewound, and thus introduce a small amount of inductance into the paths. But at the fairly modest switching frequencies here I didn't expect or encounter any trouble. I probably would have wanted to use something else if it was an RF application.


Still running hot, but not as bad
With the unit successfully back together and powered up, the new resistors were still running on the warm side but not nearly as bad and staying well up away from other nearby components. I had to shift a cooling fan mounted above them to the other side of its bracket to allow more room, but that was straightforward and it still fires plenty of air into the inverter heatsinks.

 

[Yeah, I finally broke down and bought myself a cheap [?!] IR imager. It's fixed-focus and can't resolve at a close distance, but it's certainly enough to spot hot electric components. I mostly got it to chase down home energy issues, both in mine after the big retrofit and those of friends who wanted to find the heat leaks.]


  Community service

So this is possibly the page I would have wanted to find on the net when I first started chasing this issue down, and the intent here is to acknowledge that there is a genuine problem and describe how to fix it for those so inclined without placing any real responsibility onto APC / Schneider. These units are "obsolete" and not really supported anymore, so it rests on the user community to detail and effect these fixes. Anyone digging into UPS guts is entirely on their own as far as safety around high voltage/current sources and technical ability to make board-level electronic repairs. Batteries must be disconnected *and* all filter capacitors fully discharged before doing any of this work, or the likely result is a very heavy nonfunctional brick. So thus you've been warned, and happy soldering!


_H*   130326