So: reliability does not start with a nice-looking PCB or wow-features. It starts with an honest conversation about where and how the device will work.
To be completely honest, even we have fallen into this trap at times, despite a fair amount of development experience. I would like to walk through several practical situations we have encountered and use them to show why "minor" and uninteresting details are best discussed with the developer as early as possible. For our part, we guarantee plenty of clarifying questions at every stage of development. In essence, the customer and the contractor are one large team developing the device or product.
It is logical to start with what usually seems the most obvious, and therefore is often underestimated: the physical environment around the device.
Sun, Air, and Water Are Our Best Friends
An interesting fact: some boards will keep working even if they are submerged in a container of ordinary water. If clean boards are placed in distilled water, the percentage of working boards will be even higher.
Something else that is not always obvious: moisture is dangerous not only as direct water ingress, but very often as invisible condensation, wet dust, a film of contamination, or damp contacts. Most problems come from long-term corrosion caused by moisture and dirt, especially when a galvanic pair in the contact area amplifies the effect. Corrosion often causes not a complete failure but intermittent symptoms: the device works, then it does not. Contacts, connectors, terminals, exposed areas of the board, and sensors are usually the first to suffer. The photo shows a board from one of the projects we participated in, although we were not directly responsible for this part of the development. Water never reached the board through splashes or flooding; the damage is the result of repeated condensation.
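To make the condensation risk more tangible, here is a minimal sketch that estimates the dew point with the Magnus approximation and compares it with the temperature of a surface inside the enclosure. The coefficients are one commonly used set; the function names, the 2 °C safety margin, and the example numbers are assumptions for illustration, not values from our projects.

```python
import math

# Magnus approximation coefficients (one commonly used set, intended for
# ordinary ambient temperatures over water). Treat the result as an estimate.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # °C


def dew_point_c(air_temp_c: float, rel_humidity_pct: float) -> float:
    """Estimate the dew point in °C from air temperature and relative humidity."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100] percent")
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def condensation_risk(surface_temp_c: float, air_temp_c: float,
                      rel_humidity_pct: float, margin_c: float = 2.0) -> bool:
    """Return True if the surface is at or below the dew point plus a margin."""
    return surface_temp_c <= dew_point_c(air_temp_c, rel_humidity_pct) + margin_c


if __name__ == "__main__":
    print(f"Dew point: {dew_point_c(25.0, 60.0):.1f} °C")
    print("Risk on a 15 °C surface:", condensation_risk(15.0, 25.0, 60.0))
```

For air at 25 °C and 60 % relative humidity the estimate is about 16.7 °C, so an enclosure wall that cools to 15 °C overnight will collect condensation without a single splash of water.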
The second part of the same problem is contamination. Dust and dirt are also very dangerous, especially if somewhere in the workshop there is dust from metal grinding. This is a very common problem, often seen, for example, in the electronics of welding machines. We had an interesting case where a remote climate sensor failed. Later it turned out that over the summer season a spider had spun a web inside it and gone about its dirty business there.
Another case shows that contamination can appear even where you do not expect it. We were asked to audit a board from a mass spectrometer that had stopped working. The equipment was located in a room with controlled air cleanliness. Even so, specialized compressors had been placed near the spot where the instrument drew in air to cool its internal electronics. One of the compressors apparently had a fault, and oil was escaping from somewhere. The amount was so small that it took years for part of the board to become covered with a thin oil film, which eventually led to failure.
The general conclusion here is simple: it is not always possible to account for everything, but it is often possible to account for a lot if you discuss not only the device itself, but also the environment around it in advance.
What we account for in development: enclosure, sealing, ventilation, cable glands, protective coatings, board placement, and maintenance procedures. These decisions are best made before the PCB is routed, the enclosure is selected, and the device is almost finished.
"It is only with the heart that one can see rightly; what is essential is invisible to the eye"
After water, dust, and oil, it is tempting to move on to things that cannot be seen with the eyes. Sometimes a device works perfectly, then stops working under certain conditions. On my way to the office there is one spot where my wireless headphones stop working because of a nearby relay tower. The connection degrades gradually, digital artifacts appear, and then there is silence. A little farther on, the connection comes back.
Another good example from my youth: we were at a small, semi-artisanal production site in the Moscow region when a downpour started and lightning struck the workshop. The circuit breakers did not trip, but all the instruments and computers shut down or rebooted, and production stopped. Only one CNC machine kept working, even though all the equipment was connected and grounded. Most likely, that machine had a better thought-out response to such an event.
Interference does not come only from outside. We have faced it repeatedly while developing power electronics. At the very beginning of my engineering career, we measured the temperature of a power-switch heatsink with an I2C sensor, and of course it worked perfectly. Exactly up to 5 kW. Beyond that, the temperature readings disappeared. I2C is simply not the right interface for that kind of job.
We have a lot of experience fighting interference in power equipment development. There were cases where we had to deal with interference in the control signals of power switches. There were cases where, as power increased, interference affected measurement circuits. There were also situations where interference from the power section affected the rendering of the interface screen or data transmission over communication interfaces.
The most unpleasant thing about these problems is that for a long time they often do not look like problems. Up to a certain point nothing is visible, although the problem already exists, and only measurements can reveal that something is wrong. Sometimes full testing under real conditions is genuinely hard to perform. In our case, one such development was sensors for 10 kV power lines, installed directly on the phase conductors and the ground wire. It was a serious challenge because there was no way to debug the device fully under live conditions during development. In many cases there are no such limitations, and that opportunity should be used. Testing the device under real operating conditions is an integral part of the work; without it, development cannot be considered complete.
What we account for in development: the power section, measurements, interfaces, and indication cannot be considered separately. Layout, grounding, shielding, isolation, interface choice, and testing under real load all matter.
Between Scylla and Charybdis
A separate case of interference is measuring weak signals. One of our developments turned out, as we had expected from the very beginning, to be extremely sensitive to external noise. This was the E-NOSE measurement system, which had to measure the resistance of certain chemical compositions deposited on a substrate heated to 350-400 °C. Measurements had to be taken on 17 channels at a sampling rate of up to 10-20 Hz. On top of that, the heating element sits on the same substrate, less than a millimeter from the measured resistances.
The problem was that these resistances could range from hundreds of kilohms to tens of megohms. And megohms are exactly where the problems begin. First, measuring such values without shielding is extremely difficult. Without shielding, the oscilloscope picture is not pretty: the instrument picks up everything it can pick up.
The second part of the problem was the power supply for the heater. Interference entered the board through it and made its way into the measurement path despite a filtering stage on the board itself. In practice, not every off-the-shelf power supply is suitable for this kind of electronics. Another potential problem was the PWM regulation that holds the heater at the set temperature, but we took care of that in advance: we track the PWM switching transients and exclude the measurements taken during them.
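The idea of excluding measurements taken during PWM transients can be sketched roughly as follows. This is an illustration only, not the firmware of the device: the timestamp representation, the function name `drop_samples_near_edges`, and the blanking window are assumptions, and the sketch assumes the disturbance follows each PWM edge rather than preceding it.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value)


def drop_samples_near_edges(
    samples: Sequence[Sample],
    pwm_edge_times: Sequence[float],
    blanking_s: float,
) -> List[Sample]:
    """Keep only samples taken at least `blanking_s` after the latest PWM edge.

    `pwm_edge_times` must be sorted in ascending order.
    Samples taken before the first edge are kept.
    """
    kept: List[Sample] = []
    for t, value in samples:
        # Number of edges at or before t; the last of them is the latest edge.
        i = bisect_right(pwm_edge_times, t)
        if i == 0 or t - pwm_edge_times[i - 1] >= blanking_s:
            kept.append((t, value))
    return kept


if __name__ == "__main__":
    edges = [0.010, 0.020]  # hypothetical PWM switching instants, seconds
    raw = [(0.005, 1.00), (0.0105, 1.40), (0.015, 1.01),
           (0.0201, 0.70), (0.029, 0.99)]
    for t, value in drop_samples_near_edges(raw, edges, blanking_s=0.002):
        print(f"{t:.4f} s: {value:.2f}")
```

In a real device the blanking window would be chosen from measurements of how long the transient actually lasts, which is one more reason to test under real load.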
In this case, we see several points at once that must be considered when developing a device. Although the device is used in laboratory conditions, it is extremely sensitive to interference: interference radiated by surrounding sources, interference coming through power lines, and interference generated by the device itself.
When the measured signal is weak, the surrounding environment becomes part of the circuit. In this project, we also accounted for special requirements for PCB manufacturing and cleaning, and in certain areas of the board part of the circuit had to be left without solder mask. But that is a separate story.
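A rough worked example of why the board surface matters: a leakage path across a contaminated surface behaves like a resistor in parallel with the one being measured. The sketch below computes the resulting error. The 1 GΩ leakage value and the two sensor resistances are arbitrary assumptions chosen to match the orders of magnitude mentioned above, not measurements from the project.

```python
def parallel(r1_ohm: float, r2_ohm: float) -> float:
    """Equivalent resistance of two resistors connected in parallel."""
    return r1_ohm * r2_ohm / (r1_ohm + r2_ohm)


def leakage_error_pct(r_sensor_ohm: float, r_leak_ohm: float) -> float:
    """Relative error (percent) of the apparent resistance vs. the true one."""
    r_apparent = parallel(r_sensor_ohm, r_leak_ohm)
    return (r_apparent - r_sensor_ohm) / r_sensor_ohm * 100.0


if __name__ == "__main__":
    r_leak = 1e9  # assumed surface leakage resistance, ohms
    for r_sensor in (300e3, 30e6):  # assumed sensor resistances, ohms
        error = leakage_error_pct(r_sensor, r_leak)
        print(f"{r_sensor:.0e} ohm sensor: {error:+.2f} %")
```

With these assumed numbers, the same leakage path costs about 0.03 % at 300 kΩ and about 2.9 % at 30 MΩ, which is why cleaning and surface condition that nobody notices on an ordinary board become design parameters here.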
What we account for in development: for measurement devices, it is important to look not only at the ADC and the sensor, but also at power supplies, shields, cables, the board surface, cleaning technology, and the operating modes of internal interference sources.
"I am a man: Nothing human is alien to me"
Even if the environment, power supply, and interference are accounted for, one more factor remains: the human being. We noticed a long time ago that many people are quite relaxed when interacting with electronics. More than once, I have personally seen 24-36 V devices connected to a 48 V supply, contacts on development boards shorted without hesitation, crooked installation, incorrect component placement, lack of grounding, rough picking at boards, overheating during soldering, open enclosures, and attempts to connect things "however it works."
We have seen absolutely all of it, at every stage: from development and testing to production and operation. What is most interesting is that this often happens even where people's own safety is directly at stake. For one customer, we made a high-voltage unit with a fairly serious power level, enough to make an operator stop operating if safety rules were violated. We wrote a detailed manual listing all the necessary safety measures and the consequences of ignoring them, and we briefed the customer in person more than once. Even so, I regret to say that we have concerns that the safety rules may not be followed in full. But these are only guesses.
This is not an accusation; it is operational reality. People need to get a result quickly: start it, replace it, connect it, remove an error. Many have become used to their smartphones not being afraid of cold, water, dirt, impacts, or static electricity. And they expect the same from any laboratory device, development stand, or board.
If the reliability of a device depends on an ideal user, that is not reliability; it is hope. It is not worth designing for every possible misuse scenario and every protection against it, but it is also wrong to forget that a good industrial device should not require gentle treatment of every one of its weak points. That is why it is so important to understand how a person will interact with the device.
What we account for in development: installation, labeling, protection against typical wiring mistakes, access to connectors, status indication, instructions, and diagnostics. The device must be understandable not only to the developer, but also to the person who sees it on site for the first time.
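As one small example of diagnostics for a typical wiring mistake, the firmware can classify the measured input voltage and report the result clearly instead of misbehaving silently. The sketch below is hypothetical: the 24-36 V range is taken from the example above, while the tolerance, the thresholds, and all names are assumptions. A check like this only helps if the input stage survives the wrong voltage, so it complements hardware protection rather than replacing it.

```python
from enum import Enum


class SupplyStatus(Enum):
    NO_POWER = "no power"
    UNDERVOLTAGE = "undervoltage"
    OK = "ok"
    OVERVOLTAGE = "overvoltage"


def classify_supply(
    v_in: float,
    v_min: float = 24.0,      # rated range from the example in the text
    v_max: float = 36.0,
    tolerance: float = 0.10,  # assumed +/-10 % allowance around the rated range
    v_absent: float = 1.0,    # assumed level below which the input counts as unpowered
) -> SupplyStatus:
    """Classify a measured input voltage against the rated supply range."""
    if v_in < v_absent:
        return SupplyStatus.NO_POWER
    if v_in < v_min * (1.0 - tolerance):
        return SupplyStatus.UNDERVOLTAGE
    if v_in > v_max * (1.0 + tolerance):
        return SupplyStatus.OVERVOLTAGE
    return SupplyStatus.OK


if __name__ == "__main__":
    for volts in (0.2, 18.0, 30.0, 48.0):
        print(f"{volts:5.1f} V -> {classify_supply(volts).value}")
```

The status can then drive an indicator or a log entry, so that the person who sees the device on site for the first time learns what is wrong without an oscilloscope.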
Quantity Turns into Quality
There is one more question that looks organizational, but directly affects technical decisions: how many devices should exist in the end. At the initial stage, it is very important to understand what exactly we are developing: an MVP, a single device, a small batch, or a product aimed at serial production.
Far from every development project needs a series. Sometimes the final need is only one device or a few units. In that case, there is no point in optimizing the device for component cost or production cost; it is more important to reduce development time, for example by using ready-made modules or more expensive components.
Sometimes even a finished instrument is not needed; the task is to test a hypothesis or see how the electronics will behave as part of some larger device. That is an MVP, and it calls for its own approach. And finally, probably the most difficult thing in development is serial production. Here it is not enough to develop the device itself. You also need to plan and prepare production, optimize the device for cost, provide for component substitutions, define acceptance and diagnostic procedures, and so on, and go through one or more iterations of early device versions.
In one case, for example, the customer ultimately produced about fifteen hundred devices, but in batches of approximately 20-50 units, and this was an entirely reasonable decision dictated by the market. Understanding the production volume is extremely important at the initial stage of development.
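The trade-off between development effort and unit cost comes down to simple arithmetic. In the sketch below, every monetary figure is invented for illustration and the function names are assumptions; only the shape of the calculation matters.

```python
import math


def total_cost(development_cost: float, unit_cost: float, units: int) -> float:
    """Total project cost: one-time development plus per-unit production."""
    return development_cost + unit_cost * units


def break_even_units(extra_development_cost: float, saving_per_unit: float) -> int:
    """Smallest volume at which cost optimization pays for itself."""
    if saving_per_unit <= 0:
        raise ValueError("saving_per_unit must be positive")
    return math.ceil(extra_development_cost / saving_per_unit)


if __name__ == "__main__":
    # Hypothetical numbers: a quick design built from ready-made modules
    # versus a cost-optimized design that takes longer to develop.
    quick = {"development_cost": 10_000, "unit_cost": 120}
    optimized = {"development_cost": 25_000, "unit_cost": 100}

    print("Break-even volume:", break_even_units(15_000, 20), "units")
    for units in (5, 1500):
        print(units, "units:",
              "quick", total_cost(units=units, **quick),
              "optimized", total_cost(units=units, **optimized))
```

With these made-up numbers the optimized design pays off only above 750 units: for five devices it is more than twice as expensive overall, while for fifteen hundred it is the cheaper option.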
What we account for in development: target volume, production method, component availability, acceptance, diagnostics, and repairability. A device made as a one-off and a device made for a series can solve the same task, but they are designed differently.
Conclusions
If we put these examples together, it becomes clear that operating conditions are not a single item in a specification, but a set of constraints that shape the entire development. Other important factors that must be considered include mechanical vibration, the reliability of detachable connections, and the length of communication and power lines.
In this article, we also intentionally do not discuss many issues related to development and design, such as impedance control, firmware and logic testing, thermal management, and so on. Right now, it is important to show that many questions must be solved jointly by us as developers and by the customer, both before development starts and during the process.
Operating conditions are not appendix number 5 to the technical specification. They are input data for the architecture of the entire development. Industrial electronics begins not with the PCB, but with the environment in which that PCB will have to live.
Mini-Checklist: What to Think About If You Need Your Own Device
- Where will the device be installed: factory, street, vehicle, field, laboratory, space?
- Is there humidity, condensation, dust, abrasive material, temperature variation, vibration?
- What power loads and interference sources may be nearby?
- What power supply will the device actually have in the field, as opposed to a laboratory bench supply?
- What should happen during a power dip, sensor break, or communication error?
- Who will install and maintain the device?
- Can the device be tested on site?
- How will the device be produced? Who will do it? What is the batch size?
- Will the device work on its own or as part of other equipment?
- How much does the device's operation depend on the actions of an operator or user?