The Domain Expertise Trap

The softwarization of engineering continues apace…

I keep seeing people write things like “a major disadvantage of the technique for X is that it requires substantial domain expertise.” Let’s look at a recent good paper that makes many such remarks, try to understand what they could mean, and ask whether having or acquiring domain expertise is actually a bad thing. Spoiler: It isn’t.

The paper under the spotlight is The Literature [R174], “Interference Suppression Using Deep Learning: Current Approaches and Open Challenges,” published for the nonce on arxiv.org. I’m not calling this post a “Comments On …” post, because once I extract the (many) quotes about domain expertise, I’m leaving the paper alone. It is a good paper, and I expect it to be especially useful for current graduate students looking to make a contribution in the technical area where machine learning and RF signal processing overlap. I especially like Figure 1 and the various tables.

The paper describes the current state of research into the application of machine-learning algorithms to the problem of detecting and removing cochannel interference from various kinds of radio signals. Why use machine learning over signal processing for this kind of task? Some motivations are provided, and one motivation is stated many times: to escape the need for domain expertise. Let’s document the various statements for clarity:

Reading through these extracted quotes, it appears that the authors are saying something like this: “I would be better able to make good technical progress in my chosen field of expertise, interference mitigation, if only I didn’t have the barrier of being an expert in my chosen field of expertise, interference mitigation.” But is that a fair interpretation? Perhaps another interpretation is this one: “I would be better able to make good technical progress in my chosen field of expertise, machine learning, if only I didn’t have the barrier of being an expert in my chosen application area, interference mitigation.”

Let’s investigate the possible meaning here by looking at what machine learners and software engineers (two fields that overlap heavily) say about the term domain expertise. We ponder two questions: Is being an expert in the application area a ‘downside’ for a machine-learner software engineer? Is being an expert in the application area a ‘downside’ for an expert in the application area? I’m being charitable here, and it all still seems weird no matter how I twist it around looking for something of value. I sometimes refer to this attitude in the graduate-school context as ‘getting a PhD in TensorFlow,’ as in: don’t bother me with the details of some application, I’ve got the universal tool for all applications. Sounds like a trap.

This repeated use of ‘domain expertise’ is softer language (slightly less dismissive) than the language I analyzed in my post about handcrafted features (and handcrafting appears in the above quotes too, sigh), but I think the idea is pretty much the same.

One last comment before the essay really starts. I find this kind of language in a technical paper jarring. Can you imagine reading a paper by Viterbi, Forney, or Gardner (or even Hinton, if I’m being honest) and seeing them say: “You know, the really hard part about this material I’m putting down here on paper is the knowledge part. If we could get around that, we’d be golden.” So part of this is just an unpleasant reaction to modernity on my part. Kids these days.

What is Domain Expertise or Domain Knowledge?

Wikipedia doesn’t have an entry for ‘domain expertise,’ but when you google that phrase, the Wikipedia entry for ‘domain knowledge’ comes up. I’ve seen that phrase used in similar contexts (machine-learning papers), and it also appears once or twice in the quotes above, so let’s check it out.

Wikipedia says:

Domain knowledge is knowledge of a specific, specialized discipline or field, in contrast to general (or domain-independent) knowledge. The term is often used in reference to a more general discipline—for example, in describing a software engineer who has general knowledge of computer programming as well as domain knowledge about developing programs for a particular industry. People with domain knowledge are often regarded as specialists or experts in their field.

https://en.wikipedia.org/wiki/Domain_knowledge

If I’m following, software engineers say that knowledge of “computer programming” is “domain-independent” but other kinds of knowledge are “domain knowledge.”

OK, so I’m getting it. From the point of view of a programmer, domain expertise is knowledge about stuff that isn’t programming or software engineering. Programming languages, compilers, git, TensorFlow, CUDA, etc. are “general knowledge,” whereas finance, earth science, signal processing, and medicine are domain expertise. So, then, what is being asserted is that it is, in fact, a major drawback for the software engineer to have to know about the application to which their software is applied. Is it? If so, in what sense is it a drawback? Personally? Does having application-area expertise harm the outcome? Or does it just stem the flow of people doing the work? And are the authors of the paper in question software engineers? (No.) Is it a drawback to have domain knowledge if you are an expert in the application area who is merely attempting to find or hone good software tools to wield?

Is it ever a drawback to have solid knowledge of the application area in which you are applying sophisticated tools? Or is it just hard to do? “The main impediment to solving this problem is all the learning I have to do about the problem.” Well, yeah. We just need a magic problem-solver box. All we need to do is construct the box without significant knowledge of the problem the box will solve. We push all expertise into expertise about TensorFlow. Is that possible? Ultimately, is it desirable?

The Parable of the Optimal Remover

Mary and Leo are technicians with a technical problem: removing undesired cochannel interference from a desired radio signal. They have studied the modern software tools for various kinds of machine learning and are quite good with git. So when their boss, Cindy, tells them to create an interference remover, they feel confident in their technical ability to solve any problem with their tools. How hard could it be, compared to learning modern programming?

“Hey, Leo, I want us to make the ultimate remover, not just something as good as the remover Dan and Eunice have,” said Mary after Cindy left. Dan and Eunice are mathematicians with experience modeling radio waves and are quite good with FFTs. “Of course, me too,” said Leo, “we just need to train up a neural network. It will minimize the error, proving it is the optimal remover. Dan and Eunice will be hanging their heads in no time.” Mary responded with a dismissive shake of her head and a sigh. “They’ll never learn. Let’s get started, Leo.”

Mary, being the more organized of the pair, divides up the work. “I’ll define the layers and the hyperparameters; that’s the important thing. You get some labeled data.” And so they start down the path toward the optimal remover.

Mary decides that all she really needs, all she ever needs, is AlexNet, because it works for detecting cat faces in iPhone pictures, and that is a very good thing indeed. She reflects on how hard it is to even imagine what the world was like before you could easily detect cats in images. She loads up the network and prepares for training, testing, and validation.

Meanwhile Leo immediately turns to Google. “I’m sure there is a dataset for training an optimal remover out there somewhere. I mean, heck, it is 2022. Big data was an old idea already ages ago,” he mutters to himself as he works the searches. Finding a couple of datasets on some university sites associated with the word ‘interference,’ he quickly downloads the data and puts it on the team’s fileserver.

Mary and Leo relentlessly train their network, adding and subtracting different kinds of layers, trying various activation functions, hiding layers, unhiding layers and, in short, doing all the usual hand-crafted hyperparameter trial-and-error tasks that make up their state of the art. Eventually they tire and settle on a particular machine, which minimizes the error on the training portion of the dataset, passes validation, and does just fine on the testing portion. Triumphant, they go in search of Dan and Eunice, prepared to crow about their achievement.

Dan and Eunice are busy in their lab, but they always make time to greet the learners politely and listen to what they have to say. On this day, the learners say a lot. Or at least they use a lot of words.

“We’ve just built the optimal remover! Only took us half a day, too,” begins Mary.

Leo shows them a printout of the error-vs-epoch curve produced during training. “See you guys? The error starts off high over here,” he says, pointing to the left part of the curve. “And it ends up much lower over here,” gesturing at the right edge of the plot. “So the error is minimized, proving this is the optimal remover. Because, Dan, you can’t have error lower than the minimum.” He finishes with a self-satisfied smirk.

“Great! Really glad for the two of you,” says Eunice. “Just a couple questions because, as you know, Dan and I have to work hard to keep up with you two.”

“What were the signal and the interference in this system design?” asks Dan, leading off the questioning.

Leo quickly replies, having anticipated this all-too-predictable question, “The signal is called ‘desired’ and the interference is ‘undesired.’ The good thing is, this means it is the universal AND the optimal remover!”

“Well, did you try it on all kinds of signals and interferences?” asks Eunice, gently. “I mean, what application areas, or signaling situations did you examine?”

“We don’t need application-area information or datasets. That’s domain expertise, and we’ve built a universal optimal remover.” Mary counters, some annoyance creeping into her voice.

“How did you check the quality of the training and testing datasets?” asks Dan. “Power spectrum analysis, autocorrelation function, attempts at demodulation, cyclostationary analysis, matched-filtering to check for known repeated components, or something else?”

Leo and Mary share one of their knowing looks, which used to be more irritating to Dan and Eunice, but they’ve gotten past it lately. “We didn’t do any of that. We found the dataset on the Machine Learning server of Ivytown University. Why would they put that dataset on the internet if it wasn’t correct and complete?”

“Why indeed,” muttered Eunice under her breath.

“Besides, those signal-processing … whatevers … are the domain of the domain-expertise experts. And that is not us!” proclaims Leo rather too cheerfully.

“Too true, too true,” agreed Dan, eager to move on. “Let me take a quick look at the Readme.txt file that probably came with the dataset from Ivytown. I can see if we have an appropriate signal-processing-based interference mitigator handy to create a comparison.”

“Sure… I guess,” said Leo as he logged into one of Dan’s lab computers. “Here it is.”

An awkward silence fell for a few minutes as Dan read the Readme.txt file, with Eunice looking over his shoulder. “Ah, OK, it says here the signal is 10-MHz LTE and the interferer is a particular pulsed-radar signal with linear FM on the pulses,” said Dan. “We happen to have an interference mitigation algorithm for that problem. I think we told you about it in the past?”

“Yeah, Dan, I figured you’d bring up your statistics-based remover,” said Mary, “so that’s why we decided to compare the optimal remover with your remover. We trained a neural network with fourth-order statistics because you had told us your method used fourth-order moments. So we put in four times the signal and also the signal raised to the fourth power. The resulting neural network was terrible, as expected.”

Dan’s eyebrows had been ratcheting up higher and higher through Mary’s speech, and when she finished they collapsed into a pair of furrows above his hardening eyes. “Mary, our signal-processing method is complicated, and only a part of it relies on fourth-order statistics. In particular, we use the (4, 2) cumulant, not the (4, 0) moment. And throwing some statistics at a neural network is not even close to constructing a version of our method. Not. Even. Close.” This last was ground out between tightly clenched teeth.

Leo decided to defuse the suddenly tense situation: “Dan, Eunice, we know that our comparison between the optimal remover and the traditional hand-crafted labor-of-love totally-old-school signal-processing method is something you think is important, but when we trained a network to use your statistics, the resulting remover has got to be better than your setup. After all, it minimizes the error. So we were trying to be generous to you.”

Leo noticed that Dan and Eunice were sharing a look that reminded him of the look mobsters give each other in the movies just before someone’s eye gets gouged out with a ballpoint pen. He figured he was imagining things but still he rushed to fill the silence with more, and hopefully better, explanations. Before he could, however, Eunice spoke up.

“Alright, let’s get this over with. Did you and Mary find any datasets that are similar to the Ivytown dataset, but that come from an independent research team?” she said.

“No,” said Mary, “we typically don’t do that. See, the way it works in the machine-learning world is that you get a dataset, divide it into training, testing, and validation subsets, and then perform the training and testing. Once that is done, the error is minimized, and you publish the results. We’ve already moved on to another project.”

Dan, glasses off and rubbing his eyes, murmurs, “Oh? And what would that project be?”

“The optimal neural network!”

Is it a Trap?

I think it is. How can we know about the quality of our dataset if we can’t interrogate it? And how can we interrogate it if we don’t know how to study, in detail, various aspects of random processes and their sample paths?
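Dan’s complaint in the parable turns on a distinction that is easy to state and easy to get wrong: the (4,0) moment E[x^4] is not the (4,2) cumulant, and neither one is the same as feeding “four times the signal” and “the signal raised to the fourth power” into a network. Interrogating a dataset starts with knowing which statistic you are actually computing. What follows is a minimal sketch, assuming zero-mean, unit-power complex-baseband signals with one sample per symbol; the function and signal-generator names are hypothetical, and this is the textbook moment-to-cumulant formula, not a reconstruction of Dan and Eunice’s mitigator (which the post does not describe).

```python
import math
import random


def fourth_order_features(x):
    """Sample (4,0) moment and (4,2) cumulant of a zero-mean complex sequence.

    Both values are normalized by the squared power so they do not depend on
    signal scaling. For zero-mean data the (4,2) cumulant follows from moments:
        C42 = E[|x|^4] - |E[x^2]|^2 - 2 * (E[|x|^2])^2
    """
    n = len(x)
    m20 = sum(v * v for v in x) / n            # E[x^2]
    m21 = sum(abs(v) ** 2 for v in x) / n      # E[|x|^2], the signal power
    m40 = sum(v ** 4 for v in x) / n           # E[x^4]: no conjugated factors
    m42 = sum(abs(v) ** 4 for v in x) / n      # E[x^2 conj(x)^2] = E[|x|^4]
    c42 = m42 - abs(m20) ** 2 - 2 * m21 ** 2   # remove the second-order parts
    return m40 / m21 ** 2, c42 / m21 ** 2


def make_signals(n=20000, seed=1):
    """Hypothetical test signals: one complex sample per symbol, unit power."""
    rng = random.Random(seed)
    bpsk = [complex(rng.choice((-1, 1)), 0) for _ in range(n)]
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
            for _ in range(n)]
    noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
             for _ in range(n)]
    return {"BPSK": bpsk, "QPSK": qpsk, "Gaussian noise": noise}


if __name__ == "__main__":
    for name, x in make_signals().items():
        m40, c42 = fourth_order_features(x)
        print(f"{name:15s} M40 = {m40.real:+.3f}{m40.imag:+.3f}j   C42 = {c42:+.3f}")
```

For these idealized signals the normalized (4,2) cumulant lands near −2 for BPSK, −1 for QPSK, and 0 for circular Gaussian noise, while the (4,0) moment lands near +1, −1, and 0. The two statistics answer different questions. With real data, pulse shaping, noise, and interference all shift these numbers, so a comparison like this is a sanity check on a dataset’s claimed contents, not a proof of its quality.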

How can we accurately and efficiently compare a machine-learning system with a non-machine-learning signal-processing system if we don’t understand signal processing?

How can we know if one dataset is superior to another in terms of its realism if we don’t know about communication-signal random processes, propagation channels, bad effects of receiver chains, and even details of specific complicated systems of practical interest like LTE?

How can we communicate accurately and efficiently to other researchers in the field (ML or non-ML!) what we have done during the course of an RF ML project if we don’t understand the terminology, notation, and prior work associated with the application area?

What am I missing?

If you leave a comment, try to be civil. Humor is encouraged, sarcasm is welcome, snarkiness is fine, vitriol is forbidden.

Author: Chad Spooner

I'm a signal processing researcher specializing in cyclostationary signal processing (CSP) for communication signals. I hope to use this blog to help others with their cyclo-projects and to learn more about how CSP is being used and extended worldwide.

6 thoughts on “The Domain Expertise Trap”

  1. Great analysis, Chad. You also have to consider the reverse: do domain experts have the required depth to go into ML and other CS areas and start making statements there? That also seems fairly preposterous. If I look at the taxonomy of this paper, this is what jumps out at me: a domain expert (a professor in this case) who has been studying communications and related signal processing for many years took on a few PhD students and felt fairly confident that his knowledge of statistics and random processes would let him sail through the area that the new ‘kids’ have invented. He got some funding and set up his PhD students to go and start applying the latest and greatest ML/deep NNs to the group’s problems. So this is a case of a domain expert telling us that his domain expertise is not needed, given the new ML things. It seems more like a way for PhD students to publish papers. Will he leave his expertise aside and now just trust ML? Of course not. He will continue to have lucrative gigs based on his domain expertise, but he might feel compelled to add new ML buzzwords to his skill set.

    1. As a domain expert who has had forays into machine learning, I’d say no, I don’t yet have the depth to really do a good job applying ML software tools to my domain problems. We need each other! I think compiler designers need application coders to really flush out the subtle bugs in the compiler code and to point the way toward new features and extensions. And application coders need compiler designers, because trying to construct your own compiler is a daunting task. Same with ML and signal processing. We need each other.

      I think the problem started about five years ago when one side said the era of the other side is over. And then proceeded to justify and document that claim very poorly indeed.

  2. But, Chad, what if there was an optimal universal remover? With your attitude, we’d never find it. Isn’t it important to keep research avenues open that could lead to major breakthroughs? And even if we aim too high, maybe other, unexpected, good things will be discovered or invented along the way.

    I suppose it would be useful if we had a mathematical proof or solid argument that there couldn’t be an optimal universal remover. One way to do that would be to show that there is an optimal remover for Problem A and another, incompatible optimal remover for Problem B. In the meantime, Leo and I are going to keep trying!

  3. Well, Chad, I favor the Tale of the Master Carpenter. A Master Carpenter has to know how to use a wide variety of tools, some very complex and dangerous, like miter saws and drill presses and lathes. But complete mastery of the toolset is not enough to make it to a Master Carpenter designation. You need to know about wood. All kinds of wood, how it behaves when all the different tools are applied, how people like it when it is used for a formal table, a chair, the frame of a house, a model airplane, etc.

    I think the Master Carpenter also needs to understand human nature and, to some degree, aesthetics. What kind of wood and finish and weight would be good for a family dinner table? What if that table is also used for poker night? What if that table is sitting in the formal dining room and is only used once a year?

    So the Master Carpenter needs to integrate the tools (tensorflow, gcc), the materials (RF signals), and the know-how (what makes a highly functional result, what makes an efficient but-just-OK result, what makes a fantastic result). If he neglects any one area, he is not really a Master Carpenter. The Domain Expertise trap for the Master Carpenter is to say that he just needs to master the miter saw–all wood types and all the end-products are just getting in the way.

    Does it fit? Anyway, Eunice and I are still on speaking terms with Mary and Leo, never fear. We help each other when we can, to the best of our abilities and to within the confines of our natures.

    Enough philosophizing!

  4. I think your time would be better spent creating and posting voluminous datasets for our training and testing. Don’t give up your day job.

  5. I think a big step forward would consist of forcing neural networks to ‘learn’ features that we know already are optimal for some problem. Can we do that in a wide variety of problems for which those known optimal solutions exist? Then we could see if we could force the network to do an even better job inferring things using those features than we do when we derive or dream up an algorithm that uses the features.

    For example, why don’t trained neural networks come up with things like the spectral correlation function or cyclic cumulant magnitudes when trained on simple ‘toy’ modulation recognition problems? We know that they don’t because once we slightly shift the input data random variables, the performance tanks.

    Is it because we have taken ‘too much’ from the convolutional neural networks that have had so much success operating on images? Maybe we need neural network layers that are not convolutions, that are more appropriate for RF signals, such as a squaring layer?

    Mary and Leo aren’t as bad as they come off in your parable, Chad. Maybe you should stop by and meet them in person?

