I keep seeing people write things like “a major disadvantage of the technique for X is that it requires substantial domain expertise.” Let’s look at a recent good paper that makes many such remarks and try to understand what it could mean, and whether having or getting domain expertise is actually a bad thing. Spoiler: It isn’t.
The paper under the spotlight is The Literature [R174], “Interference Suppression Using Deep Learning: Current Approaches and Open Challenges,” published for the nonce on arxiv.org. I’m not calling this post a “Comments On …” post, because once I extract the (many) quotes about domain expertise, I’m leaving the paper alone. The paper is a good paper and I expect it to be especially useful for current graduate students looking to make a contribution in the technical area where machine learning and RF signal processing overlap. I especially like Figure 1 and the various Tables.
The paper describes the current state of research into the application of machine-learning algorithms to the problem of detecting and removing cochannel interference from various kinds of radio signals. Why use machine learning over signal processing for this kind of task? Some motivations are provided, and one motivation is stated many times: to escape the need for domain expertise. Let’s document the various statements for clarity:
Reading through these extracted quotes, it appears that the authors are saying something like this: “I would be better able to make good technical progress in my chosen field of expertise, interference mitigation, if only I didn’t have the barrier of being an expert in my chosen field of expertise, interference mitigation.” But is that a fair interpretation? Perhaps another interpretation is this one: “I would be better able to make good technical progress in my chosen field of expertise, machine learning, if only I didn’t have the barrier of being an expert in my chosen application area, interference mitigation.”
Let’s investigate the possible meaning here by looking around at what machine learners and software engineers (two fields that highly overlap) say about the term domain expertise. We ponder the question: Is being an expert in the application area a ‘downside’ for a machine-learner software engineer? Is being an expert in the application area a ‘downside’ for an expert in the application area? I’m being charitable here, and it all still seems weird no matter how I twist it around trying to see something of value. I sometimes refer to this attitude in the graduate-school context as ‘getting a PhD in Tensorflow.’ As in, don’t bother me with the details of some application, I’ve got the universal tool for all applications. Sounds like a trap.
This repeated use of ‘domain expertise’ is softer language (slightly less dismissive) than the language I analyzed in my post about handcrafted features (and handcrafting appears in the above quotes too, sigh), but I think the idea is pretty much the same.
One last comment before the essay really starts. I find this kind of language in a technical paper jarring. Can you imagine reading a paper by Viterbi, Forney, or Gardner (or even Hinton if I’m being honest) and seeing them say: “You know, the really hard part about this material I’m putting down here on paper is the knowledge part. If we could get around that, we’d be golden.” So part of this is just an unpleasant reaction to modernity on my part. Kids these days.
What is Domain Expertise or Domain Knowledge?
Wikipedia doesn’t have an entry for ‘domain expertise,’ but when you google that phrase, the Wikipedia entry for ‘domain knowledge’ comes up. I’ve seen that phrase used in similar contexts (machine-learning papers), and it is also used once or twice in the quotes above. So let’s check that phrase out.
Domain knowledge is knowledge of a specific, specialized discipline or field, in contrast to general (or domain-independent) knowledge. The term is often used in reference to a more general discipline—for example, in describing a software engineer who has general knowledge of computer programming as well as domain knowledge about developing programs for a particular industry. People with domain knowledge are often regarded as specialists or experts in their field.
https://en.wikipedia.org/wiki/Domain_knowledge
If I’m following, software engineers say that knowledge of “computer programming” is “domain-independent” but other kinds of knowledge are “domain knowledge.”
OK, so I’m getting it. From the point of view of a programmer, domain expertise is knowledge about stuff that isn’t programming or software engineering. Programming languages, compilers, git, tensorflow, CUDA, etc. are “general knowledge,” whereas finance, earth science, signal processing, and medicine are domain expertise. So, then, what is being asserted is that it is, in fact, a major drawback for the software engineer to have to know about the application to which their software is applied. Is it? If so, in what sense is it a drawback? Personally? Or does having application-area expertise harm the outcome? Or does it just stem the flow of people doing the work? And are the authors of the paper in question software engineers? (No.) Is it a drawback to have domain knowledge if you are an expert in the application area but are merely attempting to find or hone good software tools to wield?
Is it ever a drawback to have solid knowledge of the application area in which you are applying sophisticated tools? Or is it just hard to do? “The main impediment to solving this problem is all the learning I have to do about the problem.” Well, yeah. We just need a magic problem-solver box. All we need to do is construct the box without significant knowledge of the problem the box will solve. We push all expertise into expertise about Tensorflow. Is that possible? Ultimately, is it desirable?
The Parable of the Optimal Remover
Mary and Leo are technicians with a technical problem: removing undesired cochannel interference from a desired radio signal. They have studied the modern software tools for various kinds of machine learning and are quite good with git. So when their boss, Cindy, tells them to create an interference remover, they feel confident in their technical ability to solve any problem with their tools–how hard could it be compared to learning modern programming?
“Hey Leo, I want us to make the ultimate remover, not just something as good as the remover Dan and Eunice have,” said Mary after Cindy left. Dan and Eunice are mathematicians with experience modeling radio waves and are quite good with FFTs. “Of course, me too,” said Leo, “we just need to train up a neural network. It will minimize the error, proving it is the optimal remover. Dan and Eunice will be hanging their heads in no time.” Mary responded with a dismissive shake of her head and a sigh, “They’ll never learn. Let’s get started, Leo.”
Mary, being the more organized of the pair, divides up the work. “I’ll define the layers and the hyperparameters–that’s the important thing. You get some labeled data.” And so they start down the path toward the optimal remover.
Mary decides that all she really needs–all she ever needs–is AlexNet, because it works for detecting cat faces in iPhone pictures, and that is a very good thing indeed. She reflects on how hard it is to even imagine what the world was like before you could easily detect cats in images. She loads up the network and prepares for training, testing, and validation.
Meanwhile Leo immediately turns to Google. “I’m sure there is a dataset for training an optimal remover out there somewhere. I mean, heck, it is 2022. Big data was an old idea already ages ago,” he mutters to himself as he works the searches. Finding a couple of datasets on some university sites associated with the word ‘interference,’ he quickly downloads the data and puts it on the team’s fileserver.
Mary and Leo relentlessly train their network, adding and subtracting different kinds of layers, trying various activation functions, hiding layers, unhiding layers and, in short, doing all the usual hand-crafted hyperparameter trial-and-error tasks that make up the state of the art. Eventually they tire and settle on a particular machine, which minimizes the error on the training portion of the dataset, passes validation, and does just fine on the testing portion. Triumphant, they go in search of Dan and Eunice, prepared to crow about their achievement.
Dan and Eunice are in their lab, but are always able to greet the learners politely and listen to what they have to say. On this day, they say a lot. Or at least they use a lot of words.
“We’ve just built the optimal remover! Only took us half a day, too,” begins Mary.
Leo shows them a printout of the error-vs-epoch curve produced during training. “See you guys? The error starts off high over here,” he says, pointing to the left part of the curve. “And it ends up much lower over here,” gesturing at the right edge of the plot. “So the error is minimized, proving this is the optimal remover. Because, Dan, you can’t have error lower than the minimum.” He finishes with a self-satisfied smirk.
“Great! Really glad for the two of you,” says Eunice. “Just a couple questions because, as you know, Dan and I have to work hard to keep up with you two.”
“What were the signal and the interference in this system design?” asks Dan, leading off the questioning.
Leo quickly replies, having anticipated this all-too-predictable question, “The signal is called ‘desired’ and the interference is ‘undesired.’ The good thing is, this means it is the universal AND the optimal remover!”
“Well, did you try it on all kinds of signals and interferences?” asks Eunice, gently. “I mean, what application areas, or signaling situations did you examine?”
“We don’t need application-area information or datasets. That’s domain expertise, and we’ve built a universal optimal remover,” Mary counters, some annoyance creeping into her voice.
“How did you check the quality of the training and testing datasets?” asks Dan. “Power spectrum analysis, autocorrelation function, attempts at demodulation, cyclostationary analysis, matched-filtering to check for known repeated components, or something else?”
Leo and Mary share one of their knowing looks, which used to be more irritating to Dan and Eunice, but they’ve gotten past it lately. “We didn’t do any of that. We found the dataset on the Machine Learning server of Ivytown University. Why would they put that dataset on the internet if it wasn’t correct and complete?”
“Why indeed,” muttered Eunice under her breath.
“Besides, those signal-processing … whatevers … are the domain of the domain-expertise experts. And that is not us!” proclaims Leo rather too cheerfully.
“Too true, too true,” agreed Dan, eager to move on. “Let me take a quick look at the Readme.txt file that probably came with the dataset from Ivytown. I can see if we have an appropriate signal-processing-based interference mitigator handy to create a comparison.”
“Sure… I guess,” said Leo as he logged into one of Dan’s lab computers. “Here it is.”
An awkward silence fell for a few minutes as Dan read the Readme.txt file, with Eunice looking over his shoulder. “Ah, OK, it says here the signal is 10-MHz LTE and the interferer is a particular pulsed-radar signal with linear FM on the pulses,” said Dan. “We happen to have an interference mitigation algorithm for that problem. I think we told you about it in the past?”
“Yeah, Dan, I figured you’d bring up your statistics-based remover,” said Mary, “so that’s why we decided to compare the optimal remover with your remover. We trained a neural network with fourth-order statistics because you had told us your method used fourth-order moments. So we put in four times the signal and also the signal raised to the fourth power. The resulting neural network was terrible, as expected.”
Dan’s eyebrows had been ratcheting up higher and higher through Mary’s speech, and when she finished they collapsed into a pair of furrows above his hardening eyes. “Mary, our signal-processing method is complicated, and only a part of it relies on fourth-order statistics. In particular, we use the (4, 2) cumulant, not the (4, 0) moment. And throwing some statistics at a neural network is not even close to constructing a version of our method. Not. Even. Close.” This last was ground out between tightly clenched teeth.
Leo decided to defuse the suddenly tense situation, “Dan, Eunice, we know that our comparison between the optimal remover and the traditional hand-crafted labor-of-love totally-old-school signal-processing method is something you think is important, but when we trained a network to use your statistics, the resulting remover has got to be better than your setup. After all, it minimizes the error. So we were trying to be generous to you.”
Leo noticed that Dan and Eunice were sharing a look that reminded him of the look mobsters give each other in the movies just before someone’s eye gets gouged out with a ballpoint pen. He figured he was imagining things but still he rushed to fill the silence with more, and hopefully better, explanations. Before he could, however, Eunice spoke up.
“Alright, let’s get this over with. Did you and Mary find any datasets that are similar to the Ivytown dataset, but that come from an independent research team?” she said.
“No,” said Mary, “we typically don’t do that. See, the way it works in the machine-learning world is that you get a dataset, divide it into training, testing, and validation subsets, and then perform the training and testing. Once that is done, the error is minimized, and you publish the results. We’ve already moved on to another project.”
Dan, glasses off and rubbing his eyes, murmured, “Oh? And what would that project be?”
“The optimal neural network!”
Is it a Trap?
I think it is. How can we know about the quality of our dataset if we can’t interrogate it? And how can we interrogate it if we don’t know how to study, in detail, various aspects of random processes and their sample paths?
How can we accurately and efficiently compare a machine-learning system with a non-machine-learning signal-processing system if we don’t understand signal processing?
How can we know if one dataset is superior to another in terms of its realism if we don’t know about communication-signal random processes, propagation channels, impairments introduced by receiver chains, and even the details of specific complicated systems of practical interest like LTE?
How can we communicate accurately and efficiently to other researchers in the field (ML or non-ML!) what we have done during the course of an RF ML project if we don’t understand the terminology, notation, and prior work associated with the application area?
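The kind of dataset interrogation Dan asks about in the parable can start very simply. Here is a minimal sketch of two basic sanity checks one might run on a candidate dataset before training: a Welch power-spectrum estimate (does the occupied bandwidth match the Readme’s claims?) and an FFT-based autocorrelation (is there periodic or pulsed structure?). The data here is a hypothetical stand-in–a complex tone in noise with an assumed sample rate–since no actual dataset accompanies this post; in practice you would load the dataset’s IQ samples instead.

```python
# Minimal sketch of "interrogating" an RF dataset before training on it.
# Stand-in data: a complex exponential (carrier) in white Gaussian noise.
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 1.0e6   # assumed sample rate (Hz)
f0 = 1.0e5   # tone frequency standing in for a signal component
n = 2**16
t = np.arange(n) / fs
x = (np.exp(2j * np.pi * f0 * t)
     + 0.5 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)))

# Check 1: power spectrum via Welch's method.
# Where is the energy? Does it match the dataset's documentation?
f, pxx = signal.welch(x, fs=fs, nperseg=4096, return_onesided=False)
f_peak = f[np.argmax(pxx)]

# Check 2: autocorrelation via the Wiener-Khinchin relation.
# Peaks away from zero lag reveal periodic or repeated components.
r = np.fft.ifft(np.abs(np.fft.fft(x)) ** 2) / n

print(f"estimated peak frequency: {f_peak:.0f} Hz")
print(f"zero-lag power: {r[0].real:.2f}")
```

Neither check requires a neural network, and both would have immediately told Mary and Leo what kinds of signals were actually in the Ivytown dataset.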
What am I missing?
If you leave a comment, try to be civil. Humor is encouraged, sarcasm is welcome, snarkiness is fine, vitriol is forbidden.