Can Machine Learning Let Us Say “Hasta la Vista, Baby” to SSOs Someday?

Can we ever “Terminate” all SSOs at the source?  Will machines and their “learning” ever be so smart that we have just-in-time maintenance preventing backups on sewer mains just BEFORE the SSO would occur?  Can machine learning use detailed genetic profiles to perfectly diagnose cancer and select its treatment, reducing cancer fatalities to zero?

The black-and-white answer to both questions is the same: “no.”  But the methods and incremental benefits are similar, and real understanding comes from the balance of two other famous Terminator lines: “I’ll be back” vs. “Come with me if you want to live.”

Machine learning in various forms has been around since the 1950s.  In the 1970s, skepticism about this approach crept in, possibly because of an over-expectation and misunderstanding of accuracy and results.  Rather than banishing problems/issues to the sideline with machine learning, “I’ll be back” was the more common phrase.

In the 1990s, machine learning shifted from “knowledge-based” to “data-based” research, which, coupled with the emergence of the internet ecosystem, has led to an “explosion” in machine learning and artificial intelligence research, application, and use.

IBM’s “Watson” is named after the company’s founder, Thomas Watson, and his son, Thomas Watson, Junior.  Junior became president of IBM in 1952 and led IBM from being a “punch card” company into the computer age.  In 2010 IBM introduced its Watson suite of artificial intelligence/machine learning tools and processes.  One of the first applications of Watson was cancer diagnosis and suggested treatment.  Watson was loaded with over 20 million oncology records along with all published oncology research.  Doctors can load in a patient’s genetic profile and within minutes Watson can return the MOST LIKELY diagnosis and a suggested treatment.

Knowledge-based software systems use rules developed from institutional knowledge, known conditions, and legacy understandings (domain experience).  A deterministic, rules-based system is a series of “if a and b and c but not d, then X” statements.  This approach is limited by current understanding and by the sheer size and complexity of the rule set.  If the rules could be added to and fine-tuned across millions of iterations, such a system could ultimately get there, but time and resources suggest a different approach.  Watson’s data-based machine learning approach builds a model from the existing data: 20 million oncology records, all published research, and all the details of each patient’s genetic profile.  Each patient profile is considered in complete detail and hidden patterns are unearthed.

Watson’s diagnosis and suggested treatment for a patient cannot be reverse engineered into black-and-white rules (because of a, b, c, d, thus X).  It is possible to understand the strongest correlating variables for each patient (gender, age, various genetic markers, etc. are ‘important’), but there’s no guarantee that another patient with the exact same genetic profile and living conditions (for example, an identical twin who had always lived in the same house) would either develop that condition or be diagnosed in the same manner.  It’s a MOST LIKELY condition.

Does this eliminate doctors?  Do patients go right from Watson straight to chemo?  NO!  If Watson diagnoses an internal medicine problem, do we go from Watson immediately to surgery?  Again, NO!  What it does enable, though, is for skilled doctors to target the rest of the diagnosis and validate the result.  A skilled doctor with this kind of targeting analysis becomes dramatically more efficient at diagnosing and saving lives.  “Come with me if you want to live.”

Wow, awesome, but cancer and sewer pipes?  There’s big money in cancer research, but pipes are pipes, as they say.

The internet ecosystem that has developed in the past 20+ years has led to a proliferation of data science and machine learning tools and processes that are either free/open-source, or very inexpensive.  How can this be applied to sewer pipes and their maintenance?

Well, “I’ll be back” still applies to the water and wastewater agencies.  If you had cancer 5-10 years ago, and similar symptoms appear again, the MOST LIKELY scenario is that it’s the same diagnosis and same treatment (unless new treatments have emerged in the interim).  Water mains that have broken before are the MOST LIKELY water mains to break again.  Sewer mains that have had an SSO are the MOST LIKELY sewer mains to require hot spot (frequent) cleaning.

What about the rest?  For the remainder of a collection system’s sewer mains (the vast majority of mains that have not experienced an SSO), you can put them all on a “rule of thumb” frequency schedule and clean them as often as you possibly can.  This becomes akin to going to the doctor once a week for a full diagnosis and body scan – a bit expensive.  And crew attrition, budget, or water constraints are limiting the amount of gravity main cleaning collection systems can do.

Beyond SSO locations, the next traditional step is to develop a bottom-up, rules-based approach for the remaining sewer mains (the vast majority of mains).  These rules can be based on available CCTV data or cleaning observations (light/medium/heavy).  “If light flow while cleaning 2x in a row, push out to a less frequent schedule.”  By their nature, rules are based on historical incidence (reactive).  If one could iterate these rules thousands of times, they could ultimately describe a sewer main’s behavior, provided you have the time and nothing changes (expansion of the system, pandemic changes in behavior, etc.).  If something does change, you have to start those iterations all over again.
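To make the rules-based idea concrete, here is a minimal Python sketch of the single rule quoted above.  The schedule tiers, the function name, and the companion “heavy” rule are assumptions made up for illustration; they are not taken from any agency’s actual program.

```python
# Hypothetical schedule tiers: months between cleanings (assumed values).
SCHEDULE_MONTHS = [3, 6, 12, 24]


def next_interval(current_months, observations):
    """Return the next cleaning interval, in months, for one sewer main.

    current_months must be one of SCHEDULE_MONTHS (otherwise ValueError).
    observations lists cleaning findings, oldest first, each one of
    "light", "medium", or "heavy".
    """
    tier = SCHEDULE_MONTHS.index(current_months)
    last_two = observations[-2:]

    # Rule from the text: light findings 2x in a row -> clean less often.
    if len(last_two) == 2 and all(obs == "light" for obs in last_two):
        tier = min(tier + 1, len(SCHEDULE_MONTHS) - 1)
    # Assumed companion rule (not in the text): a heavy finding on the
    # latest cleaning -> clean more often.
    elif observations and observations[-1] == "heavy":
        tier = max(tier - 1, 0)

    return SCHEDULE_MONTHS[tier]
```

For example, next_interval(6, ["medium", "light", "light"]) returns 12.  Every additional condition (pipe material, root intrusion, grease) means another hand-written branch, which is where the size and complexity of a rules-based approach comes from.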

A machine learning approach can be done very cost-effectively using off-the-shelf open-source tools with your data.  And a machine learning model can be updated continually as new data comes in and the behavior of sewer mains changes over time and conditions.

The data must first be “wrangled” (cleaned, combined, and put into a consistent format), and then models can be built.
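As a rough illustration of what “building a model” from data can look like, here is a small Python sketch that fits a logistic regression by gradient descent and then rank orders mains by estimated risk.  Everything in it is an assumption for illustration: the two features, the six training rows, and the main IDs are made up, and logistic regression is simply one common technique, not a claim about what Watson or any particular tool uses.  In practice an open-source library would handle the fitting.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated likelihood (between 0 and 1) that a main is a problem main."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(rows, labels, epochs=2000, learning_rate=0.1):
    """Fit a logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def rank_mains(mains, weights, bias):
    """Return main IDs sorted from highest to lowest estimated risk."""
    return sorted(mains, key=lambda m: predict(weights, bias, mains[m]),
                  reverse=True)


# Made-up "wrangled" history, one row per main:
# [pipe age scaled to 0..1, fraction of cleanings with heavy findings]
HISTORY = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6],
           [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
HAD_SSO = [1, 1, 1, 0, 0, 0]  # 1 = the main has had an SSO

WEIGHTS, BIAS = train(HISTORY, HAD_SSO)

# Hypothetical mains to score and rank order.
NEW_MAINS = {"main-A": [0.15, 0.1], "main-B": [0.85, 0.9], "main-C": [0.5, 0.5]}
RANKING = rank_mains(NEW_MAINS, WEIGHTS, BIAS)
```

Unlike the hand-written rules, nothing here says “old pipe plus heavy debris means trouble.”  The weights come from the history, and retraining on newer history updates them.  The output is still a MOST LIKELY ranking, not a guarantee.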

Does this get us to “Hasta la vista, baby” for SSOs?  Again, no!  For the cancer patient, can we guarantee that the diagnosis is correct and that the doctor can save the patient?  Again, no!  But the well-trained doctor knows where to focus and knows the MOST LIKELY place to start diagnosis and treatment.  For sewer main maintenance with the goal of preventing or minimizing SSOs, machine learning can significantly improve on the traditional bottom-up, rules-based approaches to gravity main cleaning frequencies.

A final piece, of course, is how to gain confidence in the machine learning results and apply them in a practical sense in the field.  If machine learning rank orders my 5000 gravity main segments, do you expect me to just believe you and drive my truck all over town in that order?  Again, no!  Between Watson and treatment or surgery, there are many steps and iterations, and we’ll discuss those further in a subsequent blog.