So how in the world would a Machine Learning Scientist predict an Earthquake? You might think: just collect all the data you can find, stuff it into Hadoop, and run some supervised machine learning algorithms. Eh…not so much!
What we will do is apply some models from Astronomy and Theoretical Physics to model the process, and see if the techniques developed for detecting weak patterns in Astronomy can be applied to the problem of detecting Earthquakes and other crashes in nature.
Moreover, we will, eventually, see how to convert a highly non-convex optimization problem into a convex (LP) problem. It is going to take me some time to get there…first, some motivation.
Scale Invariance in Nature:
A famous math problem is to compute the length of the Coastline of Britain. The problem is that the shorter your ruler, the longer the coastline seems. If we use a 200 km ruler, we measure 2400 km. If we use a 50 km one, we get 3400 km. And so on.
When the measured length keeps changing with the ruler like this, with no preferred length scale, we call it, mathematically, Scale Invariance.
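To make the ruler effect concrete, here is a small Python sketch that uses only the two measurements quoted above. It assumes Richardson's empirical power law, L(r) ∝ r^(1 − D), which the text does not state explicitly; the function names are purely illustrative.

```python
import math

# (ruler length in km, measured coastline length in km), as quoted in the text
measurements = [(200.0, 2400.0), (50.0, 3400.0)]

def richardson_exponent(points):
    """Slope of log(length) versus log(ruler) through two measurements."""
    (r1, l1), (r2, l2) = points
    return (math.log(l2) - math.log(l1)) / (math.log(r2) - math.log(r1))

# Negative slope: the shorter the ruler, the longer the measured coast.
slope = richardson_exponent(measurements)

# Assumed Richardson law L(r) ~ r**(1 - D), so D = 1 - slope.
dimension = 1.0 - slope

def predicted_length(ruler_km):
    """Extrapolate the power law, anchored at the 200 km measurement."""
    r0, l0 = measurements[0]
    return l0 * (ruler_km / r0) ** slope

print(f"slope = {slope:.3f}, implied dimension D = {dimension:.3f}")
print(f"predicted length with a 100 km ruler: {predicted_length(100.0):.0f} km")
```

With only two data points this is an interpolation rather than a fit, but it shows the key feature: on a log-log plot the measurements fall on a straight line, and there is no ruler length at which the coastline settles down.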
A classic mathematical model for a scale invariant process is Brownian motion, also known as a Wiener Process.
A Wiener Process is Scale Invariant
That is, we model the process with an underlying drift $\mu$ and a volatility $\sigma$. The governing stochastic differential equation is
$$dX_t = \mu\,dt + \sigma\,dW_t$$
This describes a power law growth (at rate $\mu$), decorated with random fluctuations. Many physical systems exhibit this kind of stochastic scale invariance and growth. Determining if we have power law growth or not is difficult as it is; detecting patterns in this randomness is even harder. But we see the results, the crashes, in nature all the time.
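Here is a minimal simulation sketch of such a process, assuming the simplest drifted Wiener form dX = μ dt + σ dW with X(0) = 0. The function name, the parameter values, and the use of NumPy are my own choices for illustration.

```python
import numpy as np

def simulate_drifted_wiener(mu, sigma, t_max, n_steps, n_paths, seed=0):
    """Euler steps of dX = mu*dt + sigma*dW, starting from X(0) = 0."""
    rng = np.random.default_rng(seed)
    dt = t_max / n_steps
    # Each Wiener increment dW is Gaussian with mean 0 and variance dt.
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(mu * dt + sigma * dW, axis=1)
    times = dt * np.arange(1, n_steps + 1)
    return times, paths

times, paths = simulate_drifted_wiener(mu=0.5, sigma=1.0, t_max=4.0,
                                       n_steps=400, n_paths=5000)

# Scale invariance of the fluctuations: the variance grows linearly in time,
# so stretching time by 4 stretches the typical excursion by sqrt(4) = 2.
var_t1 = paths[:, 99].var()     # column 99 is t = 1.0
var_t4 = paths[:, 399].var()    # column 399 is t = 4.0
variance_ratio = var_t4 / var_t1
mean_t4 = paths[:, 399].mean()  # the drift alone predicts mu * t = 2.0

print("Var[X(4)] / Var[X(1)] =", variance_ratio)
print("mean of X(4) =", mean_t4)
```

The variance ratio should come out close to 4 and the mean close to 2, up to sampling noise. The point is that the random part looks statistically the same at every time scale once it is rescaled, which is what makes hidden patterns so hard to spot.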
How Nature Works: Chaos, Crashes and Critical Phenomena
Nature is not, in fact, in Equilibrium. Chaos, crashes, and critical events occur everywhere. Earthquakes, Avalanches, and other natural disasters threaten us every day. In Theoretical Chemistry & Physics, we call these events Critical Phenomena. It has been proposed [1,2] that these “catastrophic events are ‘outliers’ with statistically different properties than the rest of the population and result from [internal, self-amplifying, cascading] mechanisms.” We therefore need a different kind of statistical theory that can deal with inherently non-equilibrium processes near a critical point, such as crashes, phase transitions, and other catastrophic events.
This theory is the Renormalization Group (RG) Theory. [Ken Wilson won the Nobel Prize for this, and was a professor in physics at my undergraduate school, Ohio State.] RG theory says that near a critical point, like a phase transition or a crash, the dynamics change dramatically. Random fluctuations start appearing on all time and length scales; hence the term scale invariance. The dynamics will lose its essential stochastic character. The system still follows a power law,
but the seemingly random fluctuations are, in fact, governed by solutions of the 1-D Renormalization Group equations. A simple model for this, in discrete, physical systems, is called Discrete Scale Invariance, and is governed by
$$y(t) = A + B\,(t_c - t)^{m}\left[1 + C\cos\big(\omega\log(t_c - t) + \phi\big)\right]$$
where $t_c$ is the time the critical event occurs, $m$ is the power law exponent, and $\omega$ is the natural frequency of the oscillation. (This equation describes the first order solution of the RG flow map near the critical point for a phase transition on a discrete lattice, although we apply it far from the critical point to all sorts of natural phenomena.)
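To get a feel for what a log-periodic function of this kind looks like, here is a minimal Python sketch. It assumes the commonly used form y(t) = A + B (t_c − t)^m [1 + C cos(ω log(t_c − t) + φ)]; the function name and the default values of A, B, C and φ are illustrative assumptions, not fitted values.

```python
import numpy as np

def log_periodic_power_law(t, t_c, m, omega, A=0.0, B=1.0, C=0.1, phi=0.0):
    """Power law in (t_c - t), decorated with oscillations periodic in log(t_c - t)."""
    tau = t_c - np.asarray(t, dtype=float)
    if np.any(tau <= 0):
        raise ValueError("all times must be strictly before the critical time t_c")
    return A + B * tau**m * (1.0 + C * np.cos(omega * np.log(tau) + phi))

# Discrete scale invariance: shrinking the time-to-failure by the preferred
# ratio lam = exp(2*pi/omega) leaves the oscillation unchanged and rescales
# the curve by lam**(-m) (exactly so when A = 0).
t_c, m, omega = 10.0, 0.5, 6.0
lam = np.exp(2.0 * np.pi / omega)
tau = np.array([0.5, 1.0, 2.0])

y_far = log_periodic_power_law(t_c - tau, t_c, m, omega)
y_near = log_periodic_power_law(t_c - tau / lam, t_c, m, omega)
print("preferred scaling ratio lam =", lam)
print("y_near / y_far =", y_near / y_far, " versus lam**(-m) =", lam**(-m))
```

Unlike ordinary scale invariance, the curve only reproduces itself under this one discrete ratio (and its integer powers), so the oscillations bunch up faster and faster as t approaches t_c.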
If indeed nature displays these Discrete Scale Invariant (DSI) patterns prior to an event like an earthquake, we would hope we could detect them — and do so with enough confidence that we don’t end up in jail.
To predict an Earthquake, we measure the concentration of unusual chemicals in the local groundwater as a function of time, leading to a graph of the form
The problem of the scientist, machine learning or otherwise, is to distinguish between random and log-periodic behavior and to predict the critical time $t_c$.
The graph on the left shows a simple fit of the DSI log-periodic function, overlaid on the data. Is this a good fit? Do we believe this? The problem is to fit this non-linear curve to the data with some confidence. We need to determine the power law exponent $m$, the frequency of oscillation $\omega$, and, most importantly, the critical time $t_c$ when the Earthquake will occur. (The other parameters can be slaved to these.)
A classic time series / Astronomy approach is to detrend the series (fit the power law first) and then find the best frequency by examining the Periodogram using Least-Squares Spectral Analysis (LSSA), as explained in our last post.
Because the underlying fitting problem is highly non-convex, it is very difficult to get a good fit!
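The detrend-then-periodogram recipe can be sketched on synthetic data as follows. This is only an illustration under strong assumptions: the data are generated from a log-periodic power law with no additive offset, and the critical time is treated as known, which is exactly the part that is hard in practice. The function names and all parameter values are hypothetical, and the periodogram here is a plain least-squares sinusoid fit written with NumPy rather than a library routine.

```python
import numpy as np

def detrend_power_law(t, y, t_c):
    """Fit log y = log B + m*log(t_c - t); return m, log-time x, and relative residual."""
    x = np.log(t_c - t)
    m, log_B = np.polyfit(x, np.log(y), 1)
    residual = y / np.exp(log_B + m * x) - 1.0
    return m, x, residual

def lssa_power(x, r, omegas):
    """Fraction of variance explained by a constant plus one sinusoid at each omega."""
    total = np.sum((r - r.mean()) ** 2)
    power = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        design = np.column_stack([np.ones_like(x), np.cos(w * x), np.sin(w * x)])
        coef, *_ = np.linalg.lstsq(design, r, rcond=None)
        power[i] = 1.0 - np.sum((r - design @ coef) ** 2) / total
    return power

# Synthetic signal: exponent 0.5, log-frequency 8, critical time 10, mild noise.
rng = np.random.default_rng(1)
t_c_true, m_true, omega_true = 10.0, 0.5, 8.0
t = np.linspace(0.0, 9.9, 500)
tau = t_c_true - t
y = 2.0 * tau**m_true * (1.0 + 0.05 * np.cos(omega_true * np.log(tau) + 0.3))
y *= 1.0 + 0.005 * rng.standard_normal(t.size)

m_hat, x, residual = detrend_power_law(t, y, t_c_true)
omegas = np.linspace(2.0, 20.0, 361)
power = lssa_power(x, residual, omegas)
omega_hat = omegas[np.argmax(power)]
print("estimated exponent:", m_hat, " estimated log-frequency:", omega_hat)
```

With the true critical time plugged in, the periodogram in log-time has a clean peak. The trouble starts when the critical time is unknown: every trial value changes the log-time axis itself, so the objective develops many local optima as a function of that one parameter.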
It turns out that some very sophisticated (and convex) machine learning methods have been developed recently by/for Astronomers to solve a very similar problem: detecting Gravitational Waves (called Gravity Waves below).
General Relativity predicts that when two co-rotating neutron stars collide, they form a Black Hole
and cause a massive space-time vibration, called a Gravity Wave, which looks like
… simple Gravity Waves take the form
$$f(t) = A\,(t_c - t)^{-1/4}\cos\big(\omega\,(t_c - t)^{5/8} + \phi\big), \qquad t < t_c$$
Here, the collision time is the critical time $t_c$. Gravity waves are very weak signals and we need a very clever approach to detect what is really a wave and what is just noise.
Generally speaking, we can classify DSI-type functions as Chirps: functions that oscillate strongly along a slowly moving envelope. A Chirp takes the form
$$f(t) = A(t)\cos\big(\lambda\,\varphi(t)\big)$$
where the amplitude $A(t)$ and the phase $\varphi(t)$ are smoothly varying functions of time, and the degree of oscillation $\lambda$ is large.
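As a concrete instance, here is a short Python sketch of a chirp written as f(t) = A(t) cos(λ φ(t)). It uses a gravitational-wave-style example with an assumed envelope (t_c − t)^(−1/4) and phase (t_c − t)^(5/8); that particular choice, the function names, and the parameter values are assumptions made for illustration.

```python
import numpy as np

def chirp(t, amplitude, phase, lam):
    """Evaluate f(t) = A(t) * cos(lam * phase(t)) for callables A and phase."""
    t = np.asarray(t, dtype=float)
    return amplitude(t) * np.cos(lam * phase(t))

def instantaneous_frequency(t, phase, lam, h=1e-6):
    """Central-difference estimate of |lam * phase'(t)|, in radians per unit time."""
    t = np.asarray(t, dtype=float)
    return np.abs(lam * (phase(t + h) - phase(t - h)) / (2.0 * h))

# Assumed inspiral-style chirp: amplitude and frequency both blow up at t_c.
t_c, lam = 1.0, 100.0

def envelope(t):
    return (t_c - t) ** (-0.25)

def phase(t):
    return (t_c - t) ** 0.625

t = np.linspace(0.0, 0.99, 2000)
signal = chirp(t, envelope, phase, lam)
freq = instantaneous_frequency(t, phase, lam)

print("frequency at the start of the record:", freq[0])
print("frequency at the end of the record:  ", freq[-1])
```

The signal never leaves its envelope, and the oscillation speeds up continuously as t approaches t_c. A single fixed-frequency sinusoid cannot track that, which is why chirps need their own family of basis functions.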
Nature Shows the Way
Like crashes, Chirps occur everywhere in Nature. For example, Bats use Chirps as part of their echo-location sonar.
Many modern machine learning methods use clues from nature to build a better detector. The so-called Deep Learning methods, pioneered by Andrew Ng and Google, for detecting cat faces, numbers on houses, etc., are based on our understanding of how the human retina and visual cortex recognize images.
There are also machine learning methods designed to mimic Bat sonar; the one we are interested in looking at here is called Chirplet Basis Pursuit. It has been specifically designed to detect Gravity Waves, and we will try to use it to detect Discrete Scale Invariance (without going to jail). And we will do this using a convex optimization!
Stay tuned…same Bat Time … same Bat Channel
199 Per Bak, How Nature Works: the Science of Self-Organized Criticality
1996 Y. Huang, G. Ouillon, H. Saleur, D. Sornette, Spontaneous generation of discrete scale invariance in growth models
1998 D. Sornette, Discrete scale invariance and complex dimensions
2007 A.R.T. Jonkers, Discrete scale invariance connects geodynamo timescales
1996 Anders Johansen, Didier Sornette, Hiroshi Wakita, Urumu Tsunogai, William I. Newman, Hubert Saleur, Discrete Scaling in Earthquake Precursory Phenomena: Evidence in the Kobe Earthquake, Japan
2006 Emmanuel J. Candès, Philip R. Charlton, Hannes Helgason, Detecting Highly Oscillatory Signals by Chirplet Path Pursuit