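[Paper Skim 28] A Series of Papers Citing PDE-Net 2.0

It has been a long time since I wrote a proper blog note, so here is a batch of papers to catch up.

I collected them by following the citation trail after reading PDE-Net 2.0.

Collected papers (in no particular order)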

Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning
- From Bin Dong's group
- https://arxiv.org/abs/1905.11079?context=math
Data-driven recovery of hidden physics in reduced order modeling of fluid flows
- https://aip.scitation.org/doi/abs/10.1063/5.0002051
- Journal: Physics of Fluids, early 2020
DeepMoD: Deep learning for model discovery in noisy data
- https://www.sciencedirect.com/science/article/pii/S0021999120307592
- Journal: Journal of Computational Physics, November 2020
Stability selection enables robust learning of partial differential equations from limited noisy data
- https://arxiv.org/abs/1907.07810
- arXiv category: Mathematics -> Numerical Analysis, July 2019
Derivatives Pricing via Machine Learning
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3352688
- https://www.scirp.org/journal/paperinformation.aspx?paperid=94637
- Journal: Journal of Mathematical Finance (filed under Business & Economics), 2019
Extracting Interpretable Physical Parameters from Spatiotemporal Systems Using Unsupervised Learning
- https://journals.aps.org/prx/abstract/10.1103/PhysRevX.10.031056
- Journal: Physical Review X, September 2020
DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm
- https://www.sciencedirect.com/science/article/pii/S0021999120303582
- Journal: Journal of Computational Physics, October 2020
Feature engineering and symbolic regression methods for detecting hidden physics from sparse sensor observation data
- https://aip.scitation.org/doi/abs/10.1063/1.5136351
- Journal: Physics of Fluids, January 2020
Data-driven Discovery of Partial Differential Equations for Multiple-Physics Electromagnetic Problem
- https://arxiv.org/abs/1910.13531
- arXiv category: Physics -> Computational Physics, October 2019
TIME: A Transparent, Interpretable, Model-Adaptive and Explainable Neural Network for Dynamic Physical Processes
- https://arxiv.org/abs/2003.02426
- May 2020
Sparse Symplectically Integrated Neural Networks
- https://proceedings.neurips.cc/paper/2020/hash/439fca360bc99c315c5882c4432ae7a4-Abstract.html
- NeurIPS 2020
DeepM&Mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks
- https://arxiv.org/abs/2009.12935
- arXiv category: Physics -> Computational Physics, September 2020
Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery
- https://ieeexplore.ieee.org/abstract/document/9180100
- IEEE Transactions on Neural Networks and Learning Systems, August 2020

1. Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning

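Paper info

A paper from Bin Dong's group, posted in 2020/10 (the arXiv ID 1905.11079 dates the first submission to May 2019)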
https://arxiv.org/abs/1905.11079?context=math

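Summary table

After summarizing, the paper's line of thought seems sound. One highlight: when it casts the numerical method as an RL problem, it brings in the notion of a meta-learner?

Item	Content
Target	The broad area is still fitting PDE data, covering both numerical methods and DL methods. This paper targets solving one specific class of PDEs: conservation laws.
Motivation/Idea	Conservation laws matter and have many applications; Burgers' equation is a special case. Starting from a numerical method, the PDE solver is viewed as an MDP (Markov Decision Process) and then cast as a basic reinforcement learning problem. The numerical method is WENO (Weighted Essentially Non-Oscillatory Schemes), introduced in the details below. Two ideas build on it: (1) obtain the weights in the method automatically; (2) decide the upwind direction automatically (the original WENO needs a numerical test for this); the concept is introduced in the details.
Method	The model is not given a standalone name (the paper refers to it as RL-WENO). What it really offers is a way to turn a numerical method into a network / DL problem. The base model is the WENO method for solving the PDE, abstracted into MDP form; RL is then applied to that MDP. Details follow below.
Pros and Cons	Pros: (1) It turns the numerical method for 1D conservation laws into an RL problem, and the modeling is standard enough to be called a framework. (2) The numerical method maps directly to an MDP and to RL, so it is model-driven? (3) The paper stresses that the action construction is a meta-learner; the claim is that the model generalizes well and that the credit goes to the meta-learner. Cons: (1) Don't ask me why it is a meta-learner or how it became one; I discuss this separately below. (2) Only 1D conservation laws are considered, and it is unclear whether it extends directly. (3) The flux modeling leans on WENO with a lot of interpolation; fine-grained as it is, it does not feel elegant, though I can't pin down why.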

Selected details

Basic model setup

First, what is meant by an equation with a conservation law? It is simply an equation of the following form:

$\displaystyle u_t(x,t)+(f(u(x,t)))_x = 0,\quad a\leq x\leq b,\ t\in [0,T],\ u(x,0)=u_0(x) \tag{1}$

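This form is called a conservation law. Since $(x,t)$ range over intervals, the observed data come from discretizing space and time separately into a grid, usually a uniform one. Let the step sizes be $\Delta x, \Delta t$ and the numbers of steps be $J, N$; see eq. $(2.2)$ in the paper.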

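A few more notions are needed. The true solution $u(x,t)$ takes the value $u(x_j,t_n)$ on the grid, and its approximation is written $\mathcal{U}_j^n$. The $f$ in $(1)$ is called the flux, i.e. the flow in the conservation law; the true flux is written $f_j^n=f(u(x_j,t_n))$.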

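Finally, there are half points in space, $\displaystyle x_{j\pm\frac{1}{2}}=x_j\pm\frac{\Delta x}{2}$. They sit midway between neighboring grid points, so interpolating to them amounts to working on a slightly finer grid.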
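
To make the role of the half points concrete, here is how a conservative scheme would advance the grid values. This is a sketch under assumptions: forward Euler in time (the paper's actual time integrator may differ), and $\hat{f}_{j\pm\frac{1}{2}}^n$ is my own notation for the numerical flux at $x_{j\pm\frac{1}{2}}$.

$\displaystyle \mathcal{U}_j^{n+1}=\mathcal{U}_j^n-\frac{\Delta t}{\Delta x}\left(\hat{f}_{j+\frac{1}{2}}^n-\hat{f}_{j-\frac{1}{2}}^n\right)$

Summing over $j$ makes the interior fluxes cancel in pairs, which is the discrete counterpart of conservation. Everything scheme-specific is hidden in how $\hat{f}_{j\pm\frac{1}{2}}^n$ is built from nearby grid values.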

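The WENO numerical method

Judging by the name, Weighted Essentially Non-Oscillatory Schemes, my guess is that the interpolation is finer so the fitted solution oscillates less, and that the interpolants are weighted by importance (interpolation accuracy).

The WENO computation proceeds as follows (algorithm listing from the paper):

As the last two lines of that listing show, WENO behaves like a weighted average of several interpolants. The main questions are how to compute the weights of the different interpolants and how to determine the upwind direction. The latter is presumably there to keep the numerical solution from oscillating; compare the second-order upwind scheme.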
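
The weighting described above can be sketched in a few lines. This is a minimal sketch that assumes the classical fifth-order Jiang-Shu variant on a periodic grid; the paper's exact WENO variant may differ, and the function name `weno5_left` and the value of `eps` are my own choices. It covers only the left-biased reconstruction; when the wave travels the other way, the mirrored (right-biased) one would be used, which is the upwind decision.

```python
import numpy as np

def weno5_left(f, eps=1e-6):
    """Left-biased WENO5 reconstruction of f at x_{j+1/2} on a periodic grid.

    Classical Jiang-Shu weights, used here as an assumed stand-in for the
    WENO variant in the paper.
    """
    fm2, fm1 = np.roll(f, 2), np.roll(f, 1)    # f_{j-2}, f_{j-1}
    fp1, fp2 = np.roll(f, -1), np.roll(f, -2)  # f_{j+1}, f_{j+2}

    # Three third-order candidate interpolants, one per sub-stencil.
    q = np.stack([
        (2 * fm2 - 7 * fm1 + 11 * f) / 6,
        (-fm1 + 5 * f + 2 * fp1) / 6,
        (2 * f + 5 * fp1 - fp2) / 6,
    ])

    # Smoothness indicators: large where a sub-stencil crosses a sharp feature.
    beta = np.stack([
        13 / 12 * (fm2 - 2 * fm1 + f) ** 2 + 0.25 * (fm2 - 4 * fm1 + 3 * f) ** 2,
        13 / 12 * (fm1 - 2 * f + fp1) ** 2 + 0.25 * (fm1 - fp1) ** 2,
        13 / 12 * (f - 2 * fp1 + fp2) ** 2 + 0.25 * (3 * f - 4 * fp1 + fp2) ** 2,
    ])

    d = np.array([0.1, 0.6, 0.3])[:, None]     # linear weights for smooth data
    alpha = d / (eps + beta) ** 2
    w = alpha / alpha.sum(axis=0)              # nonlinear weights, sum to 1
    return (w * q).sum(axis=0), w

# A step profile: next to the jump, the weight moves to the smooth sub-stencil.
step = np.where(np.arange(20) < 10, 1.0, 0.0)
f_half, weights = weno5_left(step)
```

On smooth data the weights stay close to `d`; next to a jump, the sub-stencils that cross it get almost no weight, which is what suppresses the oscillations.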

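Mapping WENO to an MDP

Not much to say here: write the WENO computation as an algorithm, then map it onto the MDP's state $S$, action $A$, transition dynamics $P$ and reward $r$.

That is all there is to it. A brief description of each piece: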

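$s$ is the state. There is a state function that maps $\mathcal{U}_j^\lambda,\ \lambda\in \Lambda$ over the index set to $\hat{f}_j^n$. For example, with the three interpolations in the paper's listing, $s$ maps to 3 vectors, each holding all the points of one interpolation ($f_j^\lambda$) together with the weight of that interpolation. The implementation is an MLP with 6 layers of 64 neurons each, and the paper uses the Twin Delayed Deep Deterministic (TD3) policy gradient algorithm to train the RL policy.
$A$ is the action, i.e. which interpolants to choose (the last line of the paper's listing); it is generated by the $s$ function.
$P$ plays the role of the iteration mechanism, for example the update formula of forward Euler.
$r$ is the negative of the infinity norm of the interpolation error.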
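
The four pieces above can be put together as a single transition function. This is a sketch under loudly stated assumptions, not the paper's implementation: the flux is Burgers' $f(u)=u^2/2$, the grid is periodic, the solution is positive so the left-biased stencils are the upwind ones, the action is one weight vector per interface, and the reward is computed globally against a reference solution. The names `candidate_fluxes` and `mdp_step` are hypothetical.

```python
import numpy as np

def candidate_fluxes(f):
    """Three left-biased candidate values of the flux at x_{j+1/2} (periodic grid)."""
    fm2, fm1 = np.roll(f, 2), np.roll(f, 1)    # f_{j-2}, f_{j-1}
    fp1, fp2 = np.roll(f, -1), np.roll(f, -2)  # f_{j+1}, f_{j+2}
    return np.stack([
        (2 * fm2 - 7 * fm1 + 11 * f) / 6,
        (-fm1 + 5 * f + 2 * fp1) / 6,
        (2 * f + 5 * fp1 - fp2) / 6,
    ], axis=-1)                                # shape (J, 3)

def mdp_step(U, action, U_ref_next, dt, dx):
    """One transition: state U, action = per-interface weights of shape (J, 3)."""
    f = 0.5 * U ** 2                                         # Burgers flux (assumption)
    f_hat = np.sum(action * candidate_fluxes(f), axis=-1)    # flux at x_{j+1/2}
    U_next = U - dt / dx * (f_hat - np.roll(f_hat, 1))       # forward Euler update
    reward = -np.max(np.abs(U_next - U_ref_next))            # negative inf-norm error
    return U_next, reward

J = 32
dx, dt = 2 * np.pi / J, 0.01
U0 = 1.0 + 0.1 * np.sin(np.arange(J) * dx)     # positive, so left-biased is upwind
fixed_weights = np.tile([0.1, 0.6, 0.3], (J, 1))
# U0 is reused as the reference only to keep the example short; a real
# reference would come from an accurate solver on a fine grid.
U1, r = mdp_step(U0, fixed_weights, U0, dt, dx)
```

In the RL version, `fixed_weights` would be replaced by the output of the policy network evaluated on the current state, and `r` would drive the TD3 update.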

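Where does the meta-learner come from?

The idea is nice. The only plausible reason $A$ counts as a meta-learner is this: in this RL setup $A$ is produced by the $s$ function, i.e. it is decided from the current state, rather than inferred directly from the numerical rule without the help of another network (such as the MLP above), as in the original numerical method.

The other reasons given in the paper are less convincing: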

Learning the policy $P$ within the RL framework makes the algorithm meta-learning like [1, 5, 10, 20, 29].
The learned policy network is carefully designed to determine a good local discrete approximation based on the current state of the solution, which essentially makes the proposed method a meta-learning approach.
We attribute the good generalization ability of RL-WENO to our careful action design, which essentially makes RL-WENO a meta-learner under the WENO framework and thus have strong out-of-distribution generalization.

2. DeepMoD: Deep learning for model discovery in noisy data

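A thought that just occurred to me: so far I have seen few papers that pay much attention to the boundary conditions of the PDE! This paper does not consider them either.

Paper info

https://www.sciencedirect.com/science/article/pii/S0021999120307592
The authors appear to be researchers at the University of Paris
Journal: Journal of Computational Physics, November 2020

Summary: I am not convinced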

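Model: DeepMoD and its details

The paper proposes DeepMoD, a deep learning based model discovery algorithm, whose goal is to learn the underlying PDE from data. The form of the PDE is actually rather restrictive, fixed as: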

$\displaystyle \partial_t u(x,t)=\mathcal{F}(u,u_x,uu_x,u_{xx},\cdots)\approx \Theta\xi \tag{1}$

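Two big puzzles that I could not resolve after several readings, and I don't see why the paper sidesteps them. First, I think only this form is considered because the neural network $f_i$ is used to build the function library: its input is $(\mathbf{x, t})$, and the derivatives of the output with respect to the input, obtained by automatic differentiation, are treated as the partial derivatives. But then why does eq. $(1)$ in the paper not contain the derivative of $u$ with respect to $t$? Is this deliberately blurred? Second, the experiments show that data need not cover the whole grid and can be sampled at random, so the partial derivatives cannot be complete either. What then guarantees that differentiating the network with respect to its input yields the basis functions of the library (the Library block in the paper's architecture figure)?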

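Concretely, the method uses a function library with sparse regression, regularized during the regression. A densely-connected feed-forward neural network serves as the estimate of the function and is used to build the library. Three losses are considered: an MSE loss on $u(x,t)$, which supervises only the end value of each trajectory; a regression loss on the fit of $\Theta\xi$, although the subscripts in the paper's formula are unclear and I cannot tell what exactly is being supervised; and an $L_1$ penalty on the library coefficients $\xi$.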
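
To pin down how the three terms combine, here is a minimal NumPy sketch. It is my reading of the description above, not the paper's code: the function name `deepmod_style_loss`, the argument names and the default value of `lam` are all assumptions, and in the real method `u_pred`, `u_t` and `theta` would come from the network and automatic differentiation rather than being passed in as arrays.

```python
import numpy as np

def deepmod_style_loss(u_pred, u_obs, u_t, theta, xi, lam=1e-5):
    """Sum of the three terms: data fit, PDE regression residual, L1 penalty.

    u_pred, u_obs : (N,) network output and observations at the sample points
    u_t           : (N,) time derivative of the network output
    theta         : (N, M) library of candidate terms evaluated at the samples
    xi            : (M,) library coefficients
    lam           : weight of the L1 penalty (hypothetical default)
    """
    mse = np.mean((u_pred - u_obs) ** 2)          # data-fit term on u(x, t)
    reg = np.mean((u_t - theta @ xi) ** 2)        # residual of u_t = Theta xi
    l1 = lam * np.sum(np.abs(xi))                 # sparsity-promoting penalty
    return mse + reg + l1, (mse, reg, l1)

# Tiny synthetic check: a perfect fit leaves only the L1 term.
rng = np.random.default_rng(0)
theta_demo = rng.standard_normal((10, 3))
xi_demo = np.array([0.0, 2.0, 0.0])
u_demo = rng.standard_normal(10)
total, parts = deepmod_style_loss(u_demo, u_demo, theta_demo @ xi_demo,
                                  theta_demo, xi_demo, lam=0.1)
```

Returning the three parts separately makes it easy to see which term dominates during training.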

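Training involves two tricks:

The data get some processing. After the network is trained, the sparse coefficients of the library are in fact not that sparse ($L_1$ does not guarantee exact sparsity), so they are further nondimensionalized by normalizing all the quantities involved, including $\partial_t u,\Theta,\xi$ in eq. $(2)$ of the paper; for the precise meaning see eqs. $(3,4)$ of the paper.
After training, the network is trained once more, this time without the $L_1$ penalty and with the regression term restricted to the terms selected earlier. The paper only says that this yields an unbiased estimate of the coefficients?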
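
The two steps above (rescale, threshold, then refit on the surviving terms) can be tried on synthetic data. This is a sketch under assumptions and not the paper's pipeline: plain least squares stands in for the network training with the $L_1$ term, the derivatives come from an exact heat-equation solution instead of automatic differentiation, noise is added to $\partial_t u$ only, and the scaling by column norms and the threshold of 0.1 are one plausible reading of the normalization. A likely reason for the second pass is that an $L_1$ penalty shrinks the surviving coefficients toward zero, and refitting without it removes that shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 0.5                                    # true diffusion coefficient (synthetic)
x = rng.uniform(0, 2 * np.pi, 500)          # scattered samples, not a full grid
t = rng.uniform(0, 1, 500)

# Exact solution of u_t = nu * u_xx built from two Fourier modes.
a = np.exp(-nu * t)
b = 0.5 * np.exp(-4 * nu * t)
u = a * np.sin(x) + b * np.sin(2 * x)
u_x = a * np.cos(x) + 2 * b * np.cos(2 * x)
u_xx = -a * np.sin(x) - 4 * b * np.sin(2 * x)
u_t = nu * u_xx + 0.01 * rng.standard_normal(500)    # noisy target

names = ["1", "u", "u_x", "u_xx", "u*u_x"]
theta = np.column_stack([np.ones_like(u), u, u_x, u_xx, u * u_x])

def select_and_refit(theta, u_t, threshold=0.1):
    """Fit, rescale coefficients to be dimensionless, threshold, then refit."""
    xi = np.linalg.lstsq(theta, u_t, rcond=None)[0]              # first fit
    xi_scaled = xi * np.linalg.norm(theta, axis=0) / np.linalg.norm(u_t)
    keep = np.abs(xi_scaled) > threshold                         # surviving terms
    xi_final = np.zeros_like(xi)
    xi_final[keep] = np.linalg.lstsq(theta[:, keep], u_t, rcond=None)[0]
    return keep, xi_final

keep, xi_final = select_and_refit(theta, u_t)
selected = [n for n, k in zip(names, keep) if k]
```

Comparing rescaled coefficients rather than raw ones matters because the library columns can differ in magnitude by orders of magnitude, so a raw threshold would depend on the units of the data.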

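For the method to work the function library must be rich enough. The experiments seem to show that even a very rich library does not lead to overfitting, because the coefficients are regularized, and that regularization is tied to a threshold on the coefficients. The threshold does not seem to be general (see the Discussion section). The coefficients of the library functions amount to a form of model selection.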

Pros and Cons

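The first two advantages the paper claims for itself are odd:

First, it is very robust to noise; this is an experimental conclusion.
Second, it does not need a training set. This presumably means it does not need much training data and works with small samples, not that it needs none at all. The paper runs synthetic experiments on 5 equations, and one conclusion is that several of them need only $\mathcal{O}(10^2)$ data points.

The paper's own wording: This construction makes it extremely robust to noise, applicable to small data sets, and, contrary to other deep learning methods, does not require a training set.

One of the experimental results: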

find that it requires as few as $\mathcal{O}(10^2)$ samples and works at noise levels up to 75%
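Third, DeepMoD places no restriction on the dimensionality of the data; some earlier models were designed for 1D data.
Fourth, the function library doubles as model selection, though that feels a bit underwhelming.

The paper says DeepMoD is robust and needs little data because it uses a regression-based approach for the model discovery task and uses a neural network to infer system parameters... What is that even supposed to mean? It really is the paper's wording, and I strongly dislike this kind of statement.

Now the drawbacks; where I could not follow, I put the blame on the paper:

Is the form of the PDE model actually restrictive? Is the paper deliberately dodging this question?
What guarantees that differentiating the neural network with respect to its input gives the basis functions of the library? Could some partial derivatives be unavailable?
Why the two-pass training gives an unbiased estimate is not explained.
The threshold setting appears to be problem-specific.
The claimed robustness and small-sample performance rest on the loss alone. Backed only by experiments, again? I am not convinced, and all of the data are simulated.
The paper mentions boundary conditions but still does not handle them.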

10. TIME: A Transparent, Interpretable, Model-Adaptive and Explainable Neural Network for Dynamic Physical Processes

https://arxiv.org/abs/2003.02426
May 2020

References

APA 7th edition format

[1] Wang, Y., Shen, Z., Long, Z., & Dong, B. (2019). Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning. arXiv e-prints, arXiv:1905.11079. https://ui.adsabs.harvard.edu/abs/2019arXiv190511079W

[2] Pawar, S., Ahmed, S. E., San, O., & Rasheed, A. (2020). Data-driven recovery of hidden physics in reduced order modeling of fluid flows. Physics of Fluids, 32(3), 036602. https://doi.org/10.1063/5.0002051

[3] Both, G.-J., Choudhury, S., Sens, P., & Kusters, R. (2020). DeepMoD: Deep learning for model discovery in noisy data. Journal of Computational Physics, 109985. https://doi.org/10.1016/j.jcp.2020.109985

[4] Maddu, S., Cheeseman, B. L., Sbalzarini, I. F., & Müller, C. L. (2019). Stability selection enables robust learning of partial differential equations from limited noisy data. arXiv e-prints, arXiv:1907.07810. https://ui.adsabs.harvard.edu/abs/2019arXiv190707810M

[5] Ye, T., & Zhang, L. (2019). Derivatives Pricing via Machine Learning. Journal of Mathematical Finance, 9(3), 561-589. https://doi.org/10.4236/jmf.2019.93029

[6] Lu, P. Y., Kim, S., & Soljačić, M. (2020). Extracting Interpretable Physical Parameters from Spatiotemporal Systems Using Unsupervised Learning. Physical Review X, 10(3), 031056. https://doi.org/10.1103/PhysRevX.10.031056

[7] Xu, H., Chang, H., & Zhang, D. (2020). DLGA-PDE: Discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm. Journal of Computational Physics, 418, 109584. https://doi.org/10.1016/j.jcp.2020.109584

[8] Vaddireddy, H., Rasheed, A., Staples, A. E., & San, O. (2020). Feature engineering and symbolic regression methods for detecting hidden physics from sparse sensor observation data. Physics of Fluids, 32(1), 015113. https://doi.org/10.1063/1.5136351

[9] Xiong, B., Fu, H., Xu, F., & Jin, Y. (2019). Data-driven Discovery of Partial Differential Equations for Multiple-Physics Electromagnetic Problem. arXiv e-prints, arXiv:1910.13531. https://ui.adsabs.harvard.edu/abs/2019arXiv191013531X

[10] Singh, G., Gupta, S., Lease, M., & Dawson, C. N. (2020). TIME: A Transparent, Interpretable, Model-Adaptive and Explainable Neural Network for Dynamic Physical Processes. arXiv e-prints, arXiv:2003.02426. https://ui.adsabs.harvard.edu/abs/2020arXiv200302426S

[11] DiPietro, D. M., Xiong, S., & Zhu, B. (2020). Sparse Symplectically Integrated Neural Networks. arXiv e-prints, arXiv:2006.12972. https://ui.adsabs.harvard.edu/abs/2020arXiv200612972D

[12] Cai, S., Wang, Z., Lu, L., Zaki, T. A., & Karniadakis, G. E. (2020). DeepM&Mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks. arXiv e-prints, arXiv:2009.12935. https://ui.adsabs.harvard.edu/abs/2020arXiv200912935C

[13] Kim, S., Lu, P. Y., Mukherjee, S., Gilbert, M., Jing, L., Čeperić, V., & Soljačić, M. (2020). Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery. IEEE Transactions on Neural Networks and Learning Systems, 1-12. https://doi.org/10.1109/TNNLS.2020.3017010