How to Quickly Read and Understand an English-Language Paper

2020-02-02 · 视星等壹

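As an engineering student, I have ended up with the longest winter break of my life while the whole country stays home to fight the virus. But a long break is no excuse to let study and research slide, so here, using a paper I am currently reading, I will share with fellow students (read: research grunts) a method for getting through an English-language paper quickly~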

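About me

-First-year master's student at Shanghai Jiao Tong University, with one first-author EI-indexed paper accepted
-Research area: condition monitoring and reliability of complex equipment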

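Structure of an English-language paper

When reading a paper, first get clear on how it is structured. It mainly consists of the following parts: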
-Abstract
The abstract. It opens the paper and summarizes the whole work, which makes it the most important part and one to read carefully.

-Introduction
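The introduction. It covers the research background, the object of study, and similar information, which helps orient readers who are unfamiliar with the field. It sometimes also includes a systematic overview of the research methodology.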

-Methodology
The methodology. It mainly presents the principles and the novel contributions of the algorithm (research method) used.

-Experiment & Result
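Experiments and results. This part runs experiments with the research method (computational experiments, mechanical tests, and so on), then presents and analyzes the experimental data and results, usually as figures and tables.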

-Conclusion
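The conclusion. Based on the analysis of the experimental results, it summarizes and emphasizes the paper's overall conclusions.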

-Reference
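The references. In general you can skip the reference list, but it helps when judging a paper's quality: more references, and more authoritative ones, are a rough sign of a better paper (the journal's standing is another signal, so try to steer clear of low-quality journals).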

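Reading order

Different readers prefer different orders for reading an English-language paper. Personally, my order is usually Abstract -> (Introduction) -> Conclusion -> Experiment & Result -> Methodology.
My reasons for this order are as follows:
-The abstract presents the key background and conclusions and deserves careful reading. If what it describes does not match what we expect or our own research topic, we can skip the paper altogether.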

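-If you already know the research background, you can skip the Introduction, which mainly uses that background to stress why the research matters (every researcher says their own work is important, of course). If you are unfamiliar with the background, it is still worth a look~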

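-The conclusion restates the key findings and the key experimental results, which makes us curious: how exactly were they obtained? If every conclusion is one we do not care about (unrelated to our research topic), we can skip the paper altogether.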

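-Once the conclusion has shown us something interesting, we go back to the experiments and results and study the figures and tables closely. Why did the authors draw these figures and tables? What experiments produced them?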

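-Read the methodology last. For me, this avoids losing time stuck on mathematical formulas and physical principles. Only once I am sure the method is feasible and suits my own research do I work through this part properly, ideally with the help of blogs and reference books to follow the derivations in detail.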

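In short:
1.Read through the abstract, (the introduction,) and the conclusion to confirm that the direction and method are ones you care about or that relate to your work.
2.Read the experiments and results to understand how the experiments were run and how the data were processed and analyzed.
3.Once you are sure the method is worth borrowing, read the methodology and work through its derivations thoroughly. If you do not plan to use the paper's method, you can skip this part.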

Example

Now let me briefly walk through this process using a conference paper I am currently reading, Wind Turbine Structural Health Monitoring: A Short Investigation Based on SCADA Data.

First, the abstract

*The use of offshore wind farms has been growing in recent years, as steadier and higher wind speeds can be generally found over water compared to land. Moreover, as human activities tend to complicate the construction of land wind farms, offshore locations, which can be found more easily near densely populated areas, can be seen as an attractive choice. However, the cost of an offshore wind farm is relatively high, and therefore their reliability is crucial if they ever need to be fully integrated into the energy arena. As wind turbines have become more complex, efficient, and expensive structures, they require more sophisticated monitoring systems, especially in offshore sites where the financial losses due to failure could be substantial.* **This paper presents the preliminary analysis of supervisor control and data acquisition (SCADA) extracts from the Lillgrund wind farm for the purposes of structural health monitoring. A machine learning approach is applied in order to produce individual power curves, and then predict measurements of the power produced of each wind turbine from the measurements of the other wind turbines in the farm. A comparison between neural network and Gaussian process regression is also made.**

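As you can see, the opening portion (in italics) gives the background of the whole paper. (If you find abstracts hard to read, Google Translate is a good aid.) It describes the growth of offshore wind farms and notes that they are costly, which is why monitoring the condition of offshore turbines matters. The rest, starting from "This paper presents" (in bold), describes the paper's method: a machine learning approach first produces a power curve for each turbine, then the measurements from the other turbines are used to predict the power of a given turbine, and neural network regression is compared with Gaussian process regression.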

Next, the conclusion

Since the abstract already covers the background in some detail, we can skip the Introduction and go straight to the Conclusion.

This paper presented a preliminary exploration of the suitability of SCADA extracts from the Lillgrund wind farm for the purposes of SHM. Artificial neural networks and Gaussian processes were used to build a reference power curve (wind speed versus power produced) for each of the 48 turbines existing in the farm. Then, each reference model was used to predict the power produced in the rest of the turbines available, creating thus a confusion matrix of the MSE errors for all combinations.

The results showed that nearly all models were very robust with the highest MSE error to be 4.8291, and this was happening when the model trained in turbine 4 was predicting power from turbine 3. Both turbines 3 and 4 are located in the outside row of the wind farm. It was shown that when wind speed data which did not come from time instances where the error status was ‘0’ (meaning healthy data), were used as an input to the trained neural networks, the MSE error was significantly larger. Although, it was seen that in some cases the very large MSE was due to emergency stops or manual stops, and it is currently not known whether there was scheduled maintenance, this result, still shows the potential for novelty detection in the turbines.

In this spirit, the confusion matrices that were presented earlier can form the baseline for thresholds for a population-based SHM of the whole farm. It is anticipated that the power curve, and possibly other similar features, will be adequate to be used in future work in the construction of control charts for the monitoring of the whole wind farm and of the potential interaction or influence of the turbines with one another during their normal operation. Future work will also focus on the full analysis of the error statuses that were presented during the recorded time. In the comparison of the regression between neural networks and Gaussian processes, it was shown that there were no significant differences, with the networks performing with slightly lower MSE error.

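The conclusion fills in, in more detail, what the abstract left out. Its first paragraph expands on the abstract: neural networks and Gaussian processes were used to model the power curve of each of the 48 turbines, and each turbine's model was then used to predict the power of the other turbines, giving a confusion matrix of mean squared errors (MSE). The second paragraph analyzes the results in detail: nearly all models are robust, with a highest MSE of 4.8291, which occurred when the model trained on turbine 4 predicted the power of turbine 3. The third paragraph stresses that the resulting confusion matrices can serve as a baseline for detecting abnormal turbine states and can be combined with control charts to monitor the whole farm, and it closes by pointing to possible future work.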
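
To make the idea of a "confusion matrix of MSE errors" concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation: the paper fits neural networks and Gaussian processes to real SCADA data, whereas this sketch swaps in a simple binned-mean power curve as the reference model and uses synthetic data. All function names (`fit_binned_power_curve`, `cross_mse_matrix`, `synthetic_turbine`) and parameters are hypothetical.

```python
import random


def fit_binned_power_curve(speeds, powers, bin_width=1.0):
    """Stand-in reference model (an assumption, not the paper's NN/GP):
    the mean power observed in each wind-speed bin."""
    sums, counts = {}, {}
    for v, p in zip(speeds, powers):
        b = int(v // bin_width)
        sums[b] = sums.get(b, 0.0) + p
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def predict(curve, speed, bin_width=1.0):
    """Return the bin mean, falling back to the nearest fitted bin."""
    b = int(speed // bin_width)
    if b not in curve:
        b = min(curve, key=lambda k: abs(k - b))
    return curve[b]


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_mse_matrix(datasets, bin_width=1.0):
    """matrix[i][j] is the MSE when the model fitted on turbine i
    predicts the power of turbine j from turbine j's wind speeds."""
    curves = [fit_binned_power_curve(s, p, bin_width) for s, p in datasets]
    matrix = []
    for curve in curves:
        row = []
        for speeds, powers in datasets:
            preds = [predict(curve, v, bin_width) for v in speeds]
            row.append(mse(powers, preds))
        matrix.append(row)
    return matrix


def synthetic_turbine(rng, n=200, gain=1.0):
    """Toy data: a cubic power curve capped at 1.0 above 12 m/s, plus noise."""
    speeds = [rng.uniform(3.0, 15.0) for _ in range(n)]
    powers = [gain * min((v / 12.0) ** 3, 1.0) + rng.gauss(0.0, 0.02)
              for v in speeds]
    return speeds, powers


rng = random.Random(0)
# Turbines 0 and 1 behave alike; turbine 2 underproduces (gain 0.7).
datasets = [synthetic_turbine(rng, gain=g) for g in (1.0, 1.0, 0.7)]
matrix = cross_mse_matrix(datasets)
for row in matrix:
    print(["%.4f" % e for e in row])
```

In this toy setup the entries involving the underproducing turbine stand out from the rest, which is the kind of pattern the paper reads off its 48-by-48 matrices.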

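From the conclusion we now roughly know the paper's method. Next we can move on to the experiments and results.

Then, the experiments and results

The conclusion tells us that the most important results are the MSE confusion matrices for the neural networks and the Gaussian processes, so we only need to find those two figures. Sure enough, they sit right in the middle of the paper.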

[Figure: Confusion matrix with MSE errors created from the neural networks - testing set]

[Figure: Confusion matrix with MSE errors created from the Gaussian processes - testing set]

Now let's look at the authors' corresponding analysis (for the neural networks):

From the results it is clear that almost all the trained networks are very robust and the maximum MSE error is around 5, which mainly occurs in turbines 3 and 4 which are located in the outside row of the wind farm.

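As in the conclusion, this says the models are robust and the maximum MSE is around 5, occurring mainly for turbines 3 and 4, which sit in the outer row of the wind farm.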

Subsequent scanning of the data revealed that the majority of the instances where the regression error becomes high (in turbine 4) happened when the turbine was not working, either from emergency stops or manual stops...Essentially, Figure 2 shows a map of potential thresholds, which can be used for the monitoring (in a novelty detection scheme) of the turbines individually or as a population.

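This explains that the MSE becomes very high when a turbine is not operating normally. The confusion matrix figure provides a map of potential thresholds for normal operation, which can be used to monitor turbines individually or the farm as a whole.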

This part is essentially an expansion of the paper's conclusion.
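
The threshold idea in the quoted analysis can be sketched as follows. This is only an illustration under assumptions: the paper does not state a specific threshold rule, so the "mean plus three standard deviations" rule, the function names (`novelty_threshold`, `flag_novelties`), and all the numbers below are hypothetical.

```python
from statistics import mean, stdev


def novelty_threshold(healthy_errors, k=3.0):
    """Assumed rule: baseline mean plus k sample standard deviations
    of the errors observed on healthy data."""
    return mean(healthy_errors) + k * stdev(healthy_errors)


def flag_novelties(errors, threshold):
    """Indices of monitoring windows whose error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]


# Made-up MSE values per time window, for illustration only.
healthy = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]
threshold = novelty_threshold(healthy)

monitored = [1.0, 1.1, 3.5, 0.9]
flagged = flag_novelties(monitored, threshold)
print(threshold, flagged)
```

A flagged window is only a prompt to look closer: as the paper notes, high errors can also come from emergency or manual stops rather than damage.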

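Finally

After reading all this, we ask ourselves: can this method or experiment be applied to our own problem? How would we implement it? At this point we can turn to other reference material to learn about neural network regression, Gaussian process regression, computing the MSE, and generating the confusion matrix!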
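
As one possible starting point for the Gaussian process regression mentioned above, here is a minimal pure-Python sketch of the GP posterior mean. It is a teaching example under assumptions, not the paper's model: the RBF kernel, the `length_scale` and `noise` values, the toy power-curve data, and every function name are hypothetical, and a real project would more likely use an established library.

```python
import math


def rbf_kernel(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_fit(xs, ys, noise=1e-2, **kernel_args):
    """Precompute alpha = (K + noise * I)^-1 y for a zero-mean GP."""
    K = [[rbf_kernel(a, b, **kernel_args) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return xs, alpha, kernel_args


def gp_predict(model, x_new):
    """Posterior mean at x_new: sum_i alpha_i * k(x_new, x_i)."""
    xs, alpha, kernel_args = model
    return sum(a * rbf_kernel(x_new, x, **kernel_args)
               for a, x in zip(alpha, xs))


# Toy power curve: cubic in wind speed, capped at 1.0 above 12 m/s.
speeds = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
powers = [min((v / 12.0) ** 3, 1.0) for v in speeds]

model = gp_fit(speeds, powers, noise=1e-4)
estimate = gp_predict(model, 8.0)
print(estimate)
```

The sketch returns only the posterior mean; a full Gaussian process also gives a predictive variance, which is part of what makes it attractive for monitoring.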

And with that, the paper is read.

References
Papatheou E, Dervilis N, Maguire E, et al. Wind turbine structural health monitoring: a short investigation based on SCADA data[C]. 2014.

If this post helped you, please give it a like~

Much love!