[光能蜗牛's Artificial Intelligence Journey] week1 : 4.What is M
What is machine learning? In this video, we will try to define what it is and also try to give you a sense of when you want to use machine learning. Even among machine learning practitioners, there isn't a well-accepted definition of what is and what isn't machine learning.
But let me show you a couple of examples of the ways that people have tried to define it. Here's a definition of machine learning due to Arthur Samuel. He defined machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.
Samuel's claim to fame was that back in the 1950s, he wrote a checkers playing program, and the amazing thing about this checkers playing program was that Arthur Samuel himself wasn't a very good checkers player. But what he did was he had the program play maybe tens of thousands of games against itself, and by watching what sorts of board positions tended to lead to wins and what sorts of board positions tended to lead to losses, the checkers playing program learned over time what are good board positions and what are bad board positions. And eventually it learned to play checkers better than Arthur Samuel himself was able to.
This was a remarkable result. Arthur Samuel himself turns out not to be a very good checkers player. But a computer has the patience to play tens of thousands of games against itself, and no human has the patience to play that many games.
By doing this, a computer was able to get so much checkers playing experience that it eventually became a better checkers player than Arthur himself.
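To make the idea of learning good and bad positions from self-play concrete, here is a minimal sketch. It is not Samuel's checkers program or his algorithm. It assumes a much smaller game chosen only for illustration (a pile of stones, each player takes 1 or 2, whoever takes the last stone wins), and the function name self_play_win_rates and its parameters are made up for this example. The program plays random games against itself and records how often each position a player left behind ended in a win for that player.

```python
import random


def self_play_win_rates(num_games=5000, start_stones=7, seed=0):
    """Play random games against itself and return, for each number of
    stones a player left behind, the fraction of times that player won."""
    rng = random.Random(seed)
    wins = {}    # stones left after a move -> wins for the player who moved
    visits = {}  # stones left after a move -> times that position occurred
    for _ in range(num_games):
        stones = start_stones
        player = 0
        history = []  # (player who moved, stones left after the move)
        while stones > 0:
            take = rng.choice([t for t in (1, 2) if t <= stones])
            stones -= take
            history.append((player, stones))
            player = 1 - player
        winner = history[-1][0]  # whoever took the last stone wins
        for mover, stones_left in history:
            visits[stones_left] = visits.get(stones_left, 0) + 1
            if mover == winner:
                wins[stones_left] = wins.get(stones_left, 0) + 1
    return {left: wins.get(left, 0) / visits[left] for left in sorted(visits)}


win_rates = self_play_win_rates()
for stones_left, rate in win_rates.items():
    print(f"leave {stones_left} stones: won {rate:.2f} of the time")
```

In this toy game, leaving 0 stones always wins and leaving exactly 1 stone always loses, so those two rates come out as 1.0 and 0.0. The other rates are estimates that depend on the random games played. A program could then prefer moves that lead to positions with a high recorded win rate, which is the spirit of learning from many self-played games.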
This is a somewhat informal definition and an older one. Here's a slightly more recent definition by Tom Mitchell, who's a friend from Carnegie Mellon.
So Tom defines machine learning by saying that a well-posed learning problem is defined as follows. He says, a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. I actually think he came up with this definition just to make it rhyme.
For the checkers playing example, the experience E would be the experience of having the program play tens of thousands of games against itself. The task T would be the task of playing checkers, and the performance measure P would be the probability that it wins the next game of checkers against some new opponent.
Throughout these videos, besides me trying to teach you stuff, I'll occasionally ask you a question to make sure you understand the content.
Here's one.
On top is a definition of machine learning by Tom Mitchell. Let's say your email program watches which emails you do or do not mark as spam.
So in an email client like this, you might click the Spam button to report some emails as spam but not others. And based on which emails you mark as spam, say your email program learns to filter spam email better. What is the task T in this setting?
In a few seconds, the video will pause, and when it does, you can use your mouse to select one of these four radio buttons to let me know which of these four you think is the right answer to this question.
So hopefully you got that this is the right answer: classifying emails is the task T. In fact, this definition defines a task T, a performance measure P, and some experience E.
And so, watching you label emails as spam or not spam, this would be the experience E, and the fraction of emails correctly classified, that might be a performance measure P. And so the system's performance on the task T, as measured by P, will improve after the experience E.
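The spam example can be sketched in a few lines of code. This is only an illustration under stated assumptions, not how any real email program works: the SpamFilter class, its word-count scoring rule, and the tiny hand-written emails are all made up for this sketch. It maps each piece of the definition to code: observe is the experience E, classify is the task T, and accuracy is the performance measure P.

```python
from collections import Counter


class SpamFilter:
    """Toy word-count spam filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()

    def observe(self, text, is_spam):
        """Experience E: one email the user did or did not mark as spam."""
        target = self.spam_words if is_spam else self.ham_words
        target.update(text.lower().split())

    def classify(self, text):
        """Task T: return True if the email looks like spam."""
        words = text.lower().split()
        spam_score = sum(self.spam_words[w] for w in words)
        ham_score = sum(self.ham_words[w] for w in words)
        return spam_score > ham_score


def accuracy(spam_filter, labeled_emails):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(
        spam_filter.classify(text) == is_spam for text, is_spam in labeled_emails
    )
    return correct / len(labeled_emails)


# Made-up emails: (text, marked as spam by the user?)
training = [
    ("win free money now", True),
    ("meeting agenda for monday", False),
    ("free prize claim now", True),
    ("lunch on monday", False),
]
test_set = [
    ("claim your free money", True),
    ("agenda for lunch meeting", False),
    ("win a prize now", True),
    ("see you monday", False),
]

spam_filter = SpamFilter()
accuracy_before = accuracy(spam_filter, test_set)  # no experience yet
for text, is_spam in training:
    spam_filter.observe(text, is_spam)
accuracy_after = accuracy(spam_filter, test_set)   # after experience E
print(accuracy_before, accuracy_after)
```

With these particular emails, the filter starts at 0.5 (it calls everything non-spam) and reaches 1.0 after watching the four labeled examples, so its performance on T, as measured by P, improves with experience E.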
In this class, I hope to teach you about various types of learning algorithms.
There are several different types of learning algorithms. The main two types are what we call supervised learning and unsupervised learning. I'll define what these terms mean more in the next couple of videos.
It turns out that in supervised learning, the idea is we're going to teach the computer how to do something. Whereas in unsupervised learning, we're going to let it learn by itself. Don't worry if these two terms don't make sense yet.
In the next two videos, I'm going to say exactly what these two types of learning are. You might also hear other buzz terms such as reinforcement learning and recommender systems. These are other types of machine learning algorithms that we'll talk about later.
But the two most used types of learning algorithms are probably supervised learning and unsupervised learning. I'll define them in the next two videos, and we'll spend most of this class talking about these two types of learning algorithms. It turns out one of the other things we'll spend a lot of time on in this class is practical advice for applying learning algorithms.
This is something that I feel pretty strongly about, and it's actually something that I don't know of any other university teaching. Teaching about learning algorithms is like giving you a set of tools.
And equally important or more important than giving you the tools is to teach you how to apply these tools. I like to make an analogy to learning to become a carpenter. Imagine that someone is teaching you how to be a carpenter, and they say, here's a hammer, here's a screwdriver, here's a saw, good luck.
Well, that's no good. You have all these tools, but the more important thing is to learn how to use these tools properly.
There's a huge difference between people who know how to use these machine learning algorithms and people who don't know how to use these tools well.
Here in Silicon Valley, where I live, when I go visit different companies, even the top Silicon Valley companies, very often I see people trying to apply machine learning algorithms to some problem, and sometimes they have been going at it for six months. But sometimes when I look at what they're doing, I say, gee, I could have told you six months ago that you should be taking a learning algorithm and applying it in a slightly modified way, and your chance of success would have been much higher. So what we're going to do in this class is actually spend a lot of the time talking about how, if you're actually trying to develop a machine learning system, to make those best-practice decisions about the way in which you build your system.
So that when you're applying a learning algorithm, you're less likely to end up one of those people who end up pursuing something for six months that someone else could have figured out was just a waste of time.
So I'm actually going to spend a lot of time teaching you those sorts of best practices in machine learning and AI, how to get the stuff to work, and how the best people do it in Silicon Valley and around the world. I hope to make you one of the best people in knowing how to design and build serious machine learning and AI systems.
So that's machine learning, and these are the main topics I hope to teach. In the next video, I'm going to define what supervised learning is, and after that what unsupervised learning is. We'll also start to talk about when you would use each of them.