202309290900
#Earth online#, 202309290854. I looked at the data collection sources behind some of the large models. Although a lot of optimization has been applied, it is clear that a large share of the source material is public data previously accumulated on the model companies' own platforms. The worst part is that this public data is of very poor quality, especially in professional Q&A: it was accumulated during the information-feed era and is either riddled with errors or simply low-grade. It has gone through some reliability and validity filtering and cross-checking, but its wording, grammar, and rigor are still weak. Repackaged, this information becomes the raw data that feeds the large models, so one can imagine the quality of their underlying information. All one can say is that high-quality answers backed by large volumes of data remain in sight but out of reach.
http://yunzhanshi.business.blog/2023/09/29/202309290854/