fetch_california_housing error: urlli

2023-03-03  LabVIEW_Python

Problem: loading the California housing dataset from sklearn:

from sklearn.datasets import fetch_california_housing, get_data_home
import numpy as np

print(get_data_home())
features, labels= fetch_california_housing(return_X_y=True)

print(features.shape, labels.shape)

It fails with the following error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

Solution
Open the file ...\site-packages\sklearn\datasets\_california_housing.py. Line 42 gives the link to the dataset:

# The original data can be found at:
# https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz

Download the dataset manually and put it in the folder returned by get_data_home():

from sklearn.datasets import fetch_california_housing, get_data_home
print(get_data_home())
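
A 403 response can also leave you with an HTML error page saved under a .tgz name, so it may be worth confirming that the downloaded file is a readable gzip tarball before going further. The following is a minimal sketch using only the standard library. The helper name archive_has_member is hypothetical, and the default member path is the one used in the code modification in this post.

```python
import tarfile


def archive_has_member(archive_path, member="CaliforniaHousing/cal_housing.data"):
    """Return True if archive_path is a gzip tarball that contains member."""
    try:
        with tarfile.open(archive_path, mode="r:gz") as tar:
            return member in tar.getnames()
    except (OSError, EOFError, tarfile.TarError):
        # Missing file, truncated download, or not a gzip tarball at all
        # (for example, an HTML error page saved to disk).
        return False
```

Call it with the full path of the file you placed in the data folder, under whatever file name you saved it as. A False result suggests the download should be repeated.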

Finally, modify _california_housing.py at line 154 so that it reads the downloaded archive instead of calling joblib.load:

        #cal_housing = joblib.load(filepath)
        with tarfile.open(mode="r:gz", name=filepath) as f:
            cal_housing = np.loadtxt(
                f.extractfile("CaliforniaHousing/cal_housing.data"), delimiter=","
            )
            # Columns are not in the same order compared to the previous
            # URL resource on lib.stat.cmu.edu
            columns_index = [8, 7, 2, 3, 4, 5, 6, 1, 0]
            cal_housing = cal_housing[:, columns_index]

Then run:

from sklearn.datasets import fetch_california_housing, get_data_home
import numpy as np

print(get_data_home())
features, labels= fetch_california_housing(return_X_y=True)

print(features.shape, labels.shape)
print(features[0])
print(labels[0])

The output is:

(20640, 8) (20640,)
[ 8.3252 41. 6.98412698 1.02380952 322.
2.55555556 37.88 -122.23 ]
4.526
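
If you would rather not edit files inside site-packages, the same parsing can be sketched as a standalone function. The tarfile and np.loadtxt part mirrors the modification above. The derived columns (rooms, bedrooms and occupancy divided by households, and the target scaled to units of 100,000) are an assumption about what fetch_california_housing does after loading, and load_cal_housing is a hypothetical name. If that assumption holds, the result should match the output above.

```python
import tarfile

import numpy as np


def load_cal_housing(archive_path):
    """Parse the downloaded .tgz into (features, labels) without sklearn."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        raw = np.loadtxt(
            tar.extractfile("CaliforniaHousing/cal_housing.data"),
            delimiter=",",
            ndmin=2,  # keep a 2-D array even if the file has a single row
        )
    # Same column reordering as in the modified sklearn code: value, income,
    # age, rooms, bedrooms, population, households, latitude, longitude.
    raw = raw[:, [8, 7, 2, 3, 4, 5, 6, 1, 0]]
    labels, features = raw[:, 0], raw[:, 1:].copy()
    # Assumed post-processing: turn totals into per-household averages.
    households = features[:, 5].copy()
    features[:, 2] /= households  # average rooms per household
    features[:, 3] /= households  # average bedrooms per household
    features[:, 5] = features[:, 4] / households  # average occupancy
    return features, labels / 100000.0  # house value in units of 100,000
```

Pass it the full path of the downloaded archive, for example features, labels = load_cal_housing(path), then print the shapes and first row as in the script above.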
