SonarQube project analysis got slow

2023-06-06 Mokaffee

How it started

It all started because I couldn't resist the urge to save some "green cost" and decided to move SonarQube from its GKE deployment to Google Cloud Run.

The existing SonarQube on GKE

GKE cluster: version 1.25.8-gke, location europe-west1-c
SonarQube: version 9.8.0-community
Database: PostgreSQL 9.6 on Google Cloud SQL, location europe-west1-c

For the deployment I started from the SonarQube Helm chart, rewrote its Deployment, and added a Cloud SQL Auth Proxy container that SonarQube uses to connect to the database.

PS: I keep wondering whether SonarQube could simply connect to the database by IP. Then we could use the SonarQube chart as-is instead of maintaining a rewritten Deployment.

First attempt with Cloud Run

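On my first attempt to deploy SonarQube on Cloud Run, the immediate problem was the database connection: at that time Cloud Run only supported a single container per service, so there was no sidecar for the Cloud SQL proxy. Oh well, the GKE setup worked fine, so I dropped the idea.
I don't know much about networking and didn't feel like trying the alternatives below, so I simply gave up.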
https://towardsdatascience.com/how-to-connect-to-gcp-cloud-sql-instances-in-cloud-run-servies-1e60a908e8f2
https://codelabs.developers.google.com/connecting-to-private-cloudsql-from-cloud-run#0

Second attempt with Cloud Run

Then Cloud Run added support for deploying multiple containers (sidecars). Docs: https://cloud.google.com/run/docs/deploying?hl=zh-cn#sidecars

And so began three days of asking for trouble.

Monday

Working from some official docs, I put together a service.yml and deployed it to Cloud Run in asia-east1.

After the deployment, SonarQube on Cloud Run felt slow to open. I assumed it was just a resource issue and nothing more serious, so I deleted the GKE deployment.

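And of course, just when you think there's no problem, the problem arrives.
Some teammates pushed code and triggered the pipeline, and the SonarQube quality gate result was still not available after waiting 1 min.
I checked SonarQube's background tasks and saw that project analysis had gone from a dozen or so seconds to about 5 minutes.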

To avoid blocking my colleagues, I created a new GKE cluster and deployed the service back there.

But project analysis in the background tasks still took about 5 minutes.

WTF, what is going on?

For reference, this is the service.yml used for the Cloud Run deployment:

# Copyright 2023 Google LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  annotations:
    run.googleapis.com/launch-stage: ALPHA
  labels:
    # Should be the same region as the Cloud SQL instance to avoid cross-region latency
    cloud.googleapis.com/location: asia-east1
  name: multicontainer-service
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/execution-environment: gen1 #or gen2
        # Uncomment the following line if connecting to Cloud SQL using Private IP
        # via a VPC access connector
        # run.googleapis.com/vpc-access-connector: <CONNECTOR_NAME>
    spec:
      containers:
      - env:
          - name: SONAR_JDBC_USERNAME
            value: postgres
          - name: SONAR_JDBC_URL
            value: jdbc:postgresql://127.0.0.1:5432/postgres
          - name: SONAR_ES_BOOTSTRAP_CHECKS_DISABLE
            value: 'true'
          - name: SONAR_JDBC_PASSWORD
            valueFrom:
              secretKeyRef:
                key: latest
                name: SONAR_DB_PASSWORD
        image: sonarqube:9.9.1-community
        name: sonarqube
        ports:
        - containerPort: 9000
          name: http1
        resources:
          limits:
            cpu: 1000m
            memory: 2Gi
        startupProbe:
          failureThreshold: 1
          periodSeconds: 240
          tcpSocket:
            port: 9000
          timeoutSeconds: 240
        volumeMounts:
          # NOTE: the "sonarqube" volume referenced below is not defined in this
          # excerpt; it also needs an entry under spec.template.spec.volumes.
          - mountPath: /opt/sonarqube/data
            name: sonarqube
            subPath: data
          - mountPath: /opt/sonarqube/logs
            name: sonarqube
            subPath: logs
          - mountPath: /opt/sonarqube/extensions
            name: sonarqube
      # Sidecar: Cloud SQL Auth Proxy, serving the database on 127.0.0.1:5432
      - name: cloud-sql-proxy
        image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:latest
        args:
          # If connecting to a Cloud SQL instance within a VPC network, you can use the
          # following flag to have the proxy connect over private IP
          # - "--private-ip"

          # Ensure the port number on the --port argument matches the port in
          # SONAR_JDBC_URL on the sonarqube container.
          - "--port=5432"
          # instance connection name takes format "PROJECT:REGION:INSTANCE_NAME"
          - "<INSTANCE_CONNECTION_NAME>"

Tuesday

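I had a family matter in the morning and took it off.
In the afternoon I kept at it.
The service was back on GKE, so what on earth was making project analysis slow?
On top of that, the SonarQube service on GKE restarted whenever a project analysis failed.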

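SonarQube is a code analysis tool used only inside our team, with a single environment, and nobody paid much attention to it. I had no idea whether this also happened on the old GKE cluster, which put me on the back foot.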

All I found were some WARN messages; beyond that I had no idea why it restarted or why it was slow.

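Figuring things couldn't get much worse, I upgraded SonarQube from 9.8 straight to 9.9, digging myself in even deeper.
SonarQube 9.9 no longer supports PostgreSQL 9.6, only 11 to 15, so I upgraded the Cloud SQL instance to 15 as well.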
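A minimal sketch of a pre-upgrade check for that constraint, assuming the version is passed as `MAJOR` or `MAJOR.MINOR` and taking the 11 to 15 range stated above; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: is a PostgreSQL version in the 11-15 range supported by SonarQube 9.9?
# Returns 0 if supported, 1 if not, 2 if the version string can't be parsed.
pg_supported_by_sonar99() {
  local version="$1"
  local major="${version%%.*}"   # "9.6" -> "9", "15" -> "15"
  if ! [[ "$major" =~ ^[0-9]+$ ]]; then
    echo "unparseable version: $version" >&2
    return 2
  fi
  if (( major >= 11 && major <= 15 )); then
    echo "PostgreSQL $version: supported"
    return 0
  fi
  echo "PostgreSQL $version: NOT supported by SonarQube 9.9"
  return 1
}
```

For example, `pg_supported_by_sonar99 9.6` prints the "NOT supported" line and returns 1, which is what our old Cloud SQL instance would have hit.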

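In the evening I noticed a lot of new code, even though there really hadn't been that much. I wondered whether the upgrade had corrupted the database, and how the new code baseline actually works.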

I read up on the new code baseline and manually set one project's new code period to 1 day, so there would be far less new code to analyze.
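I changed the setting by hand, but the same thing could be scripted. A sketch, assuming the SonarQube Web API endpoint `api/new_code_periods/set` with `type=NUMBER_OF_DAYS` (verify it on your instance's `/web_api` page first) and a token with project admin rights; `SONAR_HOST_URL`, `SONAR_API_TOKEN` and the function name are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: set a project's new code period to N days via the SonarQube Web API.
# ASSUMPTION: the api/new_code_periods/set endpoint and its parameters; check /web_api.
set_new_code_days() {
  local project_key="$1" days="$2"
  if [[ -z "${SONAR_HOST_URL:-}" ]]; then
    echo "SONAR_HOST_URL is not set" >&2
    return 2
  fi
  if ! [[ "$days" =~ ^[0-9]+$ ]]; then
    echo "days must be an integer, got: $days" >&2
    return 2
  fi
  local url="${SONAR_HOST_URL%/}/api/new_code_periods/set"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Show what would be sent, without printing the token.
    echo "POST ${url} project=${project_key} type=NUMBER_OF_DAYS value=${days}"
    return 0
  fi
  if [[ -z "${SONAR_API_TOKEN:-}" ]]; then
    echo "SONAR_API_TOKEN is not set" >&2
    return 2
  fi
  # Token as basic-auth user with an empty password; --data-urlencode makes this a form POST.
  curl --fail --silent --show-error \
    -u "${SONAR_API_TOKEN}:" \
    --data-urlencode "project=${project_key}" \
    --data-urlencode "type=NUMBER_OF_DAYS" \
    --data-urlencode "value=${days}" \
    "$url"
}
```

Running it once with `DRY_RUN=1` shows the request without touching the server.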

Project analysis still took about 5 minutes.
I was devastated, but still made sure to eat well and sleep well.

Wednesday

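First thing in the morning I went back to hunting for the cause.
If you don't know the cause, read the docs.
Based on Tuesday's digging, I suspected the restarts came from Elasticsearch exiting abruptly, and that the slow project analysis was also caused by Elasticsearch, possibly because it ran short of memory.
The deployment sets SONAR_ES_BOOTSTRAP_CHECKS_DISABLE=true, so Elasticsearch's bootstrap checks are skipped at startup. We disabled them mainly because SonarQube would not start otherwise.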

https://docs.sonarqube.org/9.9/setup-and-upgrade/configure-and-operate-a-server/environment-variables/#elasticsearch

The docs mention sysctl -w vm.max_map_count=262144, so I tried to exec into the pod and change it there, but it failed:

$ sysctl -w vm.max_map_count=262144
sysctl: setting key "vm.max_map_count", ignoring: Read-only file system
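Writing fails, but reading the current value usually still works from an unprivileged container. Since `vm.max_map_count` is a kernel (node-level) setting, the value seen inside the pod is the node's. A minimal read-only check, assuming the 262144 minimum quoted from the docs; the optional file argument is only there to make it testable:

```shell
#!/usr/bin/env bash
# Sketch: report whether vm.max_map_count meets the minimum Elasticsearch needs.
# Reads the kernel's /proc entry by default; a different file can be passed for testing.
check_max_map_count() {
  local file="${1:-/proc/sys/vm/max_map_count}"
  local min=262144
  local current
  current="$(cat "$file")" || return 2
  if (( current >= min )); then
    echo "vm.max_map_count=${current} (ok, >= ${min})"
    return 0
  fi
  echo "vm.max_map_count=${current} (too low, need >= ${min})"
  return 1
}
```

Inside the pod, `check_max_map_count` with no argument tells you whether the node already satisfies the requirement before you touch anything.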

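When we first deployed SonarQube on GKE we certainly never ran this command, but it's possible the newer SonarQube version needs it. So what now?
Then I thought of the Helm deployment of SonarQube; after all, our own Deployment was adapted from the Helm chart's manifests.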

helm repo add sonarqube https://SonarSource.github.io/helm-chart-sonarqube
helm repo update
kubectl create namespace sonarqube
helm upgrade --install -n sonarqube sonarqube sonarqube/sonarqube

After digging through the SonarQube Helm chart, I found the following Elasticsearch-related settings.

My plan was to extract the initContainer that sets Elasticsearch's vm.max_map_count and deploy it together with the service.
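A sketch of the kind of step such an initContainer could run before SonarQube starts: raise a kernel parameter to at least a minimum and never lower it. This is not the chart's actual script. It writes the `/proc/sys` file directly, which is what `sysctl -w` does; in a pod that only works in a privileged container, and the change applies to the whole node. `PROC_SYS` is a hypothetical override so the logic can be tried off-node:

```shell
#!/usr/bin/env bash
# Sketch: ensure a sysctl key is at least MIN, leaving higher values untouched.
# PROC_SYS (hypothetical) defaults to /proc/sys; override it only for testing.
ensure_sysctl_min() {
  local key="$1" min="$2"
  local rel path current
  rel="$(printf '%s' "$key" | tr '.' '/')"   # vm.max_map_count -> vm/max_map_count
  path="${PROC_SYS:-/proc/sys}/${rel}"
  current="$(cat "$path")" || return 2
  if (( current < min )); then
    # Fails with "Read-only file system" unless the container is privileged.
    echo "$min" > "$path" || return 1
    echo "${key}: ${current} -> ${min}"
  else
    echo "${key}: ${current} (unchanged)"
  fi
}
```

In the init container this would be invoked as `ensure_sysctl_min vm.max_map_count 262144`.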

Noon: time to eat well and take a proper nap.

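While eating, it occurred to me that the old GKE logs should still be available, so I could compare them.
Finding: the old GKE deployment also had memory-low WARN messages, and it also restarted after similar ES errors.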

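So what was actually different?
Why was the old GKE cluster in a different location from the new one?
Right: I had deliberately put the new GKE cluster in Asia, hoping for lower network latency.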

Whatever. I eliminated every difference: created a new GKE cluster in europe-west1, deployed SonarQube, ran the tests plus sonar, and it was OK.

Case closed: I had overlooked that the Cloud SQL instance is in europe-west1!!! SonarQube was running in asia-east1, and that alone made project analysis extremely slow. Facepalm.
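Why can the region alone turn a dozen seconds into minutes? A back-of-envelope sketch, assuming the analysis task talks to the database in many sequential round trips, so each one pays the cross-region round-trip time. The numbers are hypothetical, not measured here:

```latex
T_{\text{extra}} \approx N_{\text{rt}} \cdot \Delta_{\text{RTT}},
\qquad N_{\text{rt}} = 1000,\ \Delta_{\text{RTT}} = 0.25\,\text{s}
\;\Rightarrow\; T_{\text{extra}} \approx 250\,\text{s} \approx 4\,\text{min}
```

With the database in the same region, a round trip typically costs only milliseconds, so the same number of queries adds just seconds.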

There are checklists online for slow project analysis, but the old and new GKE clusters had identical CPU and memory, so they didn't help.

Summary

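Verify a new deployment thoroughly before removing the old one.
I tried to reduce network latency for the service but ignored where the database was.
I wasn't familiar enough with the SonarQube service itself.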
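For the second lesson, a pre-deployment guard could compare the app's region with the database's. A minimal sketch, assuming locations are given as Google Cloud region or zone names; the function names are hypothetical, and the database side could come from something like `gcloud sql instances describe INSTANCE --format='value(region)'`:

```shell
#!/usr/bin/env bash
# Sketch: compare two Google Cloud locations at region level.
# Zones look like "<region>-<letter>" (europe-west1-c) while regions end in a digit
# (asia-east1), so dropping a trailing "-<letter>" maps a zone to its region.
to_region() {
  local loc="$1"
  if [[ "$loc" =~ ^(.+)-[a-z]$ ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "$loc"
  fi
}

# Usage: same_region APP_LOCATION DB_LOCATION; returns 1 on a mismatch.
same_region() {
  local app db
  app="$(to_region "$1")"
  db="$(to_region "$2")"
  if [[ "$app" == "$db" ]]; then
    echo "same region: $app"
    return 0
  fi
  echo "REGION MISMATCH: app in $app, database in $db"
  return 1
}
```

`same_region asia-east1 europe-west1-c` fails with "REGION MISMATCH: app in asia-east1, database in europe-west1", which is exactly the setup from Monday.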

New things I learned about SonarQube

sonar.login accepts a token, so there is no need to pass the admin password anymore:
./gradlew sonar -Dsonar.login="${SONAR_API_TOKEN}"
The Gradle SonarQube plugin can check the quality gate directly:
property "sonar.qualitygate.wait", "true"
The SonarQube server version and the Gradle SonarQube plugin version should ideally match.
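Putting the first two points together, a CI step could look like the sketch below. `sonar.login`, `sonar.qualitygate.wait` and `sonar.qualitygate.timeout` (seconds, default 300) are standard analysis parameters; the function name and the `GRADLEW` override are hypothetical, the latter only so the command can be dry-run:

```shell
#!/usr/bin/env bash
# Sketch: run the Gradle sonar task with a token and block until the quality gate result.
# GRADLEW (hypothetical) defaults to ./gradlew; set GRADLEW=echo to print the command.
run_sonar() {
  local timeout_s="${1:-300}"
  if [[ -z "${SONAR_API_TOKEN:-}" ]]; then
    echo "SONAR_API_TOKEN is not set" >&2
    return 2
  fi
  "${GRADLEW:-./gradlew}" sonar \
    -Dsonar.login="${SONAR_API_TOKEN}" \
    -Dsonar.qualitygate.wait=true \
    -Dsonar.qualitygate.timeout="${timeout_s}"
}
```

With analyses taking about 5 minutes, as they did here, even the default 300-second timeout would have been borderline, let alone the 1-minute wait in our pipeline.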
