TensorFlow test script | gwozai

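1. Install Anaconda or Miniconda

If you have not installed either yet, download and install Anaconda or Miniconda (Miniconda is recommended because it is smaller).
Download links: https://www.anaconda.com/products/distribution or https://conda.io/miniconda.html
After installation, open a command line (Command Prompt or PowerShell on Windows).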

2. Create and configure a Conda environment

Create a Python 3.8 environment named tsinghua:
```
conda create -n tsinghua python=3.8
```
Activate the environment:
```
conda activate tsinghua
```
If conda activate fails with CondaError: Run 'conda init' before 'conda activate', run:
```
conda init
```
Then close and reopen the command line and run conda activate tsinghua again.
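
To confirm that the environment is really active before installing anything into it, a short check can be run with the environment's own interpreter. This is a minimal sketch: check_environment is a hypothetical helper name, and it relies on conda exporting the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def check_environment(expected_env="tsinghua", expected_python=(3, 8),
                      env_name=None, python_version=None):
    """Return a list of problems; an empty list means the environment looks right."""
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment
    if env_name is None:
        env_name = os.environ.get("CONDA_DEFAULT_ENV")
    if python_version is None:
        python_version = tuple(sys.version_info[:2])

    problems = []
    if env_name != expected_env:
        problems.append(
            f"active conda environment is {env_name!r}, expected {expected_env!r}"
        )
    if tuple(python_version) != tuple(expected_python):
        found = ".".join(str(part) for part in python_version)
        wanted = ".".join(str(part) for part in expected_python)
        problems.append(f"Python is {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks correct" if not issues else "\n".join(issues))
```

Run it right after conda activate tsinghua; any line it prints other than the success message points at the step to repeat.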

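3. Install CUDA Toolkit and cuDNN

To support TensorFlow-GPU 2.3.0, install the following versions:

conda install cudatoolkit=10.1.243
conda install cudnn=7.6.5=cuda10.1_0

Verify the installation. conda list accepts a package name directly, so grep (which Command Prompt and PowerShell do not provide) is not needed:

conda list cudatoolkit
conda list cudnn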
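
The same check can be scripted. The sketch below assumes that conda list --json prints a JSON array whose entries carry name and version fields; find_mismatches and EXPECTED are hypothetical names introduced only for this illustration.

```python
import json
import shutil
import subprocess

# Versions taken from the install commands above
EXPECTED = {"cudatoolkit": "10.1.243", "cudnn": "7.6.5"}


def find_mismatches(conda_list_json, expected=EXPECTED):
    """Compare `conda list --json` output with the expected versions.

    Returns {package: installed_version_or_None} for every package that is
    missing or installed at a different version.
    """
    installed = {pkg["name"]: pkg["version"] for pkg in json.loads(conda_list_json)}
    return {
        name: installed.get(name)
        for name, version in expected.items()
        if installed.get(name) != version
    }


if __name__ == "__main__":
    conda = shutil.which("conda")
    if conda is None:
        print("conda was not found on PATH")
    else:
        result = subprocess.run([conda, "list", "--json"],
                                capture_output=True, text=True)
        if result.returncode != 0:
            print("conda list failed:", result.stderr.strip())
        else:
            mismatches = find_mismatches(result.stdout)
            print(mismatches or "cudatoolkit and cudnn match the expected versions")
```

An empty result means both packages are present at the pinned versions; otherwise the dictionary shows what was found instead.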

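4. Install TensorFlow-GPU 2.3.0 and its dependencies

Install TensorFlow-GPU 2.3.0 and the related libraries, using the Tsinghua University PyPI mirror to speed up downloads. The protobuf pin works around a protobuf compatibility problem, and numpy 1.18.5 is the recommended version compatible with TensorFlow 2.3.0:

pip install tensorflow-gpu==2.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install protobuf==3.20.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install numpy==1.18.5 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install scipy==1.4.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

Verify the installation (findstr is the Windows counterpart of grep):
```
pip list | findstr tensorflow
```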
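
To check all four pins at once rather than reading the pip list output by eye, the installed versions can be compared from Python. This is a sketch only: PINNED, installed_version, and compare_with_pins are hypothetical names, and the lookup uses importlib.metadata from the standard library (Python 3.8 and later).

```python
from importlib import metadata

# Versions taken from the pip install commands above
PINNED = {
    "tensorflow-gpu": "2.3.0",
    "protobuf": "3.20.3",
    "numpy": "1.18.5",
    "scipy": "1.4.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_with_pins(pins=PINNED, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = compare_with_pins()
    if not problems:
        print("All pinned packages match")
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
```

The lookup parameter exists so the comparison can be exercised without the packages installed; in normal use the default is enough.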

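5. Optimize CPU performance (optional)

Install the Intel MKL library to improve CPU performance. Pin numpy and scipy when reinstalling them so that conda does not swap in versions that differ from the ones installed for TensorFlow 2.3.0:

conda install mkl
conda install numpy=1.18.5 scipy=1.4.1 --force-reinstall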

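6. Check the NVIDIA driver

Make sure the installed NVIDIA driver supports CUDA 10.1 (version >= 418.39 on Linux, >= 418.96 on Windows). The driver version is shown in the header of the output:
```
nvidia-smi
```
If the driver is too old, download and install the latest one: https://www.nvidia.com/Download/index.aspx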
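
The version comparison can also be automated. The sketch below is an illustration under assumptions: it queries the driver with nvidia-smi --query-gpu=driver_version --format=csv,noheader, and version_tuple, driver_is_new_enough, and query_driver_version are hypothetical helper names. Comparing versions as tuples of integers avoids the pitfalls of comparing them as strings or floats.

```python
import shutil
import subprocess

MIN_DRIVER = "418.39"  # minimum quoted above for CUDA 10.1


def version_tuple(text):
    """Turn '418.39' into (418, 39) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(driver_version, minimum=MIN_DRIVER):
    return version_tuple(driver_version) >= version_tuple(minimum)


def query_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # One line is printed per GPU; the driver is shared, so take the first
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi is not available or reported no GPU")
    elif driver_is_new_enough(version):
        print(f"Driver {version} is new enough for CUDA 10.1")
    else:
        print(f"Driver {version} is older than {MIN_DRIVER}")
```

Pass a different minimum to driver_is_new_enough if your platform requires a higher driver version.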

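Testing the installed environment

The test script below checks that TensorFlow-GPU 2.3.0 works correctly in the tsinghua environment and reports which GPU and CPU devices are available. Every log line is labeled so the output is easy to read.

1. Save the test script

Create a file named test_environment_cn.py with the following content: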

import time

import tensorflow as tf

# Print the TensorFlow version
print("TensorFlow version:", tf.__version__)

# List all available physical devices
gpus = tf.config.list_physical_devices('GPU')
cpus = tf.config.list_physical_devices('CPU')

print("Number of GPUs available:", len(gpus))
print("Number of CPUs available:", len(cpus))

# If there is a GPU, print its details
if gpus:
    for i, gpu in enumerate(gpus):
        print(f"GPU {i} device:", gpu)
    # tf.test.gpu_device_name() takes no arguments and names the first GPU
    print("Default GPU name:", tf.test.gpu_device_name())
else:
    print("No GPU detected. TensorFlow will use the CPU instead.")

# Benchmark on the GPU when one exists, otherwise on the CPU
device = '/GPU:0' if gpus else '/CPU:0'
label = 'GPU' if gpus else 'CPU'
size = 1000  # matrix size; adjust as needed

with tf.device(device):
    # Create the inputs here so they exist whichever device is used
    a = tf.random.uniform((size, size))
    b = tf.random.uniform((size, size))
    tf.matmul(a, b).numpy()  # warm-up run keeps one-time setup out of the timing

    start_time = time.time()
    c = tf.matmul(a, b)
    c.numpy()  # wait for the result before stopping the clock
    end_time = time.time()

print(f"Matrix multiplication on {label} took: {end_time - start_time} seconds")
print("Result shape:", c.shape)
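
A single timed call is noisy, so it can help to average several runs after a warm-up. The helper below is a minimal sketch, not part of TensorFlow: time_call is a hypothetical name, and the demo uses a plain Python workload as a stand-in so that the example runs without TensorFlow installed.

```python
import time


def time_call(fn, warmup=1, repeats=5):
    """Return the mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Stand-in workload. With TensorFlow, pass something like
    #     lambda: tf.matmul(a, b).numpy()
    # where .numpy() blocks until the result has actually been computed.
    elapsed = time_call(lambda: sum(i * i for i in range(100_000)))
    print(f"Mean time per call: {elapsed:.6f} s")
```

time.perf_counter is used instead of time.time because it is the clock intended for measuring short intervals.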

2. Run the test script

Activate the tsinghua environment:
```
conda activate tsinghua
```
Run the script:
```
python test_environment_cn.py
```

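3. Expected output

Given your system configuration (NVIDIA GeForce MX250, TensorFlow-GPU 2.3.0), the output should look similar to the following. The elapsed time is illustrative and varies between machines and runs:

TensorFlow version: 2.3.0
Number of GPUs available: 1
Number of CPUs available: 1
GPU 0 device: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
Default GPU name: /device:GPU:0
Matrix multiplication on GPU took: 0.1 seconds
Result shape: (1000, 1000)

If the output shows Number of GPUs available: 1 and the matrix multiplication completes on the GPU, the GPU is available and working.
If it shows Number of GPUs available: 0, TensorFlow did not detect a GPU. The driver, CUDA, or cuDNN setup is the likely cause, so recheck the installation steps.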
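
When no GPU is detected on Windows, one thing worth checking is whether the CUDA and cuDNN libraries can be found on PATH. The sketch below is an illustration under assumptions: it assumes the CUDA 10.1 runtime and cuDNN 7 are shipped as cudart64_101.dll and cudnn64_7.dll, and locate_on_path is a hypothetical helper name.

```python
import os

# Assumed Windows file names for the CUDA 10.1 runtime and cuDNN 7
LIBRARIES = ["cudart64_101.dll", "cudnn64_7.dll"]


def locate_on_path(filenames, search_path=None):
    """Map each file name to the first PATH directory containing it, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    directories = [d for d in search_path.split(os.pathsep) if d]
    found = {}
    for name in filenames:
        found[name] = next(
            (d for d in directories if os.path.isfile(os.path.join(d, name))),
            None,
        )
    return found


if __name__ == "__main__":
    for name, directory in locate_on_path(LIBRARIES).items():
        print(f"{name}: {directory or 'not found on PATH'}")
```

Run it inside the activated tsinghua environment. A library reported as not found suggests that the corresponding conda package from step 3 is missing or that the environment is not active.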

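4. Further verification

Check the GPU driver and utilization by running:
```
nvidia-smi
```
Make sure the GPU is running normally and that no other process is using too many of its resources.
To measure CPU performance instead, hide the GPU from TensorFlow. The variable must be set before TensorFlow initializes the GPU, so the safest place is the very top of the script, ahead of import tensorflow:
```
import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""  # set before importing TensorFlow
```
Then run the script again and note the time reported for matrix multiplication on the CPU.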

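Notes

Network problems: if pip install or conda install fails, check your network connection or switch to a mirror (such as Tsinghua or Alibaba Cloud).
Version compatibility: make sure cudatoolkit=10.1.243, cudnn=7.6.5, and tensorflow-gpu==2.3.0 match. If they conflict, you may need to recreate the environment and install the packages in order.
Performance problems: if the matrix multiplication takes too long, check GPU memory (limited, at 1342 MB) or adjust the matrix size.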
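
To get a feel for how matrix size relates to memory, the arithmetic can be worked through in a few lines. This is a rough sketch under stated assumptions: it counts only the two float32 inputs and the result (4 bytes per element, three tensors), treats 1 MB as 1024 * 1024 bytes, and ignores the memory that TensorFlow and the CUDA context need for themselves, so the real limit is lower. Both function names are hypothetical.

```python
import math

BYTES_PER_MB = 1024 * 1024


def matmul_memory_mb(size, bytes_per_element=4, tensors=3):
    """Approximate MB needed for two size x size inputs plus the result."""
    return size * size * bytes_per_element * tensors / BYTES_PER_MB


def max_size_for(budget_mb, bytes_per_element=4, tensors=3):
    """Largest matrix size whose three tensors fit within budget_mb."""
    budget_bytes = int(budget_mb * BYTES_PER_MB)
    return math.isqrt(budget_bytes // (bytes_per_element * tensors))


if __name__ == "__main__":
    print(f"size=1000 needs about {matmul_memory_mb(1000):.2f} MB")
    print(f"upper bound on size for 1342 MB: {max_size_for(1342)}")
```

At the script's default size of 1000 the tensors need only about 11 MB, so on this estimate memory is unlikely to be the bottleneck until the size grows by roughly an order of magnitude.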