Setting up the data and the model | Michael Zhang

Setting up the data and the model

2018-06-22

Data Preprocessing

Mean subtraction

Mean subtraction involves subtracting the mean across every individual feature in the data. Geometrically, it centers the cloud of data around the origin along every dimension: X = X - np.mean(X, axis = 0).

With images specifically, it is common for convenience to subtract a single value from all pixels (e.g. X -= np.mean(X)), or to do so separately for each of the three color channels.
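A minimal NumPy sketch of the variants above. The random data and the names `images`, `images_scalar`, `channel_mean` and `images_per_channel` are illustrative assumptions; the image batch is assumed to be a float array of shape [N, H, W, 3] (channels last).

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic data: N = 100 samples, D = 5 features (illustrative random data).
X = rng.normal(loc=3.0, scale=2.0, size=(100, 5))
X_centered = X - np.mean(X, axis=0)  # subtract each feature's own mean

# Hypothetical batch of 8 RGB images, 32x32 pixels, channels last.
images = rng.uniform(0, 255, size=(8, 32, 32, 3))

# Variant 1: subtract a single scalar mean from every pixel.
images_scalar = images - np.mean(images)

# Variant 2: subtract a separate mean for each color channel.
channel_mean = np.mean(images, axis=(0, 1, 2))  # shape (3,)
images_per_channel = images - channel_mean      # broadcasts over the last axis
```

Whichever variant is used, the mean is normally computed on the training data only and then subtracted from the validation and test data as well.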

Normalization

Normalization refers to normalizing the data dimensions so that they have approximately the same scale.

There are two common ways to do this:

One is to divide each dimension by its standard deviation once it has been zero-centered: X /= np.std(X, axis = 0).
The other normalizes each dimension so that the min and max along that dimension are -1 and 1, respectively.
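Both ways can be sketched in NumPy as follows. The data is illustrative random data with features on deliberately different scales, and the code assumes no feature is constant (otherwise the standard deviation or max - min would be zero and the division would fail).

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative data: 200 samples, 3 features on very different scales.
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])

# Way 1: zero-center, then divide each dimension by its standard deviation.
X_std = X - np.mean(X, axis=0)
X_std /= np.std(X_std, axis=0)

# Way 2: rescale each dimension linearly so that its min is -1 and its max is 1.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = 2.0 * (X - col_min) / (col_max - col_min) - 1.0
```

After way 1 every dimension has mean 0 and standard deviation 1; after way 2 every dimension spans exactly [-1, 1].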

In the case of images, the relative scales of pixels are already approximately equal (they lie in the range 0 to 255), so this additional preprocessing step is not strictly necessary.

[Figure: Common data preprocessing pipeline]

PCA and Whitening

This is another form of preprocessing. The data is first zero-centered as described above. Then we compute the covariance matrix, which tells us about the correlation structure in the data.

```python
import numpy as np

# Assume input data matrix X (a float array) of size [N x D],
# where N is the number of samples and D is their dimensionality.
X -= np.mean(X, axis=0)            # zero-center the data (important)
cov = np.dot(X.T, X) / X.shape[0]  # get the [D x D] data covariance matrix
```

The (i, j) element of the data covariance matrix contains the covariance between the i-th and j-th dimensions of the data. In particular, the diagonal of this matrix contains the variance of each dimension.
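A small check of this, using illustrative random data in which dimensions 0 and 1 are made correlated on purpose: entry cov[i, j] equals the average product of the centered i-th and j-th columns, and the whole matrix matches NumPy's `np.cov` with `rowvar=False` (columns are variables) and `bias=True` (divide by N rather than N - 1).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))        # N = 500 samples, D = 4 dimensions
X[:, 1] += 0.8 * X[:, 0]             # make dimensions 0 and 1 correlated

X -= np.mean(X, axis=0)              # zero-center (important)
cov = np.dot(X.T, X) / X.shape[0]    # [D x D] covariance matrix

i, j = 0, 1
manual = np.mean(X[:, i] * X[:, j])  # covariance of dimensions i and j, by hand

print(cov.shape)                     # (4, 4)
print(np.isclose(cov[i, j], manual)) # True
```

The same comparison against `np.cov(X, rowvar=False, bias=True)` is a quick way to confirm that the hand-written formula divides by N, not N - 1.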

To be continued…