Quantization is a technique for reducing the numerical precision of the weights and activations in a neural network. This is typically done by representing values with fewer bits, for example mapping 32-bit floating-point weights to 8-bit integers, or by switching to a lower-precision floating-point format such as 16-bit floats. Quantization is a form of model compression: it shrinks the memory footprint of the model and can improve inference speed, since lower-precision values require less memory bandwidth and often have faster hardware support.
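To make the idea concrete, here is a minimal sketch of post-training affine quantization of a weight tensor to 8-bit integers, written with plain NumPy. The per-tensor scale and zero-point scheme shown is one common convention for integer quantization, not the method of any specific framework; the function names and the example tensor are illustrative.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float values in x to int8 with a per-tensor scale and zero point."""
    qmin, qmax = -128, 127
    x_min, x_max = float(x.min()), float(x.max())
    # Make sure the range includes zero so 0.0 is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximate float tensor from its int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Example: quantize a random weight matrix and inspect the rounding error
# and the storage saving (hypothetical data, for illustration only).
w = np.random.randn(256, 256).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("max abs rounding error:", np.abs(w - w_hat).max())
print("bytes fp32 -> int8:", w.nbytes, "->", q.nbytes)  # 4x smaller
```

The storage saving is immediate (int8 uses a quarter of the bytes of fp32), while the rounding error stays small relative to the weight magnitudes because the scale is fitted to the tensor's observed value range.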