Low-bitwidth integer arithmetic has been widely adopted in hardware implementations of deep neural network inference applications. However, despite the energy-efficiency improvements it promises for demanding edge applications, the use of low-bitwidth integer arithmetic for neural network training remains limited. Unlike inference, training demands high dynamic range and numerical accuracy to produce high-quality results, making the use of low-bitwidth integer arithmetic particularly challenging. To address this challenge, we present NITI, a novel neural network training framework that exclusively utilizes low-bitwidth integer arithmetic. NITI stores all parameters and accumulated intermediate values as 8-bit integers while using no more than 5 bits for gradients. To provide the dynamic range needed during training, a per-layer block scaling exponent scheme is employed. By integrating tightly with the rounding procedures and the integer entropy loss calculation, the proposed scaling scheme incurs only minimal storage and computation overhead. Furthermore, a hardware-efficient pseudo-stochastic rounding scheme that eliminates the need for external random number generation is proposed to facilitate the conversion of wider intermediate arithmetic results to lower precision for storage. Since NITI operates only with standard 8-bit integer arithmetic and storage, it can be accelerated using existing low-bitwidth operators originally developed for inference in commodity accelerators. To demonstrate this, an open-source software implementation of end-to-end training using native 8-bit integer operations on modern GPUs is presented. In addition, experiments have been conducted on an FPGA-based training accelerator to evaluate the hardware advantages of NITI. Compared with an equivalent training setup implemented with floating-point storage and arithmetic, NITI incurs no accuracy degradation on the MNIST and CIFAR10 datasets. On ImageNet, NITI achieves accuracy comparable to state-of-the-art integer training frameworks without relying on full-precision floating-point first and last layers.
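To make the two mechanisms mentioned above concrete, the following NumPy sketch illustrates one way a wide integer accumulator could be converted back to 8-bit storage with a shared per-layer exponent, and how a pseudo-stochastic rounding rule can draw its "randomness" from the discarded low-order bits rather than an external random number generator. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the function names, the exponent-selection formula, and the particular way the discarded bits are split into a rounding magnitude and a pseudo-random threshold are all assumptions made for exposition.

```python
import numpy as np

def pseudo_stochastic_round_shift(acc, shift):
    """Right-shift a wide integer accumulator by `shift` bits with
    pseudo-stochastic rounding: the discarded low-order bits themselves
    supply the 'random' threshold, so no external RNG is needed.
    (Illustrative bit-splitting rule; the paper's exact rule may differ.)"""
    if shift <= 0:
        return acc
    truncated = acc >> shift                   # arithmetic shift keeps the sign
    discarded = acc & ((1 << shift) - 1)       # low-order bits that were dropped
    half = shift // 2
    frac = discarded >> half                   # upper half: rounding magnitude
    rand = discarded & ((1 << half) - 1)       # lower half: pseudo-random value
    round_up = (frac > rand).astype(acc.dtype)
    return truncated + round_up

def quantize_to_int8_with_block_exponent(acc_i32):
    """Convert an int32 accumulator tensor to an int8 tensor plus a single
    per-layer (block) exponent, so only one extra scalar per layer is stored."""
    max_mag = int(np.abs(acc_i32).max())
    # Smallest shift that brings the largest magnitude into int8 range [-128, 127].
    shift = int(np.ceil(np.log2(max_mag / 127.0))) if max_mag > 127 else 0
    q = pseudo_stochastic_round_shift(acc_i32.astype(np.int64), shift)
    q = np.clip(q, -128, 127).astype(np.int8)
    return q, shift                            # represented value is q * 2**shift

# Usage: simulate the int32 accumulator produced by an int8 x int8 matmul.
acc = (np.random.randint(-128, 128, (4, 8), dtype=np.int32)
       @ np.random.randint(-128, 128, (8, 3), dtype=np.int32))
w8, exp = quantize_to_int8_with_block_exponent(acc)
print(w8.dtype, exp)                           # int8 tensor plus one shared exponent
```

Because the rounding threshold is derived from bits that would be discarded anyway, the scheme adds no random-number-generation hardware, which is consistent with the hardware-efficiency claim in the abstract; the exact bit-level rounding rule and exponent bookkeeping used by NITI are defined in the paper itself.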