Visual perception in humans is mediated by a large and complex network of brain regions. Neuroscientists traditionally study this system by examining neural activity elicited by highly controlled, simplified visual inputs, such as gratings, line drawings, or cut-out objects. Real-life visual perception, however, requires rapid processing of a continuous stream of natural scenes, which typically contain a multitude of objects, high-level semantics, and complex temporal structure. In this talk, I will describe a number of studies in which we attempted to quantify natural scenes and link them to brain signals using computational models. In particular, I will discuss three separate research lines dedicated to better understanding the neural computations underlying real-world vision in the human brain: 1) rapid perception of natural scenes, studied with EEG and models of low-level image statistics; 2) the role of high-level semantics and action affordances in natural scene categorization, studied with multi-voxel pattern analysis of fMRI data and deep neural network models; and 3) the temporal dynamics of visual responses, studied with ECoG and temporal encoding models.
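For readers unfamiliar with encoding models of the kind mentioned above, the sketch below illustrates the general idea under simplifying assumptions: a regularized linear mapping is fit from image-derived features to a measured brain response and evaluated on held-out images. The feature set, dimensions, and regularization value are hypothetical placeholders chosen for illustration, not the models or data used in these studies.

```python
# Illustrative sketch of a linear encoding model: predict a measured brain
# response (e.g., one voxel or electrode) from image-derived features.
# All quantities here are simulated placeholders, not data from the studies above.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_images, n_features = 500, 20                    # e.g., 500 scenes, 20 image statistics
X = rng.standard_normal((n_images, n_features))   # per-image feature values
true_w = rng.standard_normal(n_features)          # ground-truth weights (simulation only)
y = X @ true_w + rng.standard_normal(n_images)    # simulated brain response plus noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Regularized linear mapping from stimulus features to the response.
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# A common performance measure: correlate predicted and observed responses
# on images the model has not seen.
r = np.corrcoef(model.predict(X_test), y_test)[0, 1]
print(f"held-out prediction correlation: {r:.2f}")
```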