My mental model of how TLB and cache are organized and used in modern processors:
Binary code in programs on modern computers uses only virtual addresses. Addresses are needed to fetch instructions (for example, at the address held in the program counter (PC)) or to load data (for example, at an offset from the base address of an array in the program). Address translation is the process of converting a virtual address in the process's virtual address space to a physical address in main memory (RAM).
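To make the split between the "page" part and the "within-page" part of a virtual address concrete, here is a minimal sketch. The 4 KiB page size is an assumption (it is a common default, but processors support other sizes too):

```python
# Sketch: splitting a virtual address into a virtual page number (VPN)
# and an offset within the page. 4 KiB pages are an assumption here.
PAGE_SIZE = 4096        # 2**12 bytes per page
OFFSET_BITS = 12

def split_virtual_address(vaddr):
    """Return (virtual page number, offset within the page)."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    return vpn, offset

vpn, offset = split_virtual_address(0x7FFF_1234)
# Translation only replaces the VPN with a page frame number; the offset
# is carried over unchanged into the physical address.
```

Only the VPN needs translating, which is why page tables and TLBs map page numbers rather than full addresses.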
Page tables hold the mappings from virtual page numbers to page frame numbers, so they can be used to map virtual addresses to physical addresses. Page tables can be large and are themselves organized as a multi-level tree, where each node is a page and the leaf nodes hold the mappings. Page tables are themselves stored in main memory and might even need to be demand-paged into main memory from disk.
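The tree walk described above can be sketched in a few lines, using nested dicts to stand in for page-table pages in memory. The two-level depth and 10-bit per-level indices are illustrative choices, not any particular architecture's layout (x86-64, for instance, uses four or five levels with 9-bit indices):

```python
# Sketch: a multi-level page table walk. Nested dicts model page-table
# pages; a missing entry models a page fault. Depth and index widths
# are assumptions for illustration, not a real architecture's layout.
OFFSET_BITS = 12
INDEX_BITS = 10    # bits of the VPN consumed at each level (assumption)

def walk(page_table_root, vaddr, levels=2):
    """Translate vaddr via a `levels`-deep tree; None means not mapped."""
    vpn = vaddr >> OFFSET_BITS
    node = page_table_root
    # Consume the VPN from the most significant index down to the leaf.
    for level in reversed(range(levels)):
        index = (vpn >> (level * INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
        node = node.get(index)
        if node is None:
            return None    # page fault: no mapping at this level
    frame = node    # a leaf entry holds the page frame number
    return (frame << OFFSET_BITS) | (vaddr & ((1 << OFFSET_BITS) - 1))

root = {0: {3: 0x42}}    # maps VPN 3 to page frame 0x42
```

Note that a walk of depth N costs N dependent memory accesses, which is exactly the latency the TLB exists to avoid.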
Since accessing main memory for address translation is slow, the translation lookaside buffer (TLB) is used as a cache for translations. Similarly, since accessing main memory to fetch instructions or load data is slow, the cache (as it is generically called) is used. In modern processors, both TLBs and caches might have multiple levels and might be split between instructions and data.
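A TLB can be modeled as a small cache of VPN-to-frame mappings that falls back to the slow page-table walk on a miss. This is a toy sketch: the capacity and the LRU replacement policy are assumptions (real TLBs are set-associative hardware structures with vendor-specific sizes and policies):

```python
from collections import OrderedDict

# Sketch: a TLB as a small LRU cache of VPN -> page frame mappings.
# Capacity and the LRU policy are illustrative assumptions.
class TLB:
    def __init__(self, capacity=64):
        self.entries = OrderedDict()    # VPN -> page frame number
        self.capacity = capacity

    def translate(self, vpn, page_table_walk):
        if vpn in self.entries:               # TLB hit: fast path
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        frame = page_table_walk(vpn)          # TLB miss: slow walk in RAM
        self.entries[vpn] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return frame

tlb = TLB(capacity=2)
walks = []
def lookup(vpn):
    walks.append(vpn)    # record that a slow page-table walk happened
    return vpn + 100     # toy frame number
tlb.translate(1, lookup)    # miss: triggers a walk
tlb.translate(1, lookup)    # hit: no second walk for VPN 1
```

The key property the sketch captures is that repeated accesses to the same page pay for the page-table walk only once, until the entry is evicted.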
As an example, a modern processor might have a 3-level cache hierarchy: a separate L1 instruction cache (i-cache) and L1 data cache (d-cache) per core, and then larger L2 and L3 caches, which hold both instructions and data and are shared across all the cores.
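The lookup order through such a hierarchy can be sketched as checking each level nearest-first and falling through to RAM. The latencies are illustrative round numbers (not measurements of any real CPU), and for simplicity this ignores the i-cache/d-cache split:

```python
# Sketch: a load walking a 3-level cache hierarchy, nearest level first.
# Latencies (in cycles) are illustrative assumptions, not measurements.
LATENCY = {"L1": 4, "L2": 12, "L3": 40, "RAM": 200}

def load(paddr, caches, memory):
    """Return (value, total cycles) for a load at physical address paddr."""
    cycles = 0
    for name, cache in caches:
        cycles += LATENCY[name]
        if paddr in cache:
            return cache[paddr], cycles    # hit at this level
    cycles += LATENCY["RAM"]               # missed everywhere: go to RAM
    value = memory[paddr]
    for _, cache in caches:                # fill each level on the way back
        cache[paddr] = value
    return value, cycles

caches = [("L1", {}), ("L2", {}), ("L3", {})]    # nearest level first
memory = {0x1000: 7}
load(0x1000, caches, memory)    # misses at every level, then fills them
load(0x1000, caches, memory)    # now an L1 hit
```

The sketch shows why locality matters so much: the second access to the same address is dramatically cheaper than the first.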
What is less well known is that the TLB might also be split in modern processors: for example, an L1 instruction TLB and an L1 data TLB, backed by a unified L2 TLB that handles both instruction and data addresses. Another design might have just an L1 TLB and an L2 TLB, both handling both instruction and data addresses.
Keeping the above in mind, here is how such a modern processor might read instructions or data from main memory: