UPDATE (Nov 26th, 3:30pm ET): A few readers have mentioned that FPGAs take much less than hours to reprogram. I even received an email last night that claims FPGAs can be reprogrammed in "well under a second." This differs from the sources I've read when I was reading up on their OpenCL capabilities (for potential evolutions of projects) back in ~2013. That said, multiple sources, including one who claim to have personal experience with FPGAs, say that it's not the case. Also, I've never used an FPGA myself — again, I was just researching them to see where some GPU-based projects could go.
Designing integrated circuits, as I've said a few times, is basically a game. You have a blank canvas that you can etch complexity into. The amount of “complexity” depends on your fabrication process, how big your chip is, the intended power, and so forth. Performance depends on how you use the complexity to compute actual tasks. If you know something special about your workload, you can optimize your circuit to do more with less. CPUs are designed to do basically anything, while GPUs assume similar tasks can be run together. If you will only ever run a single program, you can even bake some or all of its source code into hardware called an “application-specific integrated circuit” (ASIC), which is often used for video decoding, rasterizing geometry, and so forth.
This is an old Atom back when Intel was partnered with Altera for custom chips.
FPGAs are circuits that can be baked into a specific application, but can also be reprogrammed later. Changing tasks requires a significant amount of time (sometimes hours) but it is easier than reconfiguring an ASIC, which involves removing it from your system, throwing it in the trash, and printing a new one. FPGAs are not quite as efficient as a dedicated ASIC, but it's about as close as you can get without translating the actual source code directly into a circuit.
Intel, after purchasing FPGA manufacturer, Altera, will integrate their technology into Xeons in Q1 2016. This will be useful to offload specific tasks that dominate a server's total workload. According to PC World, they will be integrated as a two-chip package, where both the CPU and FPGA can access the same cache. I'm not sure what form of heterogeneous memory architecture that Intel is using, but this would be a great example of a part that could benefit from in-place acceleration. You could imagine a simple function being baked into the FPGA to, I don't know, process large videos in very specific ways without expensive copies.
Again, this is not a consumer product, and may never be. Reprogramming an FPGA can take hours, and I can't think of too many situations where consumers will trade off hours of time to switch tasks with high performance. Then again, it just takes one person to think of a great application for it to take off.