Ultimately you have to change the shape and size of the pixel and voxel sensors, and even make them movable. Jaggies happen during capture as well as display; overly rigid and exact grids are often counterproductive. Event-driven cameras seem to show that sparse sensor arrays are sometimes much better. Just because cameras are cheap does not mean they are lossless and efficient. Random sampling is not random in its purpose.
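To make that concrete, here is a tiny sketch (NumPy; the frequencies and rates are illustrative, not from any particular camera) of why an exact, rigid sampling grid turns an out-of-band signal into a clean but false alias, while jittering the same samples breaks that coherence:

```python
# Minimal sketch: a 9 Hz signal sampled on a rigid 8 Hz grid folds to a
# clean (but false) 1 Hz alias; the same mean sampling rate with random
# jitter decorrelates the alias. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
f_signal, f_alias = 9.0, 1.0      # true frequency and the grid's alias of it
n, rate = 64, 8.0                 # 64 samples at 8 samples per second

t_grid = np.arange(n) / rate                    # rigid, exact grid
t_jit = t_grid + rng.uniform(0, 1 / rate, n)    # same mean spacing, jittered

def alias_correlation(t):
    """Correlation between samples of the true signal and its 1 Hz alias."""
    s = np.sin(2 * np.pi * f_signal * t)
    a = np.sin(2 * np.pi * f_alias * t)
    return np.corrcoef(s, a)[0, 1]

print("rigid grid:", alias_correlation(t_grid))   # ~1.0, alias looks real
print("jittered  :", alias_correlation(t_jit))    # much lower, alias broken
```

The jittered samples are no more accurate individually; the gain is that the error becomes incoherent noise instead of a convincing falsehood, which is part of why irregular or event-driven sampling can outperform a fixed grid.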
Statistical methods are computable, and they can be made “close enough to be sustainable”. Look at all of what is happening and look deep.
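As one small illustration (the integrand, tolerance, and batch size are arbitrary), a statistical estimate can stop as soon as its standard error is below whatever "close enough" means for the job, instead of paying for an exact answer:

```python
# Minimal sketch: estimate an integral statistically and stop at a chosen
# tolerance rather than computing it exactly. Integrand, tolerance, and
# batch size are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(1)

def estimate_mean(f, tol, batch=10_000, max_samples=10_000_000):
    """Estimate E[f(U)], U ~ Uniform(0, 1), until the standard error < tol."""
    samples = np.empty(0)
    while samples.size < max_samples:
        samples = np.concatenate([samples, f(rng.random(batch))])
        stderr = samples.std(ddof=1) / np.sqrt(samples.size)
        if stderr < tol:
            break
    return samples.mean(), stderr, samples.size

mean, stderr, n = estimate_mean(np.sin, tol=1e-3)
print(f"estimate {mean:.4f} +/- {stderr:.4f} from {n} samples")
print(f"exact    {1 - np.cos(1):.4f}")   # integral of sin(x) on [0, 1]
```

The point is not this particular estimator; it is that the stopping rule, not the hardware, decides the cost.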
27 years of the Internet Foundation taught me “never say never”, or “almost nothing is impossible forever.” In almost every case it is possible to estimate when something will be possible.
What is usually covered under “complex adaptive systems” is tiny. Look wider and deeper.
Let your AIs do it from many different perspectives, then have them collaborate, compare, and tell you what they find and how to make use of it.
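A hedged sketch of that loop, with hypothetical model callables standing in for whatever AI interfaces or tools you actually have:

```python
# Hypothetical sketch of "many perspectives, then compare". The model
# callables here are stand-ins; plug in whatever AI interfaces you use.
from typing import Callable, Dict

Model = Callable[[str], str]

def fan_out(question: str, models: Dict[str, Model]) -> Dict[str, str]:
    """Pose the same question independently to several models/perspectives."""
    return {name: ask(question) for name, ask in models.items()}

def reconcile(answers: Dict[str, str], judge: Model) -> str:
    """Hand all answers back and ask for agreements, conflicts, next checks."""
    bundle = "\n\n".join(f"[{name}]\n{text}" for name, text in answers.items())
    return judge("Compare these answers. List agreements, disagreements, "
                 "and what should be verified against real data:\n\n" + bundle)

# Trivial stand-in "models", just to show the shape of the workflow:
answers = fan_out("Where do rigid sampling grids fail?",
                  {"a": lambda q: "at capture", "b": lambda q: "at display"})
print(reconcile(answers, judge=lambda prompt: prompt))  # echo judge, for illustration
```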
Reverse engineering synthetic data is fun once in a while, but real data is better. The games are different. Do you try to read the mind and motives of a few programmers, or the mind of God or Nature?
Your approach might be more suitable for CERN and big computing.
Actually, scientific data on the Internet is improving. It depends on what you are following. I follow it all, and it is slow, but overall positive. The real gains will be made with sensor datasets, with synthetics used for lossless experiments to confirm whether the algorithms get every bit exactly right, and to look for efficiencies in the algorithms.
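For the lossless part, the test is mechanical: generate the synthetic record, run the algorithm, and compare at the byte level. A minimal sketch (the delta codec is only a stand-in for whatever algorithm is under test):

```python
# Minimal sketch: synthetic integer "sensor" data used as a bit-exact test
# harness. The delta codec is a stand-in for the algorithm under test.
import numpy as np

rng = np.random.default_rng(42)
truth = rng.integers(0, 2**16, size=1_000_000, dtype=np.int64)  # synthetic counts

def encode(x):
    d = x.copy()
    d[1:] = np.diff(x)        # keep the first value, then successive differences
    return d

def decode(d):
    return np.cumsum(d)       # exact inverse for integer data

recovered = decode(encode(truth))
assert np.array_equal(truth, recovered)
assert truth.tobytes() == recovered.tobytes()   # bit-exact, not just "close"
print("round trip is bit-exact for", truth.size, "samples")
```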
You picked a great area to work on, but do not get too dependent on crowd-generated “free” resources.
Your examples are all deterministic, in the sense of computable. Are there any where a better method emerges to do the same thing but faster? If so, that says that many of our standard approaches might be wasteful and unnecessary. A box model might be as effective as, and more efficient than, a PDE solver based on analytic representations from earlier ages. Many of those are exactly right, but they are now visualized to fit the narrow intensities and timings of human vision.
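As a toy of the box-model point (all parameters are made up and the physics is deliberately trivial): both of these estimate the mean temperature of a cooling rod, one with a single lumped state and a closed form, one with an explicit finite-difference grid:

```python
# Toy sketch: a one-box lumped model vs. an explicit 1D heat-equation solver,
# both answering "what is the mean temperature after one time constant?"
# All parameters are illustrative.
import numpy as np

alpha, L, T0 = 1e-4, 1.0, 100.0        # diffusivity, rod length, initial temp
tau = L**2 / (alpha * np.pi**2)        # time constant of the slowest mode
t_end = tau

# Box model: one state, closed form.
mean_box = T0 * np.exp(-t_end / tau)

# PDE: explicit finite differences, both ends held at 0.
N = 200
dx = L / (N + 1)
dt = 0.4 * dx**2 / alpha               # within the explicit stability limit
T = np.full(N, T0)
for _ in range(int(t_end / dt)):
    Tpad = np.concatenate(([0.0], T, [0.0]))           # Dirichlet boundaries
    T = T + alpha * dt / dx**2 * (Tpad[2:] - 2 * T + Tpad[:-2])
mean_pde = T.mean()

print(f"box model : {mean_box:6.2f}   (one exponential)")
print(f"PDE solver: {mean_pde:6.2f}   ({N} nodes x {int(t_end/dt)} steps)")
# The lumped answer differs from the grid by roughly the modal factor
# pi^2/8 (~23%) here, for a tiny fraction of the arithmetic.
```

Whether that difference matters depends entirely on the question being asked, which is the point: the exact solver is not automatically the right tool.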
I am not giving answers, but suggesting you look at it differently. And apply it to the whole Internet of knowledge, not just a few datasets, however large they seem now. Try some exabyte examples too, but with both real data and digital twins and other methods. There are many new ones, not obvious.