39 people have given 316 responses
The claim of "superhuman" forecasting ability is exaggerated.
Given what we currently know, it is fair to say that the Safe AI forecasting model performs better than the Metaculus crowd forecast.
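A comparison like the one claimed above is typically made with a proper scoring rule such as the Brier score (lower is better). A minimal sketch, with purely illustrative numbers rather than Safe AI's or Metaculus's actual data:

```python
import numpy as np

def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

# Hypothetical forecasts on four resolved binary questions.
model_probs = [0.7, 0.2, 0.9, 0.4]
crowd_probs = [0.6, 0.3, 0.8, 0.5]
outcomes    = [1,   0,   1,   0]

print(brier(model_probs, outcomes))  # lower score = better calibrated/sharper
print(brier(crowd_probs, outcomes))
```

On these toy numbers the model scores lower (better) than the crowd, but the statement's point is exactly that such a gap on a real question set is what would need to hold up.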
The system's performance may not translate to consistent profitability in high-liquidity prediction markets.
On balance, the Safe AI forecasting model will probably beat the Metaculus crowd forecast if given questions three months in the future and unable to see the crowd forecast.
The search engine date cutoff feature used to prevent information leakage is potentially unreliable.
It is likely that information was leaking into the questions that Safe AI was testing its model on.
People who strongly identify with a particular skill or domain (X) tend to be more pessimistic about AI surpassing human abilities in that domain (X).
The questions used for evaluation may have been cherry-picked or biased towards the system's strengths.
Nathanpmyoung's market resolution criteria are still not clearly defined.
The decision to use Platt scaling in the evaluation is questionable and lacks sufficient justification.
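For context on what Platt scaling does: it fits a logistic function to map a model's raw scores onto calibrated probabilities, which can flatter an overconfident forecaster when applied post hoc. A minimal sketch with hypothetical data and function names (this is not Safe AI's actual pipeline):

```python
import numpy as np

def platt_scale(scores, outcomes, lr=0.1, steps=2000):
    """Fit sigmoid(a*score + b) to binary outcomes by gradient descent on log loss."""
    a, b = 1.0, 0.0
    s = np.asarray(scores, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        grad = p - y                      # d(log-loss)/d(logit), per example
        a -= lr * np.mean(grad * s)
        b -= lr * np.mean(grad)
    return a, b

# Toy example: somewhat overconfident raw probabilities on six questions.
raw      = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
outcome  = np.array([1,   1,   0,   1,   0,   0])
logits   = np.log(raw / (1 - raw))

a, b = platt_scale(logits, outcome)
calibrated = 1.0 / (1.0 + np.exp(-(a * logits + b)))
```

The contested point is that fitting `a` and `b` on the evaluation questions themselves uses outcome information, so the calibrated scores are no longer a clean out-of-sample measure.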
The evaluation set consisted of all questions that resolved within a certain window after the cutoff.