37 people have given 303 responses
Statements with the highest number of 👍 or the highest number of 👎 appear at the top (if everyone thinks 👎, that's consensus too)
The claim of "superhuman" forecasting ability is exaggerated.
Given what we currently know, it is fair to say that the Safe AI forecasting model performs better than the Metaculus crowd forecast.
The system's performance may not translate to consistent profitability in high-liquidity prediction markets.
The search engine date cutoff feature used to prevent information leakage is potentially unreliable.
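For context on the statement above: the leakage-prevention step works by filtering retrieved documents so the model only sees material published before the question's cutoff date. A minimal sketch of such a filter, assuming a hypothetical (published_date, text) record shape rather than Safe AI's actual pipeline; the unreliability worry is that publication dates in real search results can be missing, wrong, or silently updated, so a document can slip past the filter.

```python
from datetime import date

def filter_by_cutoff(docs, cutoff):
    """Keep only documents published strictly before the cutoff date.

    `docs` is a list of (published: date, text: str) pairs -- a
    hypothetical representation of search results for illustration.
    Documents with an unknown publication date would need a policy
    decision (drop or keep); here they are assumed to be present.
    """
    return [(d, t) for d, t in docs if d < cutoff]

# Example: only the pre-cutoff document survives.
docs = [(date(2023, 1, 1), "old article"), (date(2024, 6, 1), "new article")]
kept = filter_by_cutoff(docs, cutoff=date(2024, 1, 1))
```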
On balance, the Safe AI forecasting model will probably beat the Metaculus crowd forecast if given questions 3 months in the future without access to the crowd forecast.
People who strongly identify with a particular skill or domain (X) tend to be more pessimistic about AI surpassing human abilities in that domain (X).
It is likely that information was leaking into the questions that Safe AI was testing its model on.
The questions used for evaluation may have been cherry-picked or biased towards the system's strengths.
The resolution criteria for Nathanpmyoung's market are still unclear.
The decision to use Platt scaling in the evaluation is questionable and lacks sufficient justification.
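For readers unfamiliar with the technique named above: Platt scaling fits a two-parameter logistic map p = sigmoid(a*s + b) from a model's raw scores to calibrated probabilities, with a and b chosen to minimize log loss on held-out outcomes. A minimal pure-Python sketch using gradient descent (an illustrative implementation, not Safe AI's actual evaluation code):

```python
import math

def platt_scale(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a*s + b) by gradient descent on log loss.

    `scores` are raw model outputs, `labels` are 0/1 outcomes.
    Returns the fitted (a, b). Hyperparameters are illustrative.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # d(log loss)/da for one example
            grad_b += (p - y)      # d(log loss)/db for one example
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(score, a, b):
    """Map a raw score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))
```

The critique in the statement is that applying this remapping during evaluation changes the forecasts being scored, so it needs explicit justification (e.g. fitting a and b only on a separate calibration set).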
The evaluation set consisted of all questions that resolved within some window of time after the cutoff date.
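The selection rule described above can be sketched as follows, assuming a hypothetical (resolution_date, question) representation and an illustrative window length; the exact window Safe AI used is not specified here.

```python
from datetime import date, timedelta

def evaluation_set(questions, cutoff, window_days):
    """Select questions that resolved within `window_days` after the cutoff.

    `questions` is a list of (resolved: date, text: str) pairs -- a
    hypothetical schema for illustration. Questions resolving on or
    before the cutoff, or after the window closes, are excluded.
    """
    end = cutoff + timedelta(days=window_days)
    return [(d, q) for d, q in questions if cutoff < d <= end]

# Example: with a 90-day window after 2024-01-01, only the
# question resolving inside (cutoff, cutoff + 90 days] is kept.
qs = [
    (date(2023, 12, 1), "resolved before cutoff"),
    (date(2024, 2, 1), "resolved inside window"),
    (date(2024, 8, 1), "resolved after window"),
]
selected = evaluation_set(qs, cutoff=date(2024, 1, 1), window_days=90)
```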