Andon Labs tried testing the LLMs Claude, GPT-5, and Gemini as the "thinking minds" of robots: total failure. One quoted HAL 9000 before shutting itself down. Another fell down the stairs. Humans won, 95% to 40%.
In the film "Bicentennial Man" (which you FP readers all seem to know), Andrew took 200 years to become human. Well, we're a long way off: these LLMs took 200 seconds... to lose their minds.
"Stop hiring humans. The Era of AI Employees is Here." Billboards across the country are promoting the replacement of millions of jobs with AI and robotics. Great idea. One simple question: How will those displaced workers survive when there are no jobs or income for them?
The foundation recorded an 8% drop in visits in June. The cause: people are relying on tools like ChatGPT and Google AI Overviews to get synthesized answers directly on the results page.
However, that content is often based on Wikipedia pages without providing a direct link back.
My daughter's English teacher assigned the class to write about how they imagine life in the future. She then asked ChatGPT (or some equivalent LLM, I don't know which) the same question, shuffled the results together, and told the class to guess which one the LLM had written. My daughter said it was trivially easy, because the kids' essays were all full of assorted catastrophes (war, extinction, killer robots, and so on), while the artificial one read like an advertisement (a happy life thanks to tablets, flying vehicles, and the like).
AI Hallucination Cases, compiled and maintained by Damien Charlotin, a French lawyer and scholar. This database tracks legal decisions in cases where generative AI produced hallucinated content: typically fake citations, but also other types of AI-generated arguments. https://www.damiencharlotin.com/hallucinations/
This is really messed up. Who will be responsible if innocent people get punished because of this?
For me it's even worse, because I not only got that in the '90s, I also grew up in the '70s during the oil crisis. Speed limits were lowered to 55 mph to save gas, fuel efficiency standards were first introduced, President Jimmy Carter had solar panels installed on the White House roof and addressed the nation on TV about saving energy while wearing a sweater so the White House thermostat could be set lower in the winter, and all the schools had stickers on the light switches reminding you to turn the lights off when leaving a classroom empty. Energy conservation was huge when I was growing up. Now it's "go ahead and use enough electricity to power a house for a month so you can make a video of a three-eyed cat playing a banjo".
A few days ago I deleted my profile on academia.edu (I was only sharing my degree thesis there). What a shame! Now nobody will ever get to hear the podcast generated and narrated by artificial "intelligence" that I never consented to. #ai #noAI #academia
I read @baldur@toot.cafe's post Let's stop pretending that managers and executives care about productivity today, here: https://www.baldurbjarnason.com/2025/disingenuous-discourse/ and felt like riffing a bit on the section "Task sequences as vectors" since I've modeled stuff like this too. As with baldur's post mine is a bit of a gallop, meaning I might make some errors and omissions. The tl;dr is that this model suggests that in many workplaces, mandating AI tool use might have the perverse effect of making the group or workplace less productive overall, even if the tools make individuals more productive (as measured by unit throughput, say).
The rough idea is to model a bunch of people working in a company using queuing theory. Each person receives tasks to perform, completes each task in sequence, and passes the result on to another person. If a person is busy when they receive a new task, the task goes into a sort of inbox to wait until they're ready to work on it (the "queue" in "queuing theory"). Each person is modeled as a probability distribution, where the mean specifies the average or typical amount of time they take to complete a task, and the variance models the fact that sometimes tasks take more or less time for unaccounted-for reasons (you spill your coffee; the previous person did a bang-up job that time; etc.). You can model workplaces with many people, like factories and offices, this way, and ask questions about how quickly the entire group can complete tasks (throughput, which relates to productivity), how much time passes between an initial incoming request and the output of a final product (latency or wait time), and how much variability there is in the throughput and latency. It's mathmagical!
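As a toy illustration of this setup, here's a minimal single-worker queue simulation (all parameter values below are made up for the sketch): tasks arrive at exponential intervals, service times are noisy Gaussians, and a task waits whenever the worker is busy.

```python
import random

def simulate_queue(n_tasks, arrival_mean, service_mean, service_sd, seed=0):
    """Single-worker queue: tasks arrive at random intervals, wait while
    the worker is busy, then take a noisy service time. Returns the mean
    latency (wait + service) per task."""
    rng = random.Random(seed)
    t_arrive = 0.0   # when the current task shows up
    t_free = 0.0     # when the worker next becomes idle
    total_latency = 0.0
    for _ in range(n_tasks):
        t_arrive += rng.expovariate(1.0 / arrival_mean)
        start = max(t_arrive, t_free)                       # wait if busy
        service = max(0.01, rng.gauss(service_mean, service_sd))
        t_free = start + service
        total_latency += t_free - t_arrive
    return total_latency / n_tasks

# Same average speed, different variability: mean latency is worse with
# the jumpier worker, even though both "complete a task in 1 unit" on average.
steady = simulate_queue(20_000, arrival_mean=2.0, service_mean=1.0, service_sd=0.1)
jumpy = simulate_queue(20_000, arrival_mean=2.0, service_mean=1.0, service_sd=0.9)
```

Chaining several such workers, each feeding the next, gives the multi-person pipelines described above.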
Anyhow, baldur points out that the variance in individual task completion is the killer variable here. As you introduce more and more variability into individuals' time-to-complete distributions, the group's throughput and latency suffer significantly. Depending on the structure of the group, there can be phase shifts from "throughput decreases" to "throughput effectively stops altogether" as this variability goes up. He argues, I think correctly, that forcing workers to use generative AI tools in their workflows can increase their task completion time variance. Worse, even if the tools do make individual throughput higher--meaning they locally "increase productivity"--the accompanying increase in variance can make the overall productivity of the workplace lower despite what look like individual gains! A manager who actually cared about productivity would at the very least consider this possibility before mandating the use of such tools.
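This variance effect doesn't even need simulation. Kingman's classic approximation for a single-server queue (a standard queuing-theory result, not something from baldur's post) says the expected queueing delay scales with the squared coefficients of variation of arrival and service times:

```python
def kingman_wait(utilization, mean_service, ca, cs):
    """Kingman's G/G/1 approximation for the mean time a task waits in
    queue. `ca` and `cs` are the coefficients of variation (sd / mean) of
    inter-arrival and service times; `utilization` is the fraction of time
    the server is busy (must be < 1)."""
    return (utilization / (1.0 - utilization)) * ((ca**2 + cs**2) / 2.0) * mean_service

# At 80% utilization, quadrupling service-time variability (cs 0.5 -> 2.0)
# quadruples the expected wait, with no change to the mean service time:
calm = kingman_wait(0.8, 1.0, ca=1.0, cs=0.5)   # 2.5
noisy = kingman_wait(0.8, 1.0, ca=1.0, cs=2.0)  # 10.0
```

Note that the mean service time appears only as a linear factor, while variability enters squared: exactly the "variance is the killer variable" point.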
I wanted to add that these sorts of phenomena can be even worse depending on how you model time-to-complete. Often a Gaussian distribution (bell curve) is used, reflecting that sometimes tasks are completed faster and sometimes they take a bit more time, but they tend to an average and do not skew towards faster or slower. This is largely the model baldur was discussing. However, knowledge work, and especially work like coding or R&D, is often better modeled by exponential distributions or similarly long-tailed distributions like the gamma distribution. With knowledge work, most tasks are completed in an average-ish amount of time, occasionally some are completed more quickly, but more often there are tasks that take 2, 3, sometimes 10 or more times as long as the average case. For the exponential distribution, roughly 5% of tasks take "anomalously" long--more than three times the mean.
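That 5% figure falls straight out of the exponential tail: with mean mu, P(X > k*mu) = exp(-k), so about 5% of tasks exceed three times the mean, far more than a Gaussian with the same mean and spread would predict.

```python
import math

# Exponential with mean mu: P(task takes > 3*mu) = exp(-3), about 5%.
p_exp = math.exp(-3)

# Gaussian with the same mean mu and sd mu (an exponential's sd equals its
# mean): 3*mu is two standard deviations up, so only about 2.3% of tasks
# land out there -- and none skew the way real blowups do.
p_gauss = 0.5 * math.erfc(2 / math.sqrt(2))
```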
A sequence of exponentially-distributed tasks has challenging throughput and latency behavior. The sum of independent, identically-distributed exponentials is a gamma distribution, which is also long-tailed, and its absolute spread grows with every stage you add, meaning delays compound (in baldur's post delays tend to be compensated by symmetrical gains if the group is large enough, but that doesn't happen with long-tailed distributions). I don't know enough about queuing theory to say what the general behavior is, but intuitively it seems it must be equally challenging in real-world arrangements. This is one way to account for why software development projects and R&D projects are almost never completed early and can sometimes take 2 or more times longer than anticipated.
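A quick Monte Carlo check of the compounding claim (the stage count and sample size here are arbitrary choices of mine): push a task through five sequential exponential stages of mean 1 each. The total is gamma-distributed with mean 5, but the 95th percentile lands near 9, so a meaningful fraction of tasks take nearly double the "expected" schedule.

```python
import random
import statistics

rng = random.Random(1)
STAGES, TRIALS = 5, 100_000

# Total time through a 5-stage pipeline, each stage Exp(mean 1):
totals = sorted(
    sum(rng.expovariate(1.0) for _ in range(STAGES)) for _ in range(TRIALS)
)

mean_total = statistics.fmean(totals)    # ~ 5.0, the naive estimate
p95_total = totals[int(0.95 * TRIALS)]   # ~ 9.2, what you should plan for
```

Note the asymmetry: the pipeline is essentially never finished in half the expected time, but it routinely takes nearly twice as long.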
Adding variance to an exponential distribution--as mandated use of AI tools might do--also increases the mean time to completion, because for an exponential the standard deviation equals the mean: you cannot inflate one without inflating the other. It also flattens/lengthens the tail. I haven't worked it out for other long-tailed distributions, but I suspect similar phenomena there. Overall this would be going in the wrong direction, making the killer problem--the long tail--even worse!
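To see both effects at once--mean up, tail out--here's a deliberately crude toy model (entirely my own assumption, not baldur's or this thread's): base task times are Exp(mean 1), and with some probability the mandated tool triggers rework that multiplies the task's duration.

```python
import random

rng = random.Random(2)
TRIALS = 100_000

def task_time(rework_prob=0.2, rework_factor=3.0):
    """Toy model: exponential base time; occasionally the tool forces
    rework, stretching the task. Both parameter values are invented."""
    t = rng.expovariate(1.0)
    if rng.random() < rework_prob:
        t *= rework_factor
    return t

with_tool = sorted(task_time() for _ in range(TRIALS))
mean_tool = with_tool and sum(with_tool) / TRIALS   # ~ 1.4 (baseline: 1.0)
p95_tool = with_tool[int(0.95 * TRIALS)]            # ~ 4.7 (baseline: ~ 3.0)
```

Even this mild setting (one task in five stretched 3x) pushes the mean up 40% and moves the 95th percentile out by more than half, which is exactly the wrong direction for a pipeline already dominated by its tail.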