Blog

  • AI Free Agency

    From the Wall Street Journal: Mark Zuckerberg Announces New Meta ‘Superintelligence Labs’ Unit and a partial reorganization of Meta.

    Mark Zuckerberg announced a new “Superintelligence” division within Meta Platforms, officially organizing an effort that has been the subject of an intense recruiting blitz in recent months.

    Former Scale CEO Alexandr Wang will lead the team as chief AI officer, and former GitHub CEO Nat Friedman will lead the company’s work on AI products, according to an internal memo Zuckerberg sent to employees that was viewed by The Wall Street Journal. 

    This comes after another WSJ article last week about “the list,” a recruiting effort designed to remedy Meta’s recent disappointing Llama work.

    All over Silicon Valley, the brightest minds in AI are buzzing about “The List,” a compilation of the most talented engineers and researchers in artificial intelligence that Mark Zuckerberg has spent months putting together. 

    Facebook’s pivot from virtual reality and the metaverse (Facebook -> Meta) to AI suggests that the metaverse was the wrong bet. I suspect Zuckerberg knows it, too, and this huge spending spree aligns with his ethos: move fast and break things.

    In a world where a really good basketball player (Shai Gilgeous-Alexander) can command $285 million over four years, spending upwards of $100 million per transformative engineer seems like a relative bargain.

  • Maginative: Microsoft’s MAI-DxO Crushes Doctors at Medical Diagnosis while Cutting Costs

    Maginative reports on Microsoft’s new AI Diagnostic Orchestrator and how it outperformed doctors in a recent study. (As an aside, I always wonder about reports that use words like “crush” in the title. Beware of hyperbole!)

    In the report’s abstract, you’ll find exciting results:

    When paired with OpenAI’s o3 model, MAI-DxO achieves 80% diagnostic accuracy—four times higher than the 20% average of generalist physicians. MAI-DxO also reduces diagnostic costs by 20% compared to physicians, and 70% compared to off-the-shelf o3. 

    A 4x improvement in diagnostic accuracy. This is transformative stuff.

    But consider the experimental setup:

    Physicians were explicitly instructed not to use external resources, including search engines (e.g., Google, Bing), language models (e.g., ChatGPT, Gemini, Copilot, etc), or other online sources of medical information.

    Now the results don’t seem quite so impressive. In fact, I have a hard time seeing how this report has much utility, given these extreme restrictions that don’t align with real-world practice.

  • If AI Lets Us Do More in Less Time—Why Not Shorten the Workweek?

    It’s a good question for the workplace (particularly for white-collar roles): if workers are more productive because of AI, should the workweek be shorter?

    This question is increasingly central to debates about the future of work and closely tied to the growing interest in the four-day workweek. According to Convictional CEO Roger Kirkness, his team was able to shift to a 32-hour schedule without any pay cuts—thanks to AI. As he told his staff, “Fridays are now considered days off.” The reaction was enthusiastic. “Oh my God, I was so happy,” said engineer Nick Wechner, who noted how much more quickly he could work using AI tools.

    Aside from contending for the boss of the year award, Kirkness recognizes the key criterion for success: getting the work done. If the work can be done faster, companies can choose to: (1) reduce the total number of hours worked per employee (without reducing headcount); (2) reduce headcount by a commensurate amount (in Convictional’s case, 20%, since a 32-hour week is 20% fewer hours than a standard 40); or (3) grow the company to do more work with a similar number of employees. The sketch below works through the arithmetic.
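
    A minimal sketch of that arithmetic, assuming a hypothetical 25% per-hour productivity gain from AI (a figure chosen so the math comes out cleanly; the article doesn’t report one):

    ```python
    # Back-of-the-envelope arithmetic for the three options, assuming AI makes
    # each worker 25% more productive per hour (an illustrative figure, not
    # one reported by Convictional).

    BASELINE_HOURS = 40
    AI_MULTIPLIER = 1.25  # hypothetical per-hour productivity gain with AI

    def weekly_output(employees: int, hours: float, multiplier: float = 1.0) -> float:
        """Total weekly output, measured in baseline worker-hours."""
        return employees * hours * multiplier

    staff = 10
    before = weekly_output(staff, BASELINE_HOURS)  # 400.0: output without AI

    # Option 1: shorter week, same headcount. Output is unchanged because the
    # 25% gain exactly offsets the 20% cut in hours: 32 * 1.25 == 40.
    option_1 = weekly_output(staff, 32, AI_MULTIPLIER)  # 400.0

    # Option 2: standard week, 20% fewer employees. Output is also unchanged.
    option_2 = weekly_output(8, BASELINE_HOURS, AI_MULTIPLIER)  # 400.0

    # Option 3: standard week, same headcount. Output grows 25%.
    option_3 = weekly_output(staff, BASELINE_HOURS, AI_MULTIPLIER)  # 500.0

    print(before, option_1, option_2, option_3)
    ```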

    As a worker, I’m sympathetic to the idea of shorter workweeks, but I suspect that growth is the more realistic option: employees continue to work similar hours, and increased productivity leads to company growth (though not headcount growth).

  • Microsoft Releases Copilot Extension for VS Code

    From Microsoft:

    GitHub Copilot is an AI peer programming tool that helps you write code faster and smarter.

    GitHub Copilot adapts to your unique needs allowing you to select the best model for your project, customize chat responses with custom instructions, and utilize agent mode for AI-powered, seamlessly integrated peer programming sessions.

    Simon Willison reports, “So far this is just the extension that provides the chat component of Copilot, but the launch announcement promises that Copilot autocomplete will be coming in the near future.”

    I’ve been pessimistic about Copilot, most recently in a post earlier today about its misleading advertising. But we’ve seen Anthropic make impressive strides with their programming tools, so perhaps Microsoft is taking steps to build a more useful agent.

  • Bloomberg: Apple Weighs Using Anthropic or OpenAI to Power Siri in Major Reversal

    Mark Gurman reports (paywall) that Apple is considering using OpenAI or Anthropic to power Siri.

    Maginative has a little more on Apple’s failures with AI:

    This isn’t just about technology. It’s about Apple essentially admitting it can’t keep up in the most important tech race in decades.

    The backstory makes this even more dramatic. Apple originally promised enhanced Siri capabilities in 2024, then delayed them to 2025, and finally pushed them indefinitely to 2026. Some within Apple’s AI division believe the features could be scrapped altogether and rebuilt from scratch.

    I have a lot of Apple products, and I find Siri’s utility limited to things like “play the song Back in Black” or “call my wife.” The Apple Intelligence presentation from WWDC 2024 remains a black eye for the company, so I welcome this news as a frank recognition of Apple’s position in the AI race and as a way to make its products more useful for end users.

  • TechCrunch: Congress might block state AI laws for five years

    Senators Ted Cruz and Marsha Blackburn have included a measure limiting (most) state oversight of AI for the next five years as part of the “Big Beautiful Bill” currently in the works. Pushback from critics (and the Senate Parliamentarian) has already narrowed the measure’s scope and duration.

    However, over the weekend, Cruz and Sen. Marsha Blackburn (R-TN), who has also criticized the bill, agreed to shorten the pause on state-based AI regulation to five years. The new language also attempts to exempt laws addressing child sexual abuse materials, children’s online safety, and an individual’s rights to their name, likeness, voice, and image. However, the amendment says the laws must not place an “undue or disproportionate burden” on AI systems — legal experts are unsure how this would impact state AI laws.

    The moratorium is supported by some in the tech industry, including OpenAI CEO Sam Altman, while Anthropic’s leadership is opposed.

    I’m sympathetic to the aims of this bill, as a patchwork of 50 state laws regulating AI would make it more difficult to innovate in this space. But I’m also aware of real-life harm (as a recent NY Times story profiled), so I’d be much more sanguine if we had federal-level regulation, a prospect that seems very unlikely given the current political makeup.

  • The Verge: Microsoft should change its Copilot advertising, says watchdog

    The BBB National Programs’ National Advertising Division critiques Microsoft’s recent advertising for Clippy (I mean Copilot) and finds quite a bit of puffery.

    From The Verge:

    Microsoft has been claiming that Copilot has productivity and return on investment (ROI) benefits for businesses that adopt the AI assistant, including that “67%, 70%, and 75% of users say they are more productive” after a certain amount of Copilot usage.

    And from the original report from the BBB National Programs’ National Advertising Division:

    NAD found that although the study demonstrates a perception of productivity, it does not provide a good fit for the objective claim at issue. As a result, NAD recommended the claim be discontinued or modified to disclose the basis for the claim. 

    Puffery aside, this aligns with my observations of Copilot. The branding is confusing, the integration with products is suspect, and the tool lags far behind other AI/LLM agents like Gemini, ChatGPT, and Claude.

  • Checking In on AI and the Big Five

    Ben Thompson writes on the Big Five (Amazon, Apple, Google, Meta/Facebook, Microsoft) and where each stands in the AI field today.

    … [is] AI complementary to existing business models (i.e. Apple devices are better with AI) or disruptive to them (i.e. AI might be better than Search but monetize worse). A higher level question, however, is if AI simply obsoletes everything, from tech business models to all white collar work to work generally or even to life itself.

    Perhaps it is the smallness of my imagination or my appreciation of the human condition that makes me more optimistic than many about the probability of the most dire of predictions: I think they are quite low. At the same time, I think that those dismissing AI as nothing but hype are missing the boat as well. This is a big deal, even if the changes may end up fitting into the Bill Gates maxim that “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten.”

    I tend to agree with Thompson’s predictions — change over the next decade will be significant (and hard to imagine now) and the likelihood of the dire predictions coming true is astonishingly low in the near term.

    Like Thompson, I assumed that Microsoft’s partnership with OpenAI would position it to lap the other companies listed here, but the Copilot product is persistently disappointing, especially next to ChatGPT’s rising utility. Google Gemini is gaining capabilities as a tool, particularly around Veo and programming, although Gemini-infused Google search results still make too many embarrassing mistakes to be useful today.

  • Elio, Original Films and Streaming

    John Gruber writes on Pixar’s latest film dud, Elio. He mentions the general lack of awareness around the film, framing it as a marketing problem in addition to the company’s string of unimpressive offerings.

    My sense is that it’s not a marketing problem. My kids have a pile of Elio toys from recent trips to McDonald’s. We have a fair amount of kids’ programming streaming on any given day, and we were well aware the movie was coming out this month. But that hasn’t persuaded us to see it (yet, at least), a trend easily visible in Elio’s box office numbers. The movie has good reviews on Rotten Tomatoes (83% as of this post), so it doesn’t appear to be an outright dud of a story.

    Last summer’s Inside Out 2 was a decent film, although it didn’t have the same magic as the original (or many of Pixar’s earlier successes). Elemental was okay, but Lightyear and Turning Red were bad if not downright terrible. Coming-of-age tales like Turning Red are a common trope in film, but I can’t imagine many parents endorsing a movie that celebrates an outright disregard for parental instruction (one that precipitated the destruction of a large city, no less).

    So Pixar lost its way by producing mediocre films. Audiences no longer believe that Pixar films are creative and entertaining in the ways they were 20+ years ago. But there’s another shift: streaming.

    Parents (a big driver of Pixar movie traffic, I’d assume) are no longer deciding whether or not they’ll watch a Disney film; the question is when and where they’ll watch it. If it looks to be a good one, then perhaps an outing to the theater is in order. If it’s mediocre or even terrible, the decision is an easy one: wait for it to come to Disney+. And back to the marketing thesis: Lilo and Stitch has done very well in theaters this year, and I suspect it has sapped family movie budgets for now, which is another strong reason to wait to see Elio on Disney+.

    (As an aside, I assume this is the same problem that Marvel releases have had lately. And I suspect that studios’ accounting has shifted to account for subscriber counts when funding prestige films like this. You can’t sustain the Disney entertainment empire on sequels alone.)

  • Resisting AI?

    Dan McQuillan writes, “The role of the University is to resist AI,” following themes from Ivan Illich’s ‘Tools for Conviviality’.

    It’s a scathing overview with points that I think many others wonder about (although in less concrete ways than McQuillan).

    Contemporary AI is a specific mode of connectionist computation based on neural networks and transformer models. AI is also a tool in Illich’s sense; at the same time, an arrangement of institutions, investments and claims. One benefit of listening to industry podcasts, as I do, is the openness of the engineers when they admit that no-one really knows what’s going on inside these models.

    Let that sink in for a moment: we’re in the midst of a giant social experiment that pivots around a technology whose inner workings are unpredictable and opaque.

    The highlight is mine. I agree that there’s something disconcerting about using systems that we don’t understand fully.

    Generative AI’s main impact on higher education has been to cause panic about students cheating, a panic that diverts attention from the already immiserated experience of marketised studenthood. It’s also caused increasing alarm about staff cheating, via AI marking and feedback, which again diverts attention from their experience of relentless and ongoing precaritisation.

    The hegemonic narrative calls for universities to embrace these tools as a way to revitalise pedagogy, and because students will need AI skills in the world of work. A major flaw with this story is that the tools don’t actually work, or at least not as claimed.

    AI summarisation doesn’t summarise; it simulates a summary based on the learned parameters of its model. AI research tools don’t research; they shove a lot of searched-up docs into the chatbot context in the hope that will trigger relevancy. For their part, so-called reasoning models ramp up inference costs while confabulating a chain of thought to cover up their glaring limitations.
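
    When he says research tools “shove a lot of searched-up docs into the chatbot context,” he’s describing what the industry calls retrieval-augmented generation. Here’s a minimal sketch of that pattern, where search() and chat() are hypothetical stand-ins rather than any real product’s API:

    ```python
    # A minimal sketch of the "research tool" pattern McQuillan describes:
    # search for documents, stuff them into the prompt, and hope the model
    # produces something relevant. All helpers are hypothetical stand-ins.

    def search(query: str, k: int = 5) -> list[str]:
        """Hypothetical retrieval step; a real tool would call a search
        engine or vector index here."""
        return [f"[document {i} retrieved for: {query}]" for i in range(k)]

    def chat(prompt: str) -> str:
        """Hypothetical LLM call; a real tool would call a model API here."""
        return f"[model answer conditioned on {prompt.count('document')} documents]"

    def research(question: str) -> str:
        docs = search(question)
        # The "research" is just concatenation: retrieved text is prepended
        # to the question so the model conditions its answer on it.
        prompt = "\n\n".join(docs) + f"\n\nQuestion: {question}\nAnswer:"
        return chat(prompt)

    print(research("What are Illich's convivial tools?"))
    ```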

    I think there are philosophical questions here worth considering. Specifically, the claim that AI simply “simulates” is too simple to be helpful. What is a photograph? It’s a real thing, but not the real thing captured in the image. What is a video played on a computer screen? It’s a real thing, but it’s not the real thing. The photo and the screen simulate the real world, yet I’m not aware of modern philosophers critiquing these forms of media. (I’d suspect that earlier media theorists did just that, until these media were accepted en masse by society.)

    He goes on to cite environmental concerns (although, as I posted recently, the questions about water consumption are exaggerated) among the things we would do well to heed. His language is perhaps a bit too revolutionary.

    As for people’s councils — I am less sanguine about their utility.

    Instead of waiting for a liberal rules-based order to magically appear, we need to find other ways to organise to put convivial constraints into practice. I suggest that a workers’ or people’s council on AI can be constituted in any context to carry out the kinds of technosocial inquiry advocated for by Illich, that the act of doing so prefigures the very forms of independent thought which are undermined by AI’s apparatus, and manifests the kind of careful, contextual and relational approach that is erased by AI’s normative scaling.

    I suspect that people’s councils would be glorified committees — structures that are more kabuki theater than anything else and that will struggle to keep pace with the speed at which AI tools are emerging.

    The role of the university isn’t to roll over in the face of tall tales about technological inevitability, but to model the forms of critical pedagogy that underpin the social defence against authoritarianism and which makes space to reimagine the other worlds that are still possible.

    I don’t share all of his fears, but it’s important to consider voices that may not align with a techno-optimistic future.