Enough AI copilots, we need AI HUDs
957 by walterbell | 265 comments on Hacker News.
Wednesday, 30 July 2025
Tuesday, 29 July 2025
Monday, 28 July 2025
New best story on Hacker News: Performance and telemetry analysis of Trae IDE, ByteDance's VSCode fork
Performance and telemetry analysis of Trae IDE, ByteDance's VSCode fork
819 by segfault22 | 297 comments on Hacker News.
Hi HN, I was evaluating IDEs for a personal project and decided to test Trae, ByteDance's fork of VSCode. I immediately noticed some significant performance and privacy issues that I felt were worth sharing. I've written up a full analysis with screenshots, network logs, and data payloads in the linked post. Here are the key findings:
1. Extreme Resource Consumption: Out of the box, Trae used 6.3x more RAM (~5.7 GB) and spawned 3.7x more processes (33 total) than a standard VSCode setup with the same project open. The team has since made improvements, but it's still significantly heavier.
2. Telemetry Opt-Out Doesn't Work (It Makes It Worse): I found Trae was constantly sending data to ByteDance servers (byteoversea.com). I went into the settings and disabled all telemetry. To my surprise, this didn't stop the traffic. In fact, it increased the frequency of batch data collection. The telemetry "off" switch appears to be purely cosmetic.
3. What's Being Sent: Even with telemetry "disabled," Trae sends detailed payloads including:
- Hardware specs (CPU, memory, etc.)
- Persistent user, device, and machine IDs
- OS version, app language, user name
- Granular usage data like time-on-ide, window focus state, and active file types
4. Community Censorship: When I tried to discuss these findings on their official Discord, my posts were deleted and my account was muted for 7 days. It seems words like "track" trigger an automated gag rule, which prevents any real discussion about privacy.
I believe developers should be aware of this behavior. The combination of resource drain, non-functional privacy settings, and censorship of technical feedback is a major red flag. The full, detailed analysis with all the evidence (process lists, Fiddler captures, JSON payloads, and screenshots of the Discord moderation) is available at the link. Happy to answer any questions.
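As a rough sanity check on the resource figures in the post, the quoted multipliers imply a baseline for stock VSCode. The sketch below assumes "6.3x more RAM" and "3.7x more processes" are plain ratios (Trae divided by VSCode); the names are illustrative, and the results are derived estimates rather than measurements from the linked analysis.

```python
# Back-of-the-envelope check using only the numbers quoted in the post.
# Assumption: "N x more" is read as a plain ratio (Trae / VSCode).
TRAE_RAM_GB = 5.7
RAM_RATIO = 6.3
TRAE_PROCESSES = 33
PROCESS_RATIO = 3.7


def implied_baseline(trae_value: float, ratio: float) -> float:
    """Return the VSCode value implied by a Trae value and a ratio."""
    return trae_value / ratio


vscode_ram_gb = implied_baseline(TRAE_RAM_GB, RAM_RATIO)
vscode_processes = implied_baseline(TRAE_PROCESSES, PROCESS_RATIO)

print(f"Implied VSCode RAM: ~{vscode_ram_gb:.1f} GB")
print(f"Implied VSCode processes: ~{vscode_processes:.0f}")
```

Under that reading, the comparison setup would have used a little under 1 GB of RAM across roughly nine processes.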
Saturday, 26 July 2025
Thursday, 24 July 2025
New best story on Hacker News: CARA – High precision robot dog using rope
CARA – High precision robot dog using rope
714 by hakonjdjohnsen | 119 comments on Hacker News.
https://www.youtube.com/watch?v=8s9TjRz01fo
Wednesday, 23 July 2025
Tuesday, 22 July 2025
New best story on Hacker News: Global hack on Microsoft Sharepoint hits U.S., state agencies, researchers say
Global hack on Microsoft Sharepoint hits U.S., state agencies, researchers say
616 by spenvo | 291 comments on Hacker News.
https://ift.tt/cAXeqFT , https://ift.tt/ua2KN5v... https://ift.tt/4lECsvq https://ift.tt/073LnCA... https://ift.tt/SEK03wU...
Monday, 21 July 2025
Sunday, 20 July 2025
Thursday, 17 July 2025
Tuesday, 15 July 2025
Monday, 14 July 2025
New best story on Hacker News: Show HN: Ten years of running every day, visualized
Show HN: Ten years of running every day, visualized
839 by friggeri | 427 comments on Hacker News.
Today marks ten years, 3653 consecutive days, of running at least one mile every day under the USRSA rules [1]. To celebrate, I built an interactive dashboard that turns a decade of GPX files into charts you can explore. Running has truly changed my life: I've made lifelong friends, explored beautiful places, and more importantly invested in my own health and fitness, the benefits of which I'm starting to see as I get older. The stack is pretty simple: a NextJS app with a Postgres database to keep all my running data. All the stats are pre-computed and cached in Redis, so I effectively only hit the database once a day when a new run is ingested. On the frontend, I toyed with the idea of using D3 or pre-existing data viz libraries, but ended up rolling my own using SVGs directly, which gave me more control over the visualizations. I used the Strava bulk export to pre-populate the database, and I'm using their webhook API to do incremental updates. I have to tap into OpenWeatherMap and OpenCageData to enrich the running data a little bit. Happy to answer anything about the stack, data pipeline, or how I stayed motivated for 10 years!
[1] https://ift.tt/G4TB3Qu Run Streak Association rules: ≥ 1 mile per day
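One of the pre-computed stats such a dashboard might cache is the streak length itself. The following is a minimal sketch, not the author's code: `longest_streak` and the sample dates are hypothetical, and the start date in the last line is an assumed placeholder used only to show how ten calendar years can add up to 3653 days.

```python
from datetime import date, timedelta


def longest_streak(run_dates):
    """Return the length in days of the longest run of consecutive dates."""
    days = set(run_dates)
    best = 0
    for day in days:
        # Only start counting from the first day of a streak.
        if day - timedelta(days=1) in days:
            continue
        length = 1
        while day + timedelta(days=length) in days:
            length += 1
        best = max(best, length)
    return best


# Hypothetical sample: a five-day streak, one missed day, then two more days.
sample = [date(2025, 7, 1) + timedelta(days=i) for i in (0, 1, 2, 3, 4, 6, 7)]
print(longest_streak(sample))  # 5

# Ten calendar years that span three leap days (2016, 2020, 2024).
# The start date is an assumption, not taken from the post.
print((date(2025, 7, 13) - date(2015, 7, 13)).days)  # 3653
```

Because each date is visited a bounded number of times, the scan stays linear in the number of runs, which is cheap enough to recompute once a day when a new run is ingested.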
Sunday, 13 July 2025
Saturday, 12 July 2025
Wednesday, 9 July 2025
Monday, 7 July 2025
Sunday, 6 July 2025
Saturday, 5 July 2025
Friday, 4 July 2025
Thursday, 3 July 2025
Wednesday, 2 July 2025
Tuesday, 1 July 2025
New best story on Hacker News: The new skill in AI is not prompting, it's context engineering
The new skill in AI is not prompting, it's context engineering
658 by robotswantdata | 352 comments on Hacker News.
Sunday, 29 June 2025
Saturday, 28 June 2025
Friday, 27 June 2025
New best story on Hacker News: Show HN: I'm an airline pilot – I built interactive graphs/globes of my flights
Show HN: I'm an airline pilot – I built interactive graphs/globes of my flights
713 by jamesharding | 127 comments on Hacker News.
Hey HN! Pilots everywhere are required to keep a logbook of all their flying hours, aircraft, airports, and so on. Since I track everything digitally (some people still just use paper logbooks!), I put together some data visualizations and a few 3D globes to show my flying history. This globe is probably my favourite so far: https://ift.tt/gufFdza If you’ve got ideas for other graphs or ways to show this kind of data, I’d love to hear them!
Thursday, 26 June 2025
Wednesday, 25 June 2025
Tuesday, 24 June 2025
Sunday, 22 June 2025
Saturday, 21 June 2025
Thursday, 19 June 2025
Wednesday, 18 June 2025
Tuesday, 17 June 2025
Saturday, 14 June 2025
Thursday, 12 June 2025
Wednesday, 11 June 2025
Tuesday, 10 June 2025
Monday, 9 June 2025
New best story on Hacker News: Tell HN: Help restore the tax deduction for software dev in the US (Section 174)
Tell HN: Help restore the tax deduction for software dev in the US (Section 174)
821 by dang | 336 comments on Hacker News.
Companies building software in the US were hit hard a few years ago when the tax code stopped allowing deduction of software dev expenses. Now they have to be amortized over several years. HN has had many discussions about this, including The time bomb in the tax code that's fueling mass tech layoffs - https://ift.tt/hoqVlnK - (927 comments) a few days ago. Other threads are listed at https://ift.tt/jCNYPWV . There's currently a major effort to get this change reversed. One of the people working on it is YC's Luther Lowe ( https://ift.tt/u5HiRXV ). Luther has been organizing YC alumni to urge lawmakers to support this reversal. I asked him if we could do that on Hacker News too. He said yes—hence this thread :) If you're a US taxpayer and if you agree that software dev expenses should be deductible like they used to be, please sign this letter to the relevant committee members: https://ift.tt/q7ctpDF... . (If you're not a US person, please don't sign the letter, since lawmakers will only listen to feedback from taxpayers and we don't want to dilute the signal.) I'm sure not everyone here agrees with us—HN is a big community, there's no total agreement on anything—but this issue has as close to a community consensus as HN gets, so I think it makes sense to add our voices too. Luther will be around to answer questions and hopefully HN can contribute to getting this done!
Sunday, 8 June 2025
Saturday, 7 June 2025
Friday, 6 June 2025
Thursday, 5 June 2025
Wednesday, 4 June 2025
Tuesday, 3 June 2025
Monday, 2 June 2025
Sunday, 1 June 2025
Saturday, 31 May 2025
Wednesday, 28 May 2025
Tuesday, 27 May 2025
Monday, 26 May 2025
Sunday, 25 May 2025
Saturday, 24 May 2025
Friday, 23 May 2025
Thursday, 22 May 2025
Wednesday, 21 May 2025
Monday, 19 May 2025
Sunday, 18 May 2025
Saturday, 17 May 2025
Friday, 16 May 2025
Thursday, 15 May 2025
Wednesday, 14 May 2025
New best story on Hacker News: AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
620 by Fysi | 167 comments on Hacker News.
See also https://ift.tt/1rgipXH ( https://ift.tt/tjURKm8 )
Tuesday, 13 May 2025
Sunday, 11 May 2025
Saturday, 10 May 2025
Friday, 9 May 2025
Thursday, 8 May 2025
Wednesday, 7 May 2025
Tuesday, 6 May 2025
Monday, 5 May 2025
Friday, 2 May 2025
Thursday, 1 May 2025
Wednesday, 30 April 2025
Tuesday, 29 April 2025
New best story on Hacker News: Show HN: I built a hardware processor that runs Python
Show HN: I built a hardware processor that runs Python
920 by hwpythonner | 241 comments on Hacker News.
Hi everyone, I built PyXL, a hardware processor that executes a custom assembly generated from Python programs, without using a traditional interpreter or virtual machine. It compiles Python -> CPython Bytecode -> Instruction set designed for direct hardware execution. I’m sharing an early benchmark: a GPIO test where PyXL achieves a 480ns round-trip toggle, compared to 14-25 microseconds on a MicroPython Pyboard, even though PyXL runs at a lower clock (100MHz vs. 168MHz). The design is stack-based, fully pipelined, and preserves Python's dynamic typing without static type restrictions. I independently developed the full stack, both toolchain (compiler, linker, codegen) and hardware, to validate the core idea. Full technical details will be presented at PyCon 2025. Demo and explanation here: https://ift.tt/MDeUBzP Happy to answer any questions.
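To make the first stage of that pipeline concrete, the sketch below uses the standard library's `dis` module to list the CPython bytecode for a tiny function, then converts the quoted timings into clock cycles. The `toggle` function and `ns_to_cycles` helper are hypothetical illustrations; nothing here reflects PyXL's actual instruction set or toolchain, which the post does not describe.

```python
import dis


def toggle(state):
    # Hypothetical stand-in for a GPIO toggle; this is not PyXL code.
    return not state


# CPython compiles the function to stack-based bytecode; list the opcode names.
# The exact opcodes differ between CPython versions.
opnames = [ins.opname for ins in dis.get_instructions(toggle)]
print(opnames)


def ns_to_cycles(nanoseconds: int, clock_mhz: int) -> int:
    """Convert a duration to clock cycles using integer arithmetic."""
    return nanoseconds * clock_mhz // 1000


print(ns_to_cycles(480, 100))     # 48 cycles at 100 MHz
print(ns_to_cycles(14_000, 168))  # 2352 cycles at 168 MHz
print(ns_to_cycles(25_000, 168))  # 4200 cycles at 168 MHz
```

Read this way, the quoted figures correspond to about 48 cycles per round trip for PyXL against roughly 2,350 to 4,200 cycles on the Pyboard, so the gap is not explained by clock speed.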
Monday, 28 April 2025
New best story on Hacker News: Widespread power outage in Spain and Portugal
Widespread power outage in Spain and Portugal
979 by lleims | 778 comments on Hacker News.
All of Spain is without energy. All systems have shut down immediately and are not coming back. Apparently the same has happened in Portugal.
Sunday, 27 April 2025
Saturday, 26 April 2025
Friday, 25 April 2025
Thursday, 24 April 2025
Wednesday, 23 April 2025
Monday, 21 April 2025
Friday, 18 April 2025
Wednesday, 16 April 2025
Tuesday, 15 April 2025
New best story on Hacker News: Cursor IDE support hallucinates lockout policy, causes user cancellations
Cursor IDE support hallucinates lockout policy, causes user cancellations
807 by scaredpelican | 280 comments on Hacker News.
Earlier today Cursor, the magical AI-powered IDE, started kicking users off when they logged in from multiple machines. Like, you’d be working on your desktop, switch to your laptop, and all of a sudden you're forcibly logged out. No warning, no notification, just gone. Naturally, people thought this was a new policy. So they asked support. And here’s where it gets batshit: Cursor has a support email, so users emailed them to find out. The support person told everyone this was “expected behavior” under their new login policy. One problem. There was no support team; it was an AI designed to 'mimic human responses'. That answer, totally made up by the bot, spread like wildfire. Users assumed it was real (because why wouldn’t they? It's their own support system lol), and within hours the community was in revolt. Dozens of users publicly canceled their subscriptions, myself included. Multi-device workflows are table stakes for devs, and if you're going to pull something that disruptive, you'd at least expect a changelog entry or something. Nope. And just as people started comparing notes and figuring out that the story didn’t quite add up… the main Reddit thread got locked. Then deleted. Like, no public resolution, no real response, just silence. To be clear: this wasn’t an actual policy change, just a backend session bug, and a hallucinated excuse from a support bot that somehow did more damage than the bug itself. But at that point, it didn’t matter. People were already gone. Honestly one of the most surreal product screwups I’ve seen in a while. Not because they made a mistake, but because the AI support system invented a lie, and nobody caught it until the userbase imploded.
Monday, 14 April 2025
Sunday, 13 April 2025
Saturday, 12 April 2025
Friday, 11 April 2025
Wednesday, 9 April 2025
Tuesday, 8 April 2025
Monday, 7 April 2025
Sunday, 6 April 2025
Saturday, 5 April 2025
Friday, 4 April 2025
Thursday, 3 April 2025
Wednesday, 2 April 2025
New best story on Hacker News: Tell HN: Announcing tomhow as a public moderator
Tell HN: Announcing tomhow as a public moderator
1076 by dang | 333 comments on Hacker News.
Hi all, Tom Howard is going public as HN moderator today. He has been doing HN moderation work for years already and knows the site and its practices inside-out, so the only new thing you'll see is mod comments from Tom showing up in the threads the way mine do. I'm not going anywhere, so you'll have two of us to put up with going forward :) I've known Tom since he was sctb's and my batchmate back in YC W09. Many of you know him as the kind and thoughtful community member tomhoward ( https://ift.tt/wkWMg7U ). He's still kind and thoughtful, but he's going to post as tomhow from now on ( https://ift.tt/qgOIx8N ), the same way I switched to dang when I went through this rite of passage years ago. Below is a bit from Tom about himself. Please join me in welcoming him to this new status which he was crazy enough to say yes to! --- YC and HN have been a huge part of my life for nearly two decades. I read pg's essay How to Start a Startup in 2005 after my friend (and later, co-founder) Fenn found it on Slashdot, and it opened our eyes as to how to go about building products and companies. I first signed up in late 2007, and since then HN has been the place I come to find interesting news and discussions. Hacker News gave me a window into the big wide world of technology and startups, that had previously seemed so remote and opaque from where I lived (and still live) in Australia. We were lucky enough to be accepted into the W09 batch of YC, and since then HN has been a place where we could share announcements about the startup, but also where I could share the challenges and struggles I experienced in the startup journey and other aspects of life, particularly to do with health and wellbeing. From the discussions that have happened about these topics I've ended up making enduring friendships with people all over the world, and have been able to learn many things that have improved my life in profound ways. 
I love HN's ethos - of being a place people come to engage their curiosity. That's what it's always been for me and what I hope I can help it to be for everyone! --Tom
Monday, 31 March 2025
Sunday, 30 March 2025
Saturday, 29 March 2025
Friday, 28 March 2025
Wednesday, 26 March 2025
Tuesday, 25 March 2025
Monday, 24 March 2025
Sunday, 23 March 2025
Saturday, 22 March 2025
Tuesday, 18 March 2025
Monday, 17 March 2025
Sunday, 16 March 2025
Thursday, 13 March 2025
Wednesday, 12 March 2025
New best story on Hacker News: Show HN: Factorio Learning Environment – Agents Build Factories
Show HN: Factorio Learning Environment – Agents Build Factories
707 by noddybear | 204 comments on Hacker News.
I'm Jack, and I'm excited to share a project that has channeled my Factorio addiction recently: the Factorio Learning Environment (FLE).
FLE is an open-source framework for developing and evaluating LLM agents in Factorio. It provides a controlled environment where AI models can attempt complex automation, resource management, and optimisation tasks in a grounded world with meaningful constraints.
A critical advantage of Factorio as a benchmark is its unbounded nature. Unlike many evals that are quickly saturated by newer models, Factorio's geometric complexity scaling means it won't be "solved" in the next 6 months (or possibly even years). This allows us to meaningfully compare models by the order-of-magnitude of resources they can produce - creating a benchmark with longevity.
The project began 18 months ago after years of playing Factorio, recognising its potential as an AI research testbed. A few months ago, our team (myself, Akbir, and Mart) came together to create a benchmark that tests agent capabilities in spatial reasoning and long-term planning.
Two technical innovations drove this project forward: First, we discovered that piping Lua into the Factorio console over TCP enables running (almost) arbitrary code without directly modding the game. Second, we developed a first-class Python API that wraps these Lua programs to provide a clean, type-hinted interface for AI agents to interact with Factorio through familiar programming paradigms.
Agents interact with FLE through a REPL pattern:
1. They observe the world (seeing the output of their last action)
2. Generate Python code to perform their next action
3. Receive detailed feedback (including exceptions and stdout)
We provide two main evaluation settings:
- Lab-play: 24 structured tasks with fixed resources
- Open-play: An unbounded task of building the largest possible factory on a procedurally generated map
We found that while LLMs show promising short-horizon skills, they struggle with spatial reasoning in constrained environments. They can discover basic automation strategies (like electric-powered drilling) but fail to achieve more complex automation (like electronic circuit manufacturing). Claude Sonnet 3.5 is currently the best model (by a significant margin).
The code is available at https://ift.tt/hEwWq3k . You'll need:
- Factorio (version 1.1.110)
- Docker
- Python 3.10+
The README contains detailed installation instructions and examples of how to run evaluations with different LLM agents. We would love to hear your thoughts and see what others can do with this framework!
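The observe, generate, and feedback cycle described in the post can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the FLE API: `run_step`, `repl_loop`, and the scripted policy are hypothetical names, the code runs through the built-in `exec` rather than a Factorio server, and a canned list of snippets stands in for an LLM.

```python
import contextlib
import io
import traceback


def run_step(code: str, namespace: dict) -> str:
    """Execute one snippet and return its stdout plus any exception text."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            exec(code, namespace)
        except Exception:
            # Return the traceback as feedback instead of crashing the loop.
            buffer.write(traceback.format_exc())
    return buffer.getvalue()


def repl_loop(policy, max_steps: int = 3) -> list:
    """Alternate between asking the policy for code and executing it."""
    namespace = {}    # state persists across steps, as in a REPL
    observation = ""  # output of the previous action
    history = []
    for _ in range(max_steps):
        code = policy(observation)
        observation = run_step(code, namespace)
        history.append(observation)
    return history


# Hypothetical scripted "agent" standing in for an LLM.
scripted = iter(["x = 2\nprint(x)", "print(x * 21)", "print(1 / 0)"])
history = repl_loop(lambda observation: next(scripted))
print(history)
```

Keeping one namespace across steps is what makes this a REPL rather than a series of isolated scripts: variables defined in one action remain available to the next, and a failed action simply becomes the next observation.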
Tuesday, 11 March 2025
Monday, 10 March 2025
Sunday, 9 March 2025
New best story on Hacker News: Show HN: Bayleaf – Building a low-profile wireless split keyboard
Show HN: Bayleaf – Building a low-profile wireless split keyboard
726 by sgraz | 245 comments on Hacker News.
Hey HN, I built a wireless, split, ultra-low profile keyboard from scratch called Bayleaf. As a beginner I learned all things electronics, PCB-building, designing for manufacturing, and many other hardware-related skills to put this together. This case study dives into the build process and of course the final result, hope you enjoy!
Thursday, 6 March 2025
Wednesday, 5 March 2025
Monday, 3 March 2025
Friday, 28 February 2025
Thursday, 27 February 2025
New best story on Hacker News: Show HN: I got laid off from Meta and created a minor hit on Steam
Show HN: I got laid off from Meta and created a minor hit on Steam
1106 by newobj | 255 comments on Hacker News.
I was at FB/Meta from late 2013 to early 2023, mostly working in the compiler/runtime spaces. I got hit in the spring 2023 layoff wave. I immediately started making games in my newfound free time (a lifelong interest, and I even worked in AA(A?) back ca. ~2000), and in October 2023 I stumbled upon the idea of a roguelike pachinko/plinko game inspired by Luck Be A Landlord. Things snowballed quickly, I started talking to publishers, then worked like crazy through all of 2024, almost the hardest I've ever worked in my career, and launched the game in December 2024. It's sold ~200,000 units in its first 10 weeks on Steam. So it's no Balatro, but I'd still say it did very well :) AMA? (my game is Ballionaire: https://ift.tt/p43CDaO... )
Wednesday, 26 February 2025
Monday, 24 February 2025
New best story on Hacker News: Show HN: I built an app to stop me doomscrolling by touching grass
Show HN: I built an app to stop me doomscrolling by touching grass
887 by risquer | 226 comments on Hacker News.
I wanted to change the habit of reaching for my phone in the morning and doomscrolling away an hour, so I built an app to help me. Now I have to literally touch grass before accessing my most distracting apps. The app is built in SwiftUI and uses the Screen Time APIs provided by Apple, plus Google Vision to recognise whether it sees grass. I'd love to get your thoughts on the concept.
Sunday, 23 February 2025
Friday, 21 February 2025
Thursday, 20 February 2025
Wednesday, 19 February 2025
New best story on Hacker News: Show HN: Live-updating version of the 'What a week, huh?' meme
Show HN: Live-updating version of the 'What a week, huh?' meme
774 by dlazaro | 151 comments on Hacker News.
As a fun evening project, I made a live-updating version of the 'What a week, huh?' meme (based on a panel from The Adventures of Tintin comics [1]). There's a page for every timeframe:
- 'What a day': https://ift.tt/BD0nJdf
- 'What a week': https://ift.tt/sFDnA50
- 'What a month': https://ift.tt/ZHjJhfm
- 'What a year': https://ift.tt/9gH43bp
Current time is determined by a Cloudflare Worker using the request IP (not logged or stored). No JavaScript is sent to the browser.
[1] https://ift.tt/vRjguns
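The server-side time lookup described above can be sketched roughly as follows. This is not the author's Worker code: it is a minimal Python illustration that assumes the IP lookup has already produced a UTC offset in minutes for the visitor, and the function name `visitor_weekday` and the idea of returning the weekday name are hypothetical choices made for the example.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def visitor_weekday(now_utc: datetime, utc_offset_minutes: int) -> str:
    """Return the weekday name at the visitor's location.

    `utc_offset_minutes` stands in for whatever the IP-based lookup yields
    (an assumption; the real service may use a named time zone instead).
    """
    if now_utc.tzinfo is None:
        raise ValueError("now_utc must be timezone-aware")
    local = now_utc.astimezone(timezone(timedelta(minutes=utc_offset_minutes)))
    return WEEKDAYS[local.weekday()]


if __name__ == "__main__":
    instant = datetime(2025, 2, 19, 23, 30, tzinfo=timezone.utc)
    print(visitor_weekday(instant, -300))  # UTC-5: still Wednesday evening
    print(visitor_weekday(instant, 120))   # UTC+2: already Thursday morning
```

Because the weekday is computed before the response is built, the page can be served as plain markup, which fits the note that no JavaScript is sent to the browser.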
Tuesday, 18 February 2025
Sunday, 16 February 2025
Friday, 14 February 2025
Thursday, 13 February 2025
Wednesday, 12 February 2025
Tuesday, 11 February 2025
Sunday, 9 February 2025
Friday, 7 February 2025
Thursday, 6 February 2025
Wednesday, 5 February 2025
Monday, 3 February 2025
Sunday, 2 February 2025
Thursday, 30 January 2025
Wednesday, 29 January 2025
Tuesday, 28 January 2025
Monday, 27 January 2025
New best story on Hacker News: Google open-sources the Pebble OS
Google open-sources the Pebble OS
889 by hexxeh | 130 comments on Hacker News.
https://ift.tt/tQcx1WZ
New best story on Hacker News: We're bringing Pebble back
We're bringing Pebble back
980 by erohead | 255 comments on Hacker News.
Thank you, Google. You didn't have to, but you did. We (the Pebble team and community) are extraordinarily grateful. I wrote a blog post about our plans to bring Pebble back, sustainably. https://ift.tt/cjANaOf We got our original start on HN ( https://ift.tt/nPz1ERq ), it's a pleasure to be back.
Sunday, 26 January 2025
Thursday, 23 January 2025
New best story on Hacker News: Thank HN: My bootstrapped startup got acquired today
Thank HN: My bootstrapped startup got acquired today
1052 by paraschopra | 156 comments on Hacker News.
Hello HN, I'm Paras Chopra, founder of VWO. We're an A/B testing platform that was born here as a Show HN in 2009: https://ift.tt/QBgJNwe Today, I sold the company to a private equity firm for $200mn. It's covered on TechCrunch: https://ift.tt/efPxch2... I was a 22-year-old fresh graduate when I launched VWO on HN and got initial users. Feedback from people like @patio11 helped me get to PMF. And now, 15 years later, "site:ycombinator.com" is what I appended when I wanted to search for advice on what to keep in mind while selling my company. Thank you HN for sharing inspiration and wisdom all along. I honestly don't think I would have been an entrepreneur had it not been for Hacker News. Every single day, HN is the first website I open! I'm feeling very grateful towards the community. Thanks @dang, and thank you Paul Graham for your essays and for creating this beautiful corner of the internet!
New best story on Hacker News: Show HN: I made an open-source laptop from scratch
Show HN: I made an open-source laptop from scratch
1057 by Hello9999901 | 143 comments on Hacker News.
Hello! I'm Byran. I spent the past ~6 months engineering a laptop from scratch. It's fully open-source on GH at: https://ift.tt/A0Q5KtM
Wednesday, 22 January 2025
Tuesday, 21 January 2025
Monday, 20 January 2025
Sunday, 19 January 2025
Friday, 17 January 2025
Thursday, 16 January 2025
Wednesday, 15 January 2025
Tuesday, 14 January 2025
Monday, 13 January 2025
Friday, 10 January 2025
New best story on Hacker News: Show HN: Tetris in a PDF
Show HN: Tetris in a PDF
962 by ThomasRinsma | 173 comments on Hacker News.
I realized that the PDF engines of modern desktop browsers (PDFium and PDF.js) support JavaScript with enough I/O primitives to make a basic game like Tetris. It was a bit tricky to find a union of features that work in both engines, but in the end it turns out that showing/hiding annotation "fields" works well to make monochrome pixels, and keyboard input can be achieved by typing in a text input box. All in all it's quite janky but a nice reminder of how general purpose PDF scripting can be. The linked PDF is all ASCII so you can just open it in a text editor, or have a look at the source code here: https://ift.tt/iMZtQ39
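The "fields as pixels" idea can be modelled in a few lines. The sketch below is a plain Python model, not PDF JavaScript and not the author's source: the `Field` class, the `P_x_y` naming scheme, and the helper functions are all hypothetical stand-ins that only illustrate how toggling a per-field hidden flag yields a monochrome display.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Stand-in for one annotation field used as a pixel (hypothetical model)."""
    name: str
    hidden: bool = True  # a hidden field is a dark pixel


def make_screen(width: int, height: int) -> dict:
    """Create one field per pixel, keyed by (x, y)."""
    return {(x, y): Field(f"P_{x}_{y}")
            for y in range(height) for x in range(width)}


def draw(screen: dict, lit: set) -> int:
    """Show the fields listed in `lit`, hide the rest; return how many flipped."""
    toggles = 0
    for pos, field in screen.items():
        want_hidden = pos not in lit
        if field.hidden != want_hidden:
            field.hidden = want_hidden
            toggles += 1
    return toggles


def render(screen: dict, width: int, height: int) -> str:
    """Text preview: '#' for a visible field, '.' for a hidden one."""
    return "\n".join(
        "".join("." if screen[(x, y)].hidden else "#" for x in range(width))
        for y in range(height)
    )


if __name__ == "__main__":
    screen = make_screen(4, 2)
    draw(screen, {(0, 0), (1, 0)})
    print(render(screen, 4, 2))
```

Only fields whose state actually changes are touched on each frame, which is one plausible way to keep per-frame work small in a constrained scripting environment.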
Thursday, 9 January 2025
Wednesday, 8 January 2025
Tuesday, 7 January 2025
Monday, 6 January 2025
Sunday, 5 January 2025
Friday, 3 January 2025
Wednesday, 1 January 2025
New best story on Hacker News: Happy New Year 2025
Happy New Year 2025
716 by martynvandijke | 173 comments on Hacker News.
Hey HN, this site always drags me back to visit it every day. So for that, Happy New Year!