LessWrong Curated: Today's Hot List



Curated

1.	There is way too much serendipity	Malmesbury	
2.	Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training	evhub	
3.	How useful is mechanistic interpretability?	ryan_greenblatt	
4.	Gentleness and the artificial Other	Joe Carlsmith	
5.	Meaning & Agency	abramdemski	
6.	A case for AI alignment being difficult	jessicata	
7.	The Dark Arts	lsusr	
8.	Constellations are Younger than Continents	Jeffrey Heninger	
9.	Discussion: Challenges with Unsupervised LLM Knowledge Discovery	Seb Farquhar	
10.	Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible	GeneSmith	

Updated 2024-01-23

Recent history (most recent 100 records)

2023-09-03	Dear Self; we need to talk about ambition	surprisetalk	
2023-08-30	Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research	evhub	
2023-08-26	Feedbackloop-first Rationality	Raemon	
2023-08-17	Self-driving car bets	paulfchristiano	
2023-08-13	My current LK99 questions	Eliezer Yudkowsky	
2023-08-08	When can we trust model evaluations?	evhub	
2023-08-05	Thoughts on sharing information about language model capabilities	paulfchristiano	
2023-08-02	Cultivating a state of mind where new ideas are born	Henrik Karlsson	
2023-07-28	Grant applications and grand narratives	Elizabeth	
2023-07-20	Accidentally Load Bearing	jefftk	
2023-07-12	Consciousness as a conflationary alliance term	Andrew_Critch	
2023-07-07	Lessons On How To Get Things Right On The First Try	johnswentworth	
2023-07-03	When do "brains beat brawn" in Chess? An experiment	titotal	
2023-06-30	Some background for reasoning about dual-use alignment research	Charlie Steiner	
2023-06-24	Updates and Reflections on Optimal Exercise after Nearly a Decade	romeostevensit	
2023-06-19	What will GPT-2030 look like?	jsteinhardt	
2023-06-14	The Base Rate Times, news through prediction markets	vandemonian	
2023-06-09	The ants and the grasshopper	Richard_Ngo	
2023-06-05	Trust develops gradually via making bids and setting boundaries	Richard_Ngo	
2023-06-01	Book Review: How Minds Change	bc4026bd4aaa5b7fe	
2023-05-29	How to have Polygenically Screened Children	GeneSmith	
2023-05-24	When is Goodhart catastrophic?	Drake Thomas	
2023-05-20	Steering GPT-2-XL by adding an activation vector	TurnTrout	
2023-05-17	Predictable updating about AI risk	Joe Carlsmith	
2023-05-12	How much do you believe your results?	Eric Neyman	
2023-05-06	Hell is Game Theory Folk Theorems	jessicata	
2023-04-30	Notes on Teaching in Prison	jsd	
2023-04-25	A stylized dialogue on John Wentworth's claims about markets and optimization	So8res	
2023-04-21	On AutoGPT	Zvi	
2023-04-18	Elements of Rationalist Discourse	Rob Bensinger	
2023-04-13	Discussion with Nate Soares on a key alignment difficulty	HoldenKarnofsky	
2023-04-08	What would a compute monitoring plan look like? [Linkpost]	Akash	
2023-04-04	"Carefully Bootstrapped Alignment" is organizationally hard	Raemon	
2023-03-28	On not getting contaminated by the wrong obesity ideas	Natália Coelho Mendonça	
2023-03-23	More information about the dangerous capability evaluations we did with GPT-4 and Claude.	Beth Barnes	
2023-03-19	The Social Recession: By the Numbers	antonomon	
2023-03-16	Enemies vs Malefactors	So8res	
2023-03-13	The Parable of the King and the Random Process	moridinamael	
2023-03-09	Acausal normalcy	Andrew_Critch	
2023-03-03	AI alignment researchers don't (seem to) stack	So8res	
2023-02-25	I hired 5 people to sit behind me and make me productive for a month	Simon Berens	
2023-02-22	Please don't throw your mind away	TsviBT	
2023-02-16	Cyborgism	NicholasKees	
2023-02-13	Childhoods of exceptional people	Henrik Karlsson	
2023-02-09	SolidGoldMagikarp (Plus, Prompt Generation)	Ivoah	
2023-02-06	Focus on the places where you feel shocked everyone's dropping the ball	So8res	
2023-02-03	Basics of Rationalist Discourse	Duncan_Sabien	
2023-01-30	My Model Of EA Burnout	LoganStrohl	
2023-01-27	Sapir-Whorf for Rationalists	Duncan_Sabien	
2023-01-24	How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme	Collin	
2023-01-21	Recursive Middle Manager Hell	crop_rotation	
2023-01-18	Models Don't "Get Reward"	Sam Ringer	
2023-01-15	We don’t trade with ants	KatjaGrace	
2023-01-12	Can we efficiently distinguish different mechanisms?	paulfchristiano	
2023-01-06	The Feeling of Idea Scarcity	johnswentworth	
2023-01-02	Staring into the abyss as a core life skill	benkuhn	
2022-12-29	Sazen	Duncan_Sabien	
2022-12-27	Finite Factored Sets in Pictures	Magdalena Wache	
2022-12-27	Be less scared of overconfidence	benkuhn	
2022-12-27	The Plan - 2022 Update	johnswentworth	
2022-12-27	A note about differential technological development	So8res	
2022-12-27	Mechanistic anomaly detection and ELK	paulfchristiano	
2022-12-27	Superintelligent AI is necessary for an amazing future, but far from sufficient	So8res	
2022-12-27	Mysteries of mode collapse – mysterious attractor states in LLMs	reallyeli	
2022-12-27	What it's like to dissect a cadaver	Alok Singh	
2022-12-27	Decision theory does not imply that we get to have nice things	So8res	
2022-12-27	Let’s think about slowing down AI	KatjaGrace	