Paper: https://arxiv.org/abs/2402.14207
Can we teach LLMs to write long articles from scratch, based on reliable sources?
Do Wikipedia editors think this will help them?
📣 Announcing STORM, a system for writing Wikipedia-style articles based on Internet searches. I am now using STORM in my daily research!
Generating long articles with citations is hard to do & hard to assess!
We break this down into two steps:
1️⃣ Pre-writing, in which the system collects references and generates an outline.
2️⃣ Writing, in which the system generates the final article with citations.
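To make the two-stage split concrete, here is a minimal sketch of the data flow, assuming two hypothetical callables: `search` (query to text snippets) and `llm` (prompt to completion). These names, the prompts, and `PreWritingResult` are illustrative stand-ins, not STORM's actual code or API. Stage 1 gathers references and an outline. Stage 2 writes one section per outline heading and cites references by index.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces (assumptions, not STORM's real API):
# `search` maps a query to text snippets; `llm` maps a prompt to a completion.
Search = Callable[[str], List[str]]
LLM = Callable[[str], str]


@dataclass
class PreWritingResult:
    references: List[str] = field(default_factory=list)
    outline: List[str] = field(default_factory=list)


def pre_write(topic: str, search: Search, llm: LLM) -> PreWritingResult:
    """Stage 1: collect references, then draft an outline (one heading per line)."""
    result = PreWritingResult(references=search(topic))
    prompt = f"Outline an article on {topic} using these sources:\n" + "\n".join(result.references)
    result.outline = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return result


def write(topic: str, pre: PreWritingResult, llm: LLM) -> str:
    """Stage 2: write each outline section in turn, citing references as [i]."""
    numbered = "\n".join(f"[{i}] {ref}" for i, ref in enumerate(pre.references, start=1))
    sections = []
    for heading in pre.outline:
        body = llm(f"Write the section '{heading}' of an article on {topic}. Cite as [i]:\n{numbered}")
        sections.append(f"## {heading}\n{body}")
    return "\n\n".join(sections)


# Tiny stubs so the sketch runs without any external service.
def fake_search(query: str) -> List[str]:
    return [f"snippet about {query}"]


def fake_llm(prompt: str) -> str:
    return "History\nImpact" if prompt.startswith("Outline") else "Some text [1]."


pre = pre_write("STORM", fake_search, fake_llm)
article = write("STORM", pre, fake_llm)
```

Separating the stages this way means the outline can be inspected and evaluated on its own before any article text is generated.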
"Pre-writing" involves researching a topic from scratch.
This is difficult even for human experts. Directly prompting language models to generate questions does not work! The resulting questions lack depth and have limited breadth.
STORM aims to teach language models to **ask good questions**.
STORM improves question asking by automatically discovering different perspectives on the research topic and adding them to the prompts. It also simulates information-seeking conversations, which elicit deeper follow-up questions.
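A rough sketch of perspective-guided questioning with simulated conversations is shown here. The `llm` and `search` callables, the prompts, and the stopping token `DONE` are all hypothetical. Perspective discovery is simplified to asking the model directly, which may differ from how STORM actually finds perspectives. Each conversation conditions the "writer" on one perspective, while the "expert" answers only from retrieved snippets.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (assumptions, not STORM's real API).
LLM = Callable[[str], str]
Search = Callable[[str], List[str]]
Turn = Tuple[str, str]  # (question, answer)


def discover_perspectives(topic: str, llm: LLM, n: int = 3) -> List[str]:
    """Ask the model for distinct perspectives to research the topic from."""
    reply = llm(f"List {n} distinct perspectives for researching: {topic}")
    return [p.strip() for p in reply.splitlines() if p.strip()][:n]


def simulate_conversation(topic: str, perspective: str, llm: LLM, search: Search,
                          max_turns: int = 3) -> List[Turn]:
    """A writer with one perspective asks follow-ups; an 'expert' answers from search results."""
    history: List[Turn] = []
    for _ in range(max_turns):
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
        question = llm(
            f"You are a {perspective} researching {topic}.\n{context}\n"
            "Ask your next question, or reply DONE."
        ).strip()
        if question == "DONE":
            break
        sources = "\n".join(search(question))  # ground the answer in retrieved text
        answer = llm(f"Answer using only these sources:\n{sources}\nQuestion: {question}")
        history.append((question, answer))
    return history


def research(topic: str, llm: LLM, search: Search) -> Dict[str, List[Turn]]:
    """Run one simulated conversation per discovered perspective."""
    return {p: simulate_conversation(topic, p, llm, search)
            for p in discover_perspectives(topic, llm)}


# Deterministic stubs: one question per perspective, then DONE.
def stub_llm(prompt: str) -> str:
    if prompt.startswith("List"):
        return "historian\nengineer"
    if "Ask your next question" in prompt:
        return "DONE" if "Q:" in prompt else "What is the origin?"
    return "An answer."


def stub_search(query: str) -> List[str]:
    return [f"doc for {query}"]


notes = research("STORM", stub_llm, stub_search)
```

Feeding the running Q&A history back into each question prompt is what lets later questions follow up on earlier answers instead of restarting from scratch.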
For evaluation, we constructed FreshWiki to minimize the risk that the evaluation articles had leaked into LM training data.
To measure outline quality, we introduced heading soft recall and heading entity recall. Evaluating outlines makes it easier to prototype pre-writing methods.
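One way to compute a soft heading recall is through soft cardinality. Each heading counts as 1 divided by the sum of its similarities to all headings, so near-duplicates share a single unit. Recall is then card(G ∩ P) / card(G). The sketch below assumes this formulation and uses a toy token-overlap similarity (`token_jaccard`) as a stand-in for whatever similarity the paper actually uses. The entity-recall helper assumes entities have already been extracted from the headings; the extractor is out of scope.

```python
from typing import Callable, List, Set

Similarity = Callable[[str, str], float]


def token_jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption): overlap of lowercase token sets, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def soft_cardinality(items: List[str], sim: Similarity = token_jaccard) -> float:
    """Each item contributes 1 / (sum of its similarities to all items, itself included),
    so near-duplicate headings share roughly one unit of count."""
    return sum(1.0 / sum(sim(x, y) for y in items) for x in items)


def heading_soft_recall(gold: List[str], pred: List[str],
                        sim: Similarity = token_jaccard) -> float:
    """card(G ∩ P) / card(G), with card(G ∩ P) = card(G) + card(P) - card(G ∪ P)."""
    if not gold:
        raise ValueError("gold outline must be non-empty")
    card_g = soft_cardinality(gold, sim)
    card_p = soft_cardinality(pred, sim)
    card_union = soft_cardinality(gold + pred, sim)
    return (card_g + card_p - card_union) / card_g


def heading_entity_recall(gold_entities: Set[str], pred_entities: Set[str]) -> float:
    """Fraction of entities in the human-written headings covered by the generated headings."""
    if not gold_entities:
        return 0.0
    return len(gold_entities & pred_entities) / len(gold_entities)


gold = ["Early life", "Career"]
print(heading_soft_recall(gold, ["Career"]))  # 0.5: one of two gold headings covered
```

Because matching is soft, a generated heading that paraphrases a gold heading still earns partial credit, which exact string matching would miss.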
STORM is superior to a well-designed RAG baseline!
In the final writing stage, STORM generates text with citations and writes the complete article section by section.
Articles generated by STORM are favored by both automatic metrics *and* experienced Wikipedia editors!
Expository writing should always be grounded in facts.
We assessed citation quality and asked Wikipedia editors to rate verifiability. We found that the main challenge goes beyond the widely discussed problem of factual hallucination: it comes from red herrings that pull the article off topic.
This requires research beyond fact-checking!
We also asked Wikipedia editors about the perceived usefulness of STORM. Excitingly, all participants agreed that STORM was helpful in their pre-writing stage. I also use STORM myself to dig deeper into concepts in my research (check out our demo video if you haven't seen it).
It's worth noting that STORM is a well-designed knowledge curation pipeline, not a single prompt or model.
We built STORM with DSPy, whose modularity is very neat: it lets us keep extending our work without getting lost in a pile of prompt files.