DOI: (to be assigned)
John Swygert
March 25, 2026
Abstract
Scholarly publishing increasingly faces integrity risks that are not always captured by traditional editorial checks. Many publication failures are not visible as isolated anomalies within a single manuscript; instead, they emerge as cross-manuscript, cross-author, or volume-level patterns involving conflicts of interest, abnormal authorship concentration, citation irregularities, structural duplication, implausible publication velocity, or editorial-governance weaknesses. Recent guidance from the International Committee of Medical Journal Editors (ICMJE) and the World Association of Medical Editors (WAME) emphasizes confidentiality, transparency, accountability, and the need for editorial policies and tools governing the use of artificial intelligence in publishing. ICMJE’s January 2026 update expanded its guidance on AI in publishing, and WAME has explicitly stated that editors need appropriate digital tools to deal with the effects of chatbots on publishing. What remains underdeveloped is a practical, scalable screening layer that helps editors identify higher-order risk patterns before publication. This paper proposes LLM Agent and Multi-Agent Editorial Screening as a pre-publication integrity and refinement framework. The system does not replace editors, reviewers, or peer review. Instead, it functions as a structured triage and detection system that flags manuscripts, author clusters, special issues, and proceedings volumes for closer human review. This paper also clarifies that the present proposal does not emerge in isolation: the author has already designed an agent-based analytical/editorial framework, conducted preliminary testing in general terms, and previously published broader conceptual papers describing structured corpora, analytical baselines, corpus-guided agents, and consistency-based scientific evaluation methods. The present paper formalizes and names that applied architecture more explicitly as an editorial-screening system for scholarly publishing.
1. Introduction
Retractions, expressions of concern, editorial controversies, and mass post-publication corrections create damage far beyond the individual paper. Authors may suffer reputational injury, editors may face scrutiny for insufficient oversight, publishers may absorb institutional embarrassment, and legitimate researchers may lose time disentangling unreliable work from the trustworthy literature. In some disciplines, flawed or insufficiently screened work may also misdirect scarce funding, experimental time, and public confidence. WAME states that its recommendations are intended to inform editors and help them develop policies for the use of chatbots in scholarly publishing, and it specifically notes the need for editors to have access to manuscript-screening tools.
The central argument of this paper is straightforward: scholarly publishing needs a dedicated upstream screening layer capable of detecting higher-order anomalies before publication. Not all serious editorial failures appear as blatant fabrication, obvious plagiarism, or a single easily identifiable methodological flaw. Some emerge only when manuscripts are examined in relation to each other, in relation to an editor’s role, in relation to a proceedings volume, or in relation to a citation and authorship network. Such patterns are often difficult for overburdened human editorial systems to perceive consistently at scale.
This paper proposes LLM Agent and Multi-Agent Editorial Screening as a practical response to that need. The proposal is not punitive in spirit. Its purpose is preventive, protective, and oriented toward refinement. It is meant to help identify where a paper, a paper cluster, or an editorial context deserves closer review before publication hardens into public consequence.
It is also important to establish continuity of authorship and design. The present framework grows directly out of earlier published conceptual work by the author. In those earlier pieces, the author described structured corpora as analytical baselines for computational knowledge systems, argued that computational agents could evaluate new material relative to stable reference frameworks rather than through unconstrained probabilistic generation alone, and outlined corpus-guided analytical agents capable of contradiction detection, conceptual drift detection, shard-level analysis, consistency checking, and structured evaluative reporting. Those earlier writings described the architecture in general terms. The present paper extends that prior work by making the editorial application explicit and by formally naming the system LLM Agent and Multi-Agent Editorial Screening.
2. Prior Conceptual Work and Preliminary Development
This paper does not claim that the present idea appeared suddenly or without precedent in the author’s own work. The framework presented here is the more explicit public articulation of a design line already established in earlier publications and preliminary development.
In the previously published conceptual paper on structured corpora as analytical baselines, the author argued that computational analysis becomes materially stronger when anchored to a curated body of definitions, methods, reasoning standards, exemplary documents, and diagnostic examples. The paper proposed that analytical agents operating within such structured corpora could compare new documents against baseline conceptual standards, thereby detecting contradictions, unsupported claims, and conceptual drift.
In the related follow-up paper on corpus-guided analytical agents, the author described a broader method for training analytical AI systems to behave less like freeform text generators and more like structured evaluators. That framework included decomposition of documents into analytical units, comparison against reference baselines, convergence and coherence checks, and standardized output categories. Those publications did not fully disclose the present editorial-screening architecture in specific operational terms, but they did establish its conceptual basis.
Accordingly, the present paper should be understood as a formalization and extension of prior design work already conceived, partially articulated, and preliminarily tested in more general terms. The contribution here is to define the editorial use case directly: journals, proceedings, special issues, and scholarly publication environments require a dedicated screening layer capable of identifying non-obvious structural, ethical, and governance risks before publication.
3. The Ethical Foundation Already Exists
The ethical basis for this proposal is not speculative. ICMJE’s current Recommendations include expanded AI guidance and explicitly address the responsibilities of editors, reviewers, and publishers in relation to the use of AI in publishing. ICMJE states that AI can generate authoritative-sounding output that may be incorrect, incomplete, or biased; that humans remain ultimately responsible for ensuring the accuracy of AI-assisted content; that submitted manuscripts are privileged communications; and that editors and reviewers should not upload manuscripts into AI systems where confidentiality cannot be assured without authors’ explicit permission. ICMJE also states that journals should have policies governing AI use and should make editors, reviewers, and authors aware of those policies.
The January 2026 ICMJE update further confirms that the organization expanded its guidance on AI in publishing by creating a new Section V devoted to the topic.
WAME similarly emphasizes the emerging risks associated with chatbots and generative AI in scholarly publishing. Its recommendations state that editors need appropriate digital tools to deal with the effects of chatbots on publishing and that editors and reviewers should specify any use of chatbots in evaluation of manuscripts and generation of reviews or correspondence.
These principles matter because they show that the publishing community does not lack ethical awareness. It already recognizes issues of confidentiality, responsibility, transparency, accountability, and tool use. What remains underdeveloped is a coherent operational layer that applies those principles consistently and at scale. That is the gap addressed by LLM Agent and Multi-Agent Editorial Screening.
4. Why Traditional Editorial Checks Miss System-Level Risk
Traditional editorial workflows are still oriented mainly around the individual manuscript. A paper is assessed for scope fit, basic readability, glaring plagiarism, formal compliance, and eventual peer-review suitability. These checks remain necessary, but they are no longer sufficient.
Many major editorial problems are not visible inside one paper alone. They emerge only when multiple papers are compared within a set. Examples include unusually concentrated authorship inside one proceedings volume, suspicious overlap in methods and conclusions across nominally different papers, abnormal citation clustering, repeated semantic templates, editor-author role conflicts, and publication patterns that may be individually plausible but collectively irregular.
A core weakness of current workflows is that they are still largely tuned to detect “classic” anomalies. Yet many modern failures are not classic in that sense. A paper may read smoothly, appear formatted properly, and avoid obvious textual manipulation while still participating in a broader pattern that should have triggered human concern. This is why a pre-publication screening layer must operate not only at the single-manuscript level, but at the cross-manuscript, author-network, issue-level, and proceedings-level scales.
LLM-based systems are especially well suited to this type of work because they can compare documents rapidly, summarize relational patterns, identify repeated structures, inspect language similarity, track definitional divergence, and generate interpretable analytical reports. Used properly, they do not replace editors. They extend editorial perception into areas where human attention is currently fragmented or under-resourced.
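To make one of these capabilities concrete, the sketch below (in Python) computes pairwise lexical overlap between manuscripts in a volume using word-shingle Jaccard similarity. It is a minimal illustration under stated assumptions, not the proposed system: the function names and the 0.30 threshold are introduced here purely for exposition, and surface overlap of this kind would only ever be one signal feeding the structured reports described in the next section.

from itertools import combinations

def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word n-grams (shingles) in a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two shingle sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_overlapping_pairs(manuscripts: dict[str, str],
                           threshold: float = 0.30) -> list[tuple[str, str, float]]:
    """Flag manuscript pairs in a volume whose lexical overlap exceeds a threshold.

    `manuscripts` maps a manuscript identifier to its full text. The threshold
    is illustrative; a real deployment would calibrate it against known cases.
    """
    sets = {mid: shingles(text) for mid, text in manuscripts.items()}
    flagged = []
    for id_a, id_b in combinations(sets, 2):
        score = jaccard(sets[id_a], sets[id_b])
        if score >= threshold:
            flagged.append((id_a, id_b, round(score, 3)))
    return sorted(flagged, key=lambda t: t[2], reverse=True)

A flagged pair produced by such a routine is not evidence of wrongdoing; it is a prompt for an editor to read the two manuscripts side by side.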
5. Defining LLM Agent and Multi-Agent Editorial Screening
LLM Agent Editorial Screening refers to a structured pre-publication process in which one language-model-based analytical agent examines a manuscript and its associated metadata for signs that merit closer editorial review. The agent does not decide publication. It produces a structured report for humans.
Such a report may identify internal inconsistency, unsupported or weakly supported assertions, disclosure irregularities, abnormal citation patterns, potential conceptual drift from domain baselines, conflicts between methods and conclusions, suspiciously repetitive language, or signs that the paper is atypical enough to deserve independent handling.
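One plausible shape for such a report is sketched below as a plain data structure. The field names (category, summary, evidence, confidence, limitations, recommendation) are assumptions made for illustration rather than a fixed specification; the essential design point is that every flag carries its evidentiary basis and its stated limitations so that an editor can judge it.

from dataclasses import dataclass, field

@dataclass
class ScreeningFlag:
    """A single concern raised by a screening agent, with its basis and limits."""
    category: str        # e.g. "citation_support", "disclosure", "conceptual_drift"
    summary: str         # short human-readable description of the concern
    evidence: list[str]  # quoted passages, reference entries, or metadata fields
    confidence: str      # e.g. "low" / "medium" / "high"
    limitations: str     # what the agent could not verify on its own

@dataclass
class ManuscriptScreeningReport:
    """Structured output of single-manuscript screening; advisory only."""
    manuscript_id: str
    flags: list[ScreeningFlag] = field(default_factory=list)
    recommendation: str = "proceed_normally"  # or "manual_review", "independent_handling"

    def requires_attention(self) -> bool:
        """True if any flag was raised; the decision itself remains with editors."""
        return bool(self.flags)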
Multi-Agent Editorial Screening extends this approach by separating screening functions across specialized agents. One agent may evaluate authorship concentration and role conflicts. Another may examine reference lists and citation support. Another may compare manuscripts across a volume for structural overlap, recurrent conclusions, or semantic reuse. Another may inspect methodological coherence or statistical reporting irregularities. Another may assess issue-level or publisher-level governance signals.
This separation offers several advantages. It improves interpretability by making it easier to see why a paper or proceedings volume was flagged. It improves discipline by assigning focused tasks to each agent. It also reduces the temptation to treat one monolithic system as a black-box arbiter of scientific worth. The point is not artificial omniscience. The point is structured, accountable triage.
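The role separation described here can be sketched as a common agent interface with narrowly scoped implementations, reusing the ScreeningFlag structure from the previous sketch. The class names and the governance check below are hypothetical examples rather than a prescribed design; the point is only that each agent answers one focused question and reports in the shared format.

from abc import ABC, abstractmethod

class ScreeningAgent(ABC):
    """Each agent examines one category of risk and returns flags, never a decision."""
    category: str = "unspecified"

    @abstractmethod
    def screen(self, manuscript: dict, context: dict) -> list:
        """Return zero or more ScreeningFlag objects (see the previous sketch)."""

class GovernanceAgent(ScreeningAgent):
    """Focused check: editors publishing in their own issue without documented recusal."""
    category = "editorial_governance"

    def screen(self, manuscript: dict, context: dict) -> list:
        flags = []
        editors = set(context.get("issue_editors", []))
        authors = set(manuscript.get("authors", []))
        overlap = editors & authors
        if overlap and not manuscript.get("recusal_documented", False):
            flags.append(ScreeningFlag(
                category=self.category,
                summary="An issue editor appears among the authors without documented recusal.",
                evidence=sorted(overlap),
                confidence="medium",
                limitations="Cannot see recusal arrangements handled outside the submission system.",
            ))
        return flags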
6. Core Functions of the Screening Framework
A serious editorial-screening system should operate across several categories of risk.
First, it should detect editorial-governance anomalies. These include editors publishing unusually heavily in their own issue or proceedings volume, weak recusal patterns, suspicious concentration of authorship among a narrow cluster, and other irregular overlaps between editorial power and publication opportunity.
Second, it should detect cross-manuscript structural anomalies. These include repeating conceptual templates, recurrent methods language with superficial variation, duplicated argument scaffolds, suspiciously parallel introductions or conclusions, and clustered documents that appear too similar to have been independently developed.
Third, it should detect citation anomalies. These include references that do not support the propositions for which they are invoked, unusually self-referential citation behavior, citation loops inside a narrow network, and bibliographic concentration that appears misaligned with the surrounding field.
Fourth, it should detect process anomalies. These include implausibly high throughput, unusual publication density within one issue, and timing or authorship patterns that are not proof of misconduct but are sufficiently irregular to justify secondary review.
Fifth, it should detect conceptual and logical anomalies. These include contradictions between definitions and results, drift away from foundational concepts in a corpus-governed domain, weak inferential transitions, and conclusions that outrun the evidence presented.
Sixth, it should produce a reputation-risk and integrity-risk summary. This is not an accusation of wrongdoing. It is a warning that publication without further human review may expose authors, editors, journals, or publishers to avoidable harm.
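For auditability, the six categories above can be fixed as a small typed taxonomy so that every flag raised by any agent is labeled consistently. The identifiers below simply restate the categories of this section and are illustrative rather than prescriptive.

from enum import Enum

class RiskCategory(Enum):
    """Fixed taxonomy mirroring the six screening functions described above."""
    EDITORIAL_GOVERNANCE = "editorial_governance"      # role conflicts, authorship concentration
    CROSS_MANUSCRIPT_STRUCTURE = "cross_manuscript"     # duplicated templates, scaffolds, parallel text
    CITATION = "citation"                               # unsupported, circular, or concentrated citation
    PROCESS = "process"                                 # throughput, density, and timing irregularities
    CONCEPTUAL_LOGICAL = "conceptual_logical"           # contradictions, drift, weak inference
    INTEGRITY_RISK_SUMMARY = "integrity_risk_summary"   # aggregate reputational and integrity exposure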
7. The Role of Structured Corpora and Analytical Baselines
The screening architecture proposed here depends on a principle already established in the author’s prior work: computational evaluation is strongest when anchored to a structured analytical baseline.
In the author’s earlier framework, a structured corpus served as a conceptual equilibrium state against which new material could be measured. In practical terms, this means that analytical agents do not merely generate impressions. They evaluate manuscripts relative to a curated body of definitions, methodological standards, editorial policies, exemplary documents, and diagnostic examples.
This is crucial for editorial screening because many failures are not simply linguistic. They are relational. A manuscript may appear acceptable in isolation while still diverging sharply from the definitions, methodological norms, or evidentiary standards of the domain. A corpus-guided approach allows agents to compare submissions against stable reference axes rather than against vague statistical expectation alone.
This is also why the present proposal should not be confused with casual use of a chatbot to summarize a paper. The design envisioned here is structured, corpus-aware, policy-aware, and context-aware. It is not freeform language generation. It is analytical screening conducted against an established baseline.
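A minimal sketch of such a baseline, assuming a simple in-memory representation, is given below. The corpus fields follow the categories already named in the author's prior work (definitions, methodological standards, editorial policies, exemplars); the retrieval method and its name are assumptions introduced here for illustration.

from dataclasses import dataclass, field

@dataclass
class StructuredCorpus:
    """Curated analytical baseline against which submissions are evaluated."""
    definitions: dict[str, str] = field(default_factory=dict)           # term -> canonical definition
    methodological_standards: list[str] = field(default_factory=list)   # expected reporting practices
    editorial_policies: list[str] = field(default_factory=list)         # venue policy statements
    exemplars: dict[str, str] = field(default_factory=dict)             # document id -> exemplary text

    def baseline_for_terms(self, terms: list[str]) -> dict[str, str]:
        """Return canonical definitions for a manuscript's key terms, so that an agent
        compares usage against the baseline rather than against model expectation alone."""
        return {t: self.definitions[t] for t in terms if t in self.definitions}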
8. Human Authority Must Remain Central
Any serious deployment of LLM editorial screening must preserve human editorial judgment. ICMJE makes clear that humans are ultimately responsible for reviewing and ensuring the accuracy of AI-assisted content and that journals must govern AI use through explicit policy. WAME likewise emphasizes that editors need tools, not replacements, and that accountability for scholarly evaluation remains human.
Accordingly, the system proposed here should be governed by four principles.
First, it should flag, not decide. It may recommend further review, independent handling, or additional verification. It should not autonomously reject, retract, accuse, or blacklist.
Second, it should be transparent. Editors should be able to inspect the basis for a flag, including the category of concern, the evidence type, and the limitations of the inference.
Third, it should be reviewable and appealable. A flagged author or editor should remain within a human process, not trapped inside an opaque computational decision.
Fourth, it should be confidentiality-preserving. Because submitted manuscripts are privileged communications, any deployment of screening agents must occur inside secure, policy-compliant systems where confidentiality is assured. ICMJE explicitly warns against uploading manuscripts into AI environments where confidentiality cannot be protected, absent the authors’ explicit permission.
9. Why This Proposal Is Protective Rather Than Punitive
The moral tone of this proposal matters. The present paper is not written to expose or exploit the mistakes of others, nor to convert editorial breakdowns into spectacle. Publication failures often cause intense turmoil for everyone involved. Authors, editors, coauthors, institutions, and journals can all be harmed in ways that ripple outward long after the public story moves on.
That is precisely why the screening layer proposed here should be understood as a humane and preventive tool.
If a problematic pattern is caught early, a proceedings volume may be restructured before release. A paper may be revised before attracting public criticism. An editor may be recused before conflict becomes scandal. A citation problem may be corrected before it enters the permanent literature. A paper cluster may be sent for independent review before reputations are damaged.
The guiding principle is simple: better upstream refinement reduces downstream harm.
This is why the present proposal should not be framed as automated policing. It is better understood as a pre-publication integrity and refinement layer designed to protect the scientific record while minimizing needless public injury.
10. Minimal Implementation Architecture
A minimal implementation model could proceed in five stages.
The first stage is submission ingestion. The system receives manuscript text, author metadata, affiliations, disclosures, references, editorial assignments, and issue-level or proceedings-level context.
The second stage is single-manuscript screening. A primary agent examines internal coherence, terminology consistency, disclosure completeness, citation alignment, and structural anomalies.
The third stage is cross-manuscript comparison. Additional agents compare manuscripts against one another within a special issue, proceedings volume, or related cluster to identify overlap, concentration, or repeated patterns not visible in isolation.
The fourth stage is editorial memo generation. The system produces a structured report summarizing concerns by category, indicating confidence and uncertainty, and recommending whether the matter should proceed normally, receive ordinary manual review, or be referred for independent handling.
The fifth stage is human adjudication. Editors decide whether the paper proceeds, is revised, is reassigned, or is paused for ethics review.
This architecture is intentionally modest. It does not assume fully autonomous science evaluation. It assumes a disciplined screening workflow whose value lies in helping human editors see patterns earlier and more clearly.
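Under the assumptions of the earlier sketches (a shared flag structure, specialized agents, a structured corpus, and a cross-manuscript overlap check), the five stages can be wired together roughly as follows. The function name and the decision thresholds are illustrative; the only load-bearing choice is that the pipeline ends in an advisory memo handed to humans rather than in an automated decision.

def run_screening_pipeline(submission: dict, volume: dict[str, dict],
                           agents: list, corpus: "StructuredCorpus") -> dict:
    """Minimal five-stage workflow: ingest, screen, compare, report, hand off.

    `submission` holds the manuscript text and metadata; `volume` maps ids of
    co-submitted manuscripts to their texts and metadata; `agents` are the
    specialized screening agents; `corpus` is the analytical baseline.
    """
    # Stage 1: ingestion -- assemble the context the agents are allowed to see.
    context = {
        "issue_editors": submission.get("issue_editors", []),
        "volume_ids": list(volume.keys()),
        "baseline": corpus.baseline_for_terms(submission.get("key_terms", [])),
    }

    # Stage 2: single-manuscript screening by each specialized agent.
    flags = []
    for agent in agents:
        flags.extend(agent.screen(submission, context))

    # Stage 3: cross-manuscript comparison within the volume.
    texts = {mid: m.get("text", "") for mid, m in volume.items()}
    texts[submission["id"]] = submission.get("text", "")
    overlaps = flag_overlapping_pairs(texts)

    # Stage 4: editorial memo generation (categories, confidence, recommendation).
    recommendation = "proceed_normally"
    if overlaps or any(f.confidence == "high" for f in flags):
        recommendation = "independent_handling"
    elif flags:
        recommendation = "manual_review"

    # Stage 5: human adjudication happens outside this function; the memo is advisory.
    return {
        "manuscript_id": submission["id"],
        "flags": flags,
        "cross_manuscript_overlaps": overlaps,
        "recommendation": recommendation,
    }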
11. Preliminary Testing and Practical Direction
Preliminary design and testing of this framework have already been undertaken in general terms. Earlier conceptual work did not fully disclose the present architecture as an explicitly named editorial-screening framework, but it did establish its baseline methods: structured corpora, analytical baselines, decomposition of documents into evaluable units, comparison against reference frameworks, consistency assessment, and standardized evaluative outputs.
The present paper therefore marks a transition from broader conceptual groundwork to a more operational public statement. It names the framework directly and locates its application squarely in scholarly publishing. That step is important for priority, but also for utility. A concept cannot assist journals, publishers, or proceedings environments until it is articulated in terms they can actually adopt.
Future work should include development of secure editorial implementations, benchmark datasets for proceedings and special issues, transparent flag-taxonomy design, and comparative studies measuring how often agent-based screening identifies genuine governance or structural concerns earlier than conventional workflows alone.
12. Limits and Risks
This proposal has real limitations. LLM systems can overgeneralize, hallucinate, misclassify benign irregularities, or confuse unusual work with problematic work. They can inherit biases from data and deployment design. They can also create a false sense of security if editors assume that a clean screening report means a paper is sound.
ICMJE’s guidance directly addresses the broader problem by emphasizing that AI output can be incorrect, incomplete, or biased and that humans remain responsible for its use. WAME likewise frames chatbot use as both promising and risky.
These warnings should not weaken the present proposal. They should shape it. The system proposed here is not an oracle and must never be treated as one. Its function is triage, not final judgment. Its success should be measured not by how many papers it blocks, but by whether it improves early detection of genuine concerns while minimizing false accusations and preserving due process.
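One way to operationalize that success criterion, assuming flagged items are eventually labeled by human ethics review as confirmed or unfounded, is to track flag precision and recall over time. The sketch below is illustrative only; the labels would have to come from the comparative studies proposed in Section 11.

def flag_quality(flagged: set[str], confirmed: set[str]) -> dict[str, float]:
    """Precision rewards avoiding false accusations; recall rewards early detection.

    `flagged` is the set of manuscript ids the screening layer referred for
    closer review; `confirmed` is the set later judged by human ethics review
    to have involved a genuine concern.
    """
    true_positives = len(flagged & confirmed)
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / len(confirmed) if confirmed else 1.0
    return {"precision": round(precision, 3), "recall": round(recall, 3)}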
13. Conclusion
Scholarly publishing already knows many of its ethical obligations. It knows that confidentiality matters, that editors and reviewers remain responsible for their use of AI, that journals should develop explicit AI policies, and that editors need better tools to manage the effects of generative systems on publishing.
What it lacks is a practical and principled integrity layer that operates before publication.
LLM Agent and Multi-Agent Editorial Screening is proposed here as that missing layer: not a substitute for editors, but an instrument for helping them see what unaided workflows often miss. The framework grows directly out of the author’s earlier work on structured corpora, analytical baselines, and corpus-guided agents, and it reflects preliminary design and testing already undertaken in general form. The contribution of this paper is to formalize that architecture specifically for scholarly publishing.
Used responsibly, such a system could help protect authors from preventable harm, help editors manage risk with greater consistency, help publishers avoid avoidable scandal, and help preserve trust in the scientific record through earlier refinement rather than later public collapse.
References
International Committee of Medical Journal Editors. Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals. Updated January 2026.
International Committee of Medical Journal Editors. Editors’ Role in Ensuring Responsible Use of AI. 2026 Recommendations.
World Association of Medical Editors. Chatbots, Generative AI, and Scholarly Manuscripts. May 31, 2023.
Swygert J. Structured Corpora as Analytical Baselines for Computational Knowledge Systems: A Conceptual Framework for Corpus-Guided Analytical Agents. Secretary Suite. March 4, 2026.
Swygert J. Corpus-Guided Analytical Agents: The Secretary Suite Method for Training Scientific Evaluation AI. Secretary Suite. March 5, 2026.
Swygert J. Secretary Suite Architectural Clarifications: Definitions and Common Misinterpretations. Ivory Tower Journal. March 4, 2026.