CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking
Faculty of Science and Technology, Lancaster University, Lancaster, UK.
Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, UK.
Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, UK.
Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, UK.
2025 (English). In: Scientific Data, E-ISSN 2052-4463, Vol. 12, no 1, article id 170. Article in journal (Refereed). Published.
Abstract [en]

Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only the self-contained sentences, which express complete rules without needing additional context, were considered as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables applying recent trends, such as deep neural networks and large language models, to ACC.

Place, publisher, year, edition, pages
Springer Nature, 2025. Vol. 12, no 1, article id 170
National Category
Computer Sciences; Natural Language Processing
Identifiers
URN: urn:nbn:se:hj:diva-67230
DOI: 10.1038/s41597-024-04320-x
ISI: 001410897400006
PubMedID: 39880815
Scopus ID: 2-s2.0-85217356919
Local ID: GOA;intsam;998117
OAI: oai:DiVA.org:hj-67230
DiVA, id: diva2:1934498
Funder
EU, Horizon Europe, 101056973, 10040207, 10038999, 10049977
Available from: 2025-02-04
Created: 2025-02-04
Last updated: 2025-02-17
Bibliographically approved


Authority records

Hedblom, Maria M.; Tan, He

By organisation
Jönköping AI Lab (JAIL)