CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking
Faculty of Science and Technology, Lancaster University, Lancaster, UK.
Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, UK.
Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, UK.
Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham, UK.
2025 (English). In: Scientific Data, E-ISSN 2052-4463, Vol. 12, no. 1, article id 170. Article in journal (Refereed). Published
Abstract [en]

Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only the self-contained sentences, which express complete rules without needing additional context, were considered as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables applying recent trends, such as deep neural networks and large language models, to ACC.

Place, publisher, year, edition, pages
Springer Nature, 2025. Vol. 12, no. 1, article id 170
National subject category
Computer Science; Language Processing and Computational Linguistics
Identifiers
URN: urn:nbn:se:hj:diva-67230
DOI: 10.1038/s41597-024-04320-x
ISI: 001410897400006
PubMedID: 39880815
Scopus ID: 2-s2.0-85217356919
Local ID: GOA;intsam;998117
OAI: oai:DiVA.org:hj-67230
DiVA, id: diva2:1934498
Research funder
EU, Horizon Europe, 101056973, 10040207, 10038999, 10049977
Available from: 2025-02-04. Created: 2025-02-04. Last updated: 2025-02-17. Bibliographically approved

Open Access in DiVA

No full text in DiVA

People

Hedblom, Maria M.; Tan, He

Organisation

Jönköping AI Lab (JAIL)
