Utvärdering av olika språkmodeller för identifiering av information i svenska texter
Jönköping University, School of Engineering, JTH, Department of Computer Science and Informatics.
Jönköping University, School of Engineering, JTH, Department of Computer Science and Informatics.
2021 (Swedish). Independent thesis, basic level (Bachelor's degree), 180 HE credits. Student thesis (degree project).
Alternative title: Evaluation of different language models for identifying entities in Swedish texts (English)
Abstract [en]

The purpose of the study was to investigate how different types of Natural Language Processing (NLP) models can be fine-tuned for Named Entity Recognition (NER) and how these models differ in the way they work. The aim was to identify which type of model works best for entity extraction. To test the models, they were fine-tuned using the PyTorch and Huggingface Transformers libraries. The study also investigated which techniques not based on machine learning can be used to solve the entity extraction problem. The results showed that the KB/BERT-base-swedish-cased model worked best, and that while other techniques can be used for entity extraction, machine learning is the most effective. Although KB/BERT-base-swedish-cased proved to be the best model, the results also suggest that the choice of model may depend on the purpose it is to be used for. The limitations of the study were that no large data set was available for training and that there was not enough time to produce a new one.
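The abstract describes fine-tuning Transformer models such as KB/BERT-base-swedish-cased for NER with PyTorch and Huggingface Transformers. Below is a minimal sketch of what such fine-tuning can look like; the Hugging Face model id (KB/bert-base-swedish-cased), the BIO label set, and the toy training sentence are assumptions for illustration and are not taken from the thesis, whose code and data are not part of this record.

```python
# Minimal sketch: fine-tuning a Swedish BERT model for token-level NER
# with PyTorch + Huggingface Transformers. Model id, labels, and the
# single toy example are illustrative assumptions, not the thesis setup.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
label2id = {l: i for i, l in enumerate(labels)}

model_name = "KB/bert-base-swedish-cased"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels)
)

# One toy sentence with word-level BIO tags (hypothetical data).
words = ["Anna", "arbetar", "på", "Cybercom", "i", "Jönköping", "."]
word_tags = ["B-PER", "O", "O", "B-ORG", "O", "B-LOC", "O"]

# Tokenize and align word-level tags with subword tokens; special tokens
# and continuation subwords get the ignore index -100.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned, previous_word = [], None
for word_id in enc.word_ids(batch_index=0):
    if word_id is None or word_id == previous_word:
        aligned.append(-100)
    else:
        aligned.append(label2id[word_tags[word_id]])
    previous_word = word_id
enc["labels"] = torch.tensor([aligned])

# A few plain PyTorch optimisation steps on the single example.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):
    out = model(**enc)      # forward pass returns the token-classification loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Inference: pick the highest-scoring label for each token.
model.eval()
with torch.no_grad():
    logits = model(**{k: v for k, v in enc.items() if k != "labels"}).logits
pred_ids = logits.argmax(dim=-1)[0].tolist()
print([labels[i] for i in pred_ids])
```

In a real experiment the single example would be replaced by a full annotated corpus split into training and evaluation sets, which is exactly the resource the abstract names as the study's main limitation.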

Place, publisher, year, edition, pages
2021, p. 45
Keywords [sv]
Machine learning, Named Entity Recognition, Natural Language Processing, Transformers, BERT, ALBERT, ELECTRA, HFST-SweNER
National subject category
Computer Engineering; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:hj:diva-55101
ISRN: JU-JTH-DTA-1-20210159
OAI: oai:DiVA.org:hj-55101
DiVA, id: diva2:1612215
External cooperation
Cybercom Group AB
Subject / course
JTH, Computer Engineering
Supervisors
Examiners
Available from: 2021-11-18 Created: 2021-11-17 Last updated: 2021-11-18 Bibliographically approved

Open Access in DiVA

Full text not available in DiVA
