Reranking the k-best hypothesis parse trees from an existing parser makes it possible to take into account more of the information the model has gathered during training than simply decoding the most likely dependency tree. In this paper, we first investigate whether state-of-the-art parsers can still benefit from reranking in low-resource languages. As part of this analysis, we deliver new insights concerning rerankability. Second, we propose a novel approach to mixture reranking by using a reject option for the reranker, which paves the way for designing interpretable reranking-based parsing systems in the future.
This paper presents a deep learning based model to detect the completeness and correctness of a sentence. It is designed specifically for detecting errors in speech recognition systems and takes several typical recognition errors into account, including false sentence boundaries, missing words, repeated words and false word recognition. The model can be applied to evaluate the quality of recognized transcripts, and the best model achieves over 90.5% accuracy in detecting whether the system has completely and correctly recognized a sentence.
This paper presents a review of different text classification models, both traditional ones and state-of-the-art models. The simple models under review were Logistic Regression, naïve Bayes, k-Nearest Neighbors, C-Support Vector Classifier, Linear Support Vector Machine Classifier, and Random Forest. The state-of-the-art models were classifiers that include pretrained embedding layers, namely BERT and GPT-2. Results are compared among all of these classification models on two multiclass datasets, ‘Text_types’ and ‘Digital’, described later in the paper. While BERT was tested both as a multiclass and as a binary model, GPT-2 was used as a binary model on all the classes of a given dataset. In this paper we showcase the most interesting and relevant results. They show that, for the datasets at hand, the BERT and GPT-2 models perform best, with BERT outperforming GPT-2 by one percentage point in terms of accuracy. It should be borne in mind, however, that these two models were tested on a binary case, whereas the others were tested on a multiclass case. The models that performed best on the multiclass case are the C-Support Vector Classifier and BERT. To establish the absolute best classifier in the multiclass case, further research is needed that deploys GPT-2 in a multiclass setting.
Recently, online customer reviews have surged in popularity, placing additional demands on businesses to respond to these reviews.
Conditional text generation models, trained to generate a response given an input review, have been proposed to facilitate human authors in composing high-quality responses.
However, this approach has been shown to yield rather unsatisfying, generic responses while, in practice, responses are required to address reviews specifically and individually.
We hypothesise that this issue could be tackled by changing the alignment paradigm and using sentence-aligned training data instead of document-aligned. Yet, finding correct sentence alignments in the review-response document pairs is not trivial.
In this paper, we investigate methods to align sentences based on computing the surface and semantic similarity between source and target pairs, and we benchmark performance on this rather challenging alignment problem.
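To illustrate one way such an alignment could be computed, the sketch below combines a surface similarity score with a semantic score from multilingual sentence embeddings. The embedding model, the equal weighting of the two scores and the threshold are illustrative assumptions, not the method evaluated in the paper.

```python
# Minimal sketch of sentence alignment combining surface and semantic
# similarity. Model choice, score weighting and threshold are assumptions.
from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def align(review_sents, response_sents, threshold=0.6):
    """Return (review_idx, response_idx, score) pairs above a threshold."""
    emb_rev = model.encode(review_sents, convert_to_tensor=True)
    emb_res = model.encode(response_sents, convert_to_tensor=True)
    semantic = util.cos_sim(emb_rev, emb_res)  # pairwise cosine similarity
    pairs = []
    for i, rev in enumerate(review_sents):
        for j, res in enumerate(response_sents):
            surface = SequenceMatcher(None, rev.lower(), res.lower()).ratio()
            score = 0.5 * float(semantic[i][j]) + 0.5 * surface  # simple mix
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```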
Voice assistants that understand dialects would particularly help elderly people. Automatic Speech Recognition (ASR) performs poorly on dialects due to the lack of sizeable datasets. We propose three adaptation strategies that improve an ASR model trained for German so that it understands Swiss German spoken by a target speaker, using as little as 1.5 hours of speaker data. Our best result was a word error rate (WER) of 0.27 for one individual.
We propose a novel type of document representation that preserves textual, visual, and spatial information without containing any sensitive data. We achieve this by transforming the original visual and textual data into simplified encodings. These pieces of non-sensitive information are combined into a tensor to form the NonDisclosureGrid (NDGrid). We demonstrate its capabilities on information extraction tasks and show that our representation matches the performance of state-of-the-art representations and even outperforms them in specific cases.
In supervised classification tasks, a machine learning model is provided with an input and, after the training phase, outputs one or more labels from a fixed set of classes. Recent developments in large pre-trained language models (LLMs), such as BERT, T5 and GPT-3, have given rise to a novel approach to such tasks, namely prompting.
In prompting, there is usually no further training required (although fine-tuning is still an option); instead, the input to the model is extended with an additional text specific to the task – a prompt. Prompts can contain questions about the current sample, examples of input-output pairs or task descriptions. Using prompts as clues, an LLM can infer the intended outputs from its implicit knowledge in a zero-shot fashion.
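A minimal sketch of what such zero-shot prompting can look like in code is shown below; the model name, the prompt wording and the sentiment task are illustrative assumptions rather than the setup used in this work.

```python
# Minimal sketch of zero-shot prompting for a classification task: no
# further training, the task is conveyed entirely through the prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

review = "The room was spotless and the staff went out of their way to help."
prompt = (
    "Decide whether the following review is positive or negative.\n"
    f"Review: {review}\n"
    "Sentiment:"
)
output = generator(prompt, max_new_tokens=3, do_sample=False)
print(output[0]["generated_text"])
```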
Legal prompt engineering is the process of creating, evaluating, and recommending prompts for legal NLP tasks. It would enable legal experts to perform legal NLP tasks, such as annotation or search, by simply querying large language models in natural language.
In this presentation, we investigate prompt engineering for the task of legal judgement prediction (LJP). We use data from the Swiss Federal Supreme Court and the European Court of Human Rights, and we compare various prompts for LJP using multilingual LLMs (mGPT, GPT-J-6B, etc.) in a zero-shot manner. We find that our approaches achieve promising results, but the long documents in the legal domain are still a challenge compared to single sentence inputs.
For the compliance and legal profession, the exponential growth of data is both a threat and a promise. A threat, because finding crucial facts buried in hundreds of thousands of documents is hard. A promise, because proper management, analysis, and interpretation of data provides a competitive advantage and full transparency during all stages of investigations.
In this talk, we present our novel text analytics platform Herlock.ai to leverage these possibilities.
Herlock.ai finds mentions of persons, dates, and locations in the corpus and makes these findings available to the user. In order to achieve this, several hurdles have to be overcome. Paper documents need to become machine readable. Even when the digital version exactly replicates the paper, the data is not available for analysis because of human inconsistencies and errors.
Herlock.ai fixes these problems and delivers clean, reliable content. To support users in their work, Herlock.ai also needs to be easy to use and understand, e.g. by splitting documents into meaningful parts, by comparing different variants, and by marking textual anomalies.
We will give a demo of Herlock.ai. The platform has been used in a recent Swiss legal case that received high media coverage. 500 federal folders that physically fill entire walls of shelves were an unprecedented challenge for the parties involved. Quick search and navigation proved to be a key tool, and the analytics we provided were used for official submissions to the court.
The project “Schweizer Dialektsammlung” (“Swiss Dialect Collection”) has been running since spring 2021. Its goal is to collect a large dataset with Swiss German audio samples and their transcriptions to Standard German text. So far, we have crowdsourced 200 hours of audio from nearly 4000 volunteers via a web recording platform, equivalent to over 150’000 text prompts. The dataset is called SDS-200 and will be released for research purposes.
In a related project funded by the Schweizer Nationalfonds (SNF), we are using SDS-200 together with parallel dialect data to find out how Swiss German Speech-to-text (STT) systems can better recognise dialects for which little annotated data is available. Initial experimental results show that including SDS-200 as part of the training data significantly enhances STT performance: the BLEU score on the All Swiss German Dialects Test Set improves from 48 to 65 when we add SDS-200.
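As a rough illustration of how such a corpus-level BLEU score can be computed for STT output against reference Standard German transcripts, the sketch below uses sacrebleu; the hypothesis and reference strings are invented examples, not data from SDS-200.

```python
# Minimal sketch of BLEU evaluation of STT output with sacrebleu.
import sacrebleu

hypotheses = ["der zug fährt um acht uhr ab"]       # system output
references = [["der zug fährt um acht uhr ab"]]     # gold Standard German transcripts
score = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {score.score:.1f}")
```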
We are also planning the next phase of “Schweizer Dialektsammlung”, where users can form teams and compete for prizes.
We will
– present the project and the data collected so far
– discuss our Speech-to-Text experiments and results
– talk about lessons learnt
– provide an outlook on planned future activities in data collection and systems development
Transcribing Swiss German speech automatically into Standard German text is a challenging Speech-to-Text (STT) translation task. For the last two years, we at FHNW have worked on the development of an end-to-end system to solve this task. In cooperation with ZHAW, we also created a 35-hour test corpus which contains 7 × 5 hours of audio with transcripts of Swiss newspaper articles spoken in 7 Swiss German dialects (Basel, Bern, Graubünden, Innerschweiz, Ostschweiz, Wallis, and Zürich). Thus, for each region, we collected a total of 3600 spoken sentences from at least 10 different speakers.
We use this test set to objectively quantify the quality of our STT system and compare it to two commercial STT services for Swiss German to Standard German.
We evaluated all three STT systems on this test set and present a fair comparison based on our carefully designed corpus. We discuss the weaknesses and strengths of the three models with respect to the different dialects and other aspects.
Agenda – Introduction
– Who we are – Our Firm
– The Vision Starts – Our purpose & Activities
– Beyond Expectations – Spoiler Alert: /bs.com/career
– Conversational Banking – the project
Daniel Mazzolini Head of UBS-BSC Manno
Vladislav Matic Lugo Product Owner Advanced Analytics
Drawing on our network of around 200 branches and 4,600 client advisors, complemented by modern digital banking services and customer service centers, we are able to reach approximately 80% of Swiss wealth.
Conversational Banking
Vision: Bring natural language as a new way of interacting with digital clients and employees, enhancing user experience and increasing efficiency.
Ambition: For our clients, offer digital services via conversational interfaces.
For our employees, provide a virtual assistant for knowledge workers, call agents and client advisors along the most important business domains.
Why Conversational Banking
– In 2020, more than 3 million requests were raised with our support units
– ~40% of queries are trivial in nature and have an associated self-service option or info materials
– Common request types are: information, navigation, update, order
– Common questions are: What is it? Where can I find it? How do I do it?
Leveraging cloud cognitive services for Conversational Banking use cases
Cloud-native application in Switzerland – leveraging Microsoft Cognitive Services in the cloud
Large-scale transformer models have become the de-facto standard in academia. However, until now few (non-tech) companies have actually developed and globally scaled transformer-based data products, leading to a dearth of industry case studies. That said, at Zurich a data science team has developed a general-purpose, transformer-based document extraction solution that was first piloted in 2018–2019 and later scaled to over 10 markets globally, enabling the automated processing of millions of highly complex and diverse input documents (emails, PDFs, scans, voice-to-text messages, etc.).
In this presentation, the team will outline the opportunities and challenges of scaling such models in the financial services industry, highlighting key technology and business considerations for successfully deploying and scaling them in an industry setting. The importance of research collaborations with universities will also be covered in this talk.
Swiss software company WellD and its SaaS hotel tech spinoff TellTheHotel wish to develop a closed-domain task-oriented conversational agent for the hospitality industry.
As part of the Innosuisse-funded project TACO “Closed-domain task-oriented conversational agents with embedded intelligence”, SUPSI is helping WellD and TellTheHotel leverage the state of the art of NLP to build a custom conversational agent. The main objective is the development of a multi-language, multichannel digital concierge that enables end users to complete a hotel reservation as well as ancillary activities through a natural conversational flow.
The conversational agent is based on customized cutting-edge NLP techniques and the RASA framework. Basic hotel reservation requests are handled with multi-language intent detection, which is carried out through the application of BERT-based cross-lingual sentence embeddings, with substantial benefits compared to translation-based systems. User queries that go beyond room reservations are handled with a BERT model fine-tuned for question answering.
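As a rough illustration of intent detection with cross-lingual sentence embeddings, the sketch below classifies an utterance by comparing it to per-intent centroid embeddings; the embedding model, example intents and nearest-centroid classifier are illustrative assumptions, not the project's actual RASA-based implementation.

```python
# Minimal sketch of multi-language intent detection with cross-lingual
# sentence embeddings and a nearest-centroid classifier.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

# A few training utterances per intent, in different languages (invented).
training_data = {
    "book_room": ["I would like to book a double room",
                  "Vorrei prenotare una camera doppia"],
    "ask_breakfast": ["What time is breakfast served?",
                      "Um wie viel Uhr gibt es Frühstück?"],
}

# One centroid embedding per intent.
centroids = {intent: model.encode(utts).mean(axis=0)
             for intent, utts in training_data.items()}

def detect_intent(utterance: str) -> str:
    emb = model.encode(utterance)
    sims = {intent: float(np.dot(emb, c) /
                          (np.linalg.norm(emb) * np.linalg.norm(c)))
            for intent, c in centroids.items()}
    return max(sims, key=sims.get)

print(detect_intent("Je voudrais réserver une chambre"))  # expected: "book_room"
```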
From an architectural point of view, the project is developed based on micro-services and deployed as a Kubernetes cluster to ensure scalability.
Every day we write text. We try to write grammatically correctly and politely, and spelling and grammar checkers support us in this. But many people struggle to ensure that their writing does not contain deterring words. Research shows that using non-inclusive language (especially gendered, ageist, and racist words) has a particularly strong impact and causes employers to miss a large share of potential talent in the labor market. We developed Witty, a smart tool that can automatically detect deterring language and suggest alternatives, enabling inclusive writing.
The core Witty algorithm is based on advanced Natural Language Processing (NLP) technologies. We combine a rule-based approach with a modern transformer architecture. We created our own glossaries (German, English) of inclusive and non-inclusive words with the help of our highly trained language specialists, based on studies and research in the field. We use pretrained NLP models for German and English (spaCy). We transform the words in the text into dictionary-like forms, perform linguistic analysis, extract linguistic features from the text, and also apply named entity recognition to extract geographic locations, organization names, people, and numbers from the user text where needed.
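A minimal sketch of the rule-based part of such a pipeline is shown below: the text is lemmatized with a pretrained spaCy model, lemmas are looked up in a glossary of non-inclusive terms, and named entities are extracted from the same analysis. The glossary entries are invented examples, not Witty's actual glossaries.

```python
# Minimal sketch: lemmatize with spaCy, match lemmas against a glossary of
# non-inclusive terms, suggest alternatives, and extract named entities.
import spacy

nlp = spacy.load("de_core_news_sm")  # pretrained German pipeline

GLOSSARY = {  # non-inclusive lemma -> suggested alternative (invented examples)
    "mitarbeiter": "Mitarbeitende",
    "jung": "motiviert",
}

def check_text(text: str):
    doc = nlp(text)
    findings = []
    for token in doc:
        suggestion = GLOSSARY.get(token.lemma_.lower())
        if suggestion:
            findings.append((token.text, suggestion))
    # Named entities (locations, organizations, people, ...) from the same analysis.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return findings, entities

print(check_text("Wir suchen einen jungen Mitarbeiter in Zürich."))
```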
Currently, we are implementing transformers (BERT, Hugging Face) to properly identify the intended meaning of words, to classify job-related text, and to perform sentiment analysis of the text.
Libraries of technical expert information have been written in free text. Such texts are usually authored by experts in a semi-formal style. How can this valuable information be extracted into structured and useful representations?
The human touch of these texts renders rule-based approaches useless. Annotating enough samples for an ML model might be too expensive. We show an approach that essentially combines both worlds by using an off-the-shelf dependency parser together with tree-based extraction rules.
Any syntax tree – even if it happens to be incorrect, as in the illustration below – with its phrases and their relations has the right format for rule-based extraction in this context (illustration only visible in the PDF version). Rules detect syntactic relations and adpositions and build the structured output. Enumerations can easily be determined and extracted as lists.
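A minimal sketch of this idea, using spaCy as the off-the-shelf dependency parser, is given below. The specific rule (collecting the conjuncts of a noun governed by a given preposition) is an invented illustration of the general technique, not the actual rule set used for the catalogs.

```python
# Minimal sketch of rule-based extraction over a dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_materials(text: str):
    """Extract enumerated objects of the preposition 'of' as a list."""
    doc = nlp(text)
    items = []
    for token in doc:
        if token.dep_ == "pobj" and token.head.lower_ == "of":
            items.append(token.text)
            # Enumerations show up as conjuncts of the first item.
            items.extend(t.text for t in token.conjuncts)
    return items

print(extract_materials("Walls made of concrete, brick or timber."))
# expected: ['concrete', 'brick', 'timber']
```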
We demonstrate the effectiveness of this methodology on a catalog from CRB, which has been standardizing processes in the Swiss construction industry for over 60 years. Expert authors write books of detailed definitions on walls, tunnels, canalisation, etc., and specify which types of concrete can be used, for which purpose a wall is meant, and what dimensions it should have. This is then used by contractors to formulate clear offers. For the next generation of standards, a structured representation of the legacy catalogs is extracted using the above methodology.
We present a joint project between the Laboratory for Web Science at the FFHS (Fernfachhochschule Schweiz) and a start-up company, Skills Finder AG. It is funded by InnoSuisse. The goal of the project is to build a platform for processing job application documents that automates the extraction of relevant information from a candidate’s CV and the validation of this information with the candidate’s references and certificates. Our processing pipeline uses the most recent advances in the field of both image document processing and Natural Language Processing (NLP).
The first step in processing any document is to extract the text from it while taking a proper reading order into consideration. As CVs have very diverse layouts, none of the existing tools could correctly extract the text from them. To detect these complex layouts, we train a CV layout model using Deep Layout Parser (layout-parser.github.io), a unified toolkit for deep learning-based document image analysis.
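The sketch below shows what layout detection with the layout-parser toolkit typically looks like; the pretrained PubLayNet model and label map are the toolkit's publicly documented defaults, while the project's custom CV layout model would be trained on annotated CVs and loaded analogously.

```python
# Minimal sketch of layout detection with layout-parser (assumed default
# PubLayNet model); the custom CV layout model would replace it.
import cv2
import layoutparser as lp

image = cv2.imread("cv_page.png")  # hypothetical scanned CV page
image = image[..., ::-1]           # BGR -> RGB

model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)

layout = model.detect(image)                        # detected layout regions
text_blocks = [b for b in layout if b.type == "Text"]
# Sort blocks top-to-bottom to approximate the reading order.
text_blocks.sort(key=lambda b: b.coordinates[1])
```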
The next step is an NLP component for information extraction. We use Named Entity Recognition (NER) with a pretrained multilingual BERT model, which we fine-tune on our dataset with a custom labeling scheme.
The last step in our processing pipeline is information validation. We calculate the semantic similarity between word phrases using output feature vectors from the BERT model and mean pooling. Our model achieves more than 80% accuracy on skills extraction.
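As an illustration, the sketch below computes phrase similarity from BERT output vectors with mean pooling; the model name and the example phrases are assumptions for demonstration purposes.

```python
# Minimal sketch of phrase similarity via BERT output vectors + mean pooling.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(phrase: str) -> torch.Tensor:
    inputs = tokenizer(phrase, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)    # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)      # mean pooling

a, b = embed("project management"), embed("Projektleitung")
similarity = torch.cosine_similarity(a, b).item()
print(f"cosine similarity: {similarity:.2f}")
```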
Classification of long documents is still a domain for classical machine learning techniques such as TF-IDF or BM25 with Support Vector Machines. Transformers and LSTMs do not scale well with the document length at training and inference time. For patents, this is a critical handicap since the key innovation is often described towards the end of the patent description, which varies in structure and length and can be relatively long.
Furthermore, because the class ontology for patents is very deep, specific classification can only be performed by looking at the differences that might be named in any part of the document. Therefore, it is advantageous to process the whole patent and not only specific parts.
We investigate hierarchical approaches that break documents down into smaller parts, as well as other heuristics such as summarization and hotspot detection, for BERT and PatentBERT, and compare them to classical methods. The dataset was downloaded from the European Patent Office (EPO).
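A minimal sketch of the hierarchical idea is shown below: the patent text is split into chunks that fit the transformer's input length, each chunk is classified, and the chunk-level probabilities are aggregated into a document-level prediction. The placeholder model, the number of labels and the mean aggregation are illustrative assumptions; in practice the classification head would be fine-tuned on the EPO labels.

```python
# Minimal sketch of hierarchical long-document classification by chunking.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"  # placeholder; PatentBERT would be used analogously
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=8)

def classify_document(text: str, chunk_tokens: int = 512) -> torch.Tensor:
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [token_ids[i:i + chunk_tokens - 2]            # leave room for CLS/SEP
              for i in range(0, len(token_ids), chunk_tokens - 2)]
    probs = []
    for chunk in chunks:
        ids = torch.tensor([[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]])
        with torch.no_grad():
            logits = model(input_ids=ids).logits
        probs.append(torch.softmax(logits, dim=-1))
    return torch.cat(probs).mean(dim=0)  # mean over chunks -> class probabilities

# Output is only meaningful once the classification head is fine-tuned.
print(classify_document("A device for ... " * 500).argmax().item())
```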
Automatic Speech Recognition (ASR) has numerous applications, including chatbots, transcription of meetings, subtitling of TV shows, and automatic translation of conference presentations. For this reason, Speech-to-Text (STT) is a very active field of research, and tremendous progress has been made in the past years, in particular by using pretrained models such as wav2vec and its derivatives. At the same time, several ready-to-use solutions exist, from international corporations such as Google or IBM, to specialized providers such as Trint or Speechmatics, to open-source frameworks such as Fairseq or DeepSpeech.
But how do you find the “best” ASR engine? Grounded decisions in this respect typically require an in-depth comparison of the performance of ASR engines on various annotated corpora. In order to simplify this process, we have developed a framework that makes it easy to run and evaluate benchmarks on arbitrary ASR engines.
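As a rough illustration of the kind of evaluation such a framework automates, the sketch below computes the word error rate (WER) of several engines' transcripts against a reference corpus with the jiwer library; the engine names and sample data are invented, and our framework's actual interface differs.

```python
# Minimal sketch of benchmarking ASR engines via WER with jiwer.
import jiwer

references = ["the cat sat on the mat", "please open the window"]

engine_outputs = {  # invented transcripts from two hypothetical engines
    "engine_a": ["the cat sat on the mat", "please open the windows"],
    "engine_b": ["a cat sat on a mat", "please open the window"],
}

for name, hypotheses in engine_outputs.items():
    wer = jiwer.wer(references, hypotheses)  # aggregated over the corpus
    print(f"{name}: WER = {wer:.2%}")
```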
In this presentation, we introduce the framework itself as well as insights from our research on extensive benchmark experiments with various ASR engines. Among other things, we answer the following questions: How well do ASR engines perform on different types of speech, e.g. spontaneous vs. read-aloud? Can you combine several engines to achieve better results? How can you automatically distinguish between minor errors (e.g. singular vs. plural) and semantically significant errors (e.g. “cat” instead of “car”)?
Large Language Models (LLMs) have led to large improvements in state-of-the-art results across language and image understanding and generation tasks. However, while they are largely capable of producing human-quality text in terms of grammaticality and, increasingly, coherence and relevance, it is often hard to distinguish whether the output of such models is grounded in actual knowledge or hallucinated. This talk will describe some recent work on knowledge infusion with the aim of improving the factuality of LLMs for question-answering tasks, as well as retrieval-based approaches leveraging LLMs for the same purpose.
Social media listening (SML) has the potential to help in many stages of the drug development process in the quest for patient-centric therapies that are fit for purpose and meaningful to patients. To fulfill this potential, however, it requires leveraging new quantitative approaches and analytical methods that draw on developments in NLP and real-world data (RWD) analysis, applied to the real-world text (RWT) of social media. These approaches can be described under the umbrella term of quantitative SML (QSML) to distinguish them from the qualitative methods that have been commonly used. In this talk, I will describe what QSML is, why it is used and how it can support drug development, as well as ethical and legal considerations.
In my talk, I will discuss the issue of sparsity and separation of linguistic resources, showing how it can be overcome by following the practices developed by the Linguistic Linked Open Data (LLOD) community. After introducing the principles of the Linked Data paradigm, I will report a number of benefits of applying such principles to linguistic resources, also in order to meet the FAIR guiding principles for scientific data management. A use case of resources currently published as LLOD will then be presented, namely the LiLa Knowledge Base, a collection of multifarious linguistic resources for Latin described and interlinked with the same vocabulary of knowledge description, using common data categories and ontologies. The talk will detail the architecture of LiLa, whose core component consists of a large collection of Latin lemmas serving as the backbone to achieve interoperability between the resources, by linking all those entries in lexical resources and tokens in corpora that point to the same lemma. In particular, the talk will focus on how lexical and textual resources are interlinked in the Knowledge Base. Three online services developed by LiLa will also be presented: a user-friendly interface to query the (meta)data interlinked in the Knowledge Base, the SPARQL endpoint of LiLa, and a tool to automatically link a raw Latin text to LiLa. Finally, the talk will discuss a number of challenges and open issues concerning interoperability between linguistic resources in infrastructures like CLARIN.