Google BERT (Bidirectional Encoder Representations from Transformers) and other transformer-based models further improved the state of the art on eleven natural language processing tasks spanning the broad categories of single-text classification (e.g., sentiment analysis), text-pair classification (e.g., natural language inference), question answering (e.g., SQuAD 1.1), and text tagging (e.g., named entity recognition).
The BERT model is based on a few key ideas:
Recently, EleutherAI released GPT-Neo, their GPT-3-like model, and a few days ago it became available as part of the Hugging Face transformers library. At the time of writing, this model lives only on the master branch of the transformers repository, so you need to install it like this:
pip install git+https://github.com/huggingface/transformers@master
The main goal is to show you the simplest way to fine-tune the GPT-Neo model to generate new movie descriptions using this dataset of Netflix movies and TV shows.
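Once installed, loading the model and sampling from it takes only a few lines. Here is a minimal sketch (my own, not the article's code; the 1.3B checkpoint name, the prompt, and the generation settings are illustrative choices):

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

# "EleutherAI/gpt-neo-1.3B" is one of the published GPT-Neo checkpoints;
# any of the EleutherAI checkpoints would work the same way.
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

prompt = "A down-on-his-luck detective takes one last case"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation; the fine-tuning itself is an ordinary causal-LM
# training loop (e.g., via the Trainer API) over the Netflix descriptions.
output = model.generate(input_ids, do_sample=True, max_length=60, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Sampling with top_p keeps the generated descriptions varied instead of collapsing onto the single most likely continuation.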
Today, three of the most popular end-to-end ASR (Automatic Speech Recognition) models are Jasper, Wave2Letter+, and Deep Speech 2. They are now available as part of the OpenSeq2Seq toolkit from Nvidia. All of these ASR systems are built on neural acoustic models, which produce a probability distribution P_t(c) over all target characters c at each time step t; this output is in turn evaluated by the CTC (Connectionist Temporal Classification) loss function:

L_CTC(y) = -ln Σ_{π ∈ B⁻¹(y)} ∏_t P_t(π_t),

where B⁻¹(y) is the set of all character paths π that collapse to the target transcript y after removing repeated characters and blanks.
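In code, the same quantity can be computed with PyTorch's built-in CTC loss. This is a toy sketch of my own, with random tensors standing in for a real acoustic model's output (OpenSeq2Seq itself is TensorFlow-based, so this is only an illustration of the loss, not of the toolkit):

import torch
import torch.nn as nn

# Random log-probabilities stand in for the per-timestep distribution P_t(c);
# index 0 is reserved for the CTC blank symbol.
T, N, C = 50, 4, 29  # time steps, batch size, character-set size (incl. blank)
log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(1, C, (N, 12), dtype=torch.long)  # target transcripts
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss.item())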
Optical Character Recognition (OCR) is quite an old problem, dating back to the 1970s, when the first omni-font OCR technology was developed. The complexity of the task comes from many natural features of text:
As the demand for data-driven products grows, the data science community has been rapidly developing solutions that let us apply recent advances in artificial intelligence across multiple platforms. In the early years of the so-called AI era, it was very common for a deep learning model to live in a standalone script. But as the scope of our problems and their requirements evolved, these frameworks were ported to platforms such as IoT devices, mobile devices, and the browser.
Well, let’s assume that we have a dialog corpus that can be divided into pairs of questions and answers, and we need some kind of neural network that will predict the next word based on the previous ones.
We also know that the input to a neural network must be numeric, typically normalized to the range -1 to 1 or 0 to 1. But our text consists of English words; how can we encode them in an appropriate way?
One way to do this is good old one-hot encoding (OHE). Each word would then be represented by a very long — in…
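To make the idea concrete, here is a toy illustration of my own (the two-sentence corpus is made up):

import numpy as np

# Build a vocabulary from a tiny corpus and one-hot encode each word.
corpus = ["how are you", "i am fine"]
vocab = sorted({word for sentence in corpus for word in sentence.split()})
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab))  # one dimension per vocabulary word
    vec[index[word]] = 1.0      # a single 1 marks this word's position
    return vec

print(one_hot("fine"))  # e.g. [0. 0. 1. 0. 0. 0.], depending on word order

The length of each vector equals the vocabulary size, which is exactly why this representation gets unwieldy for real corpora.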
Well, actually, there are plenty of useful resources like this or this that explain in detail how GRU/LSTM architectures work, along with all the math behind them, but I think all the explanations I’ve seen so far are somewhat misleading.
Many articles show pictures like these:
One of the easiest ways to generate images of decent quality is the Deep Convolutional Generative Adversarial Network (DCGAN) architecture, which builds on the GAN framework introduced by Ian Goodfellow in 2014 (the DCGAN variant itself was proposed by Radford et al. in 2015). This architecture consists of two networks: a generator and a discriminator. The generator takes random noise and maps it into images such that the discriminator cannot tell which images came from the dataset and which came from the generator. The generator attempts to fool the discriminator, and the discriminator in turn adapts to the new fake data. …
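A compact sketch of the two networks in PyTorch, assuming 64x64 RGB images (my own code; the layer sizes follow the common DCGAN layout from Radford et al. rather than any specific tutorial):

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        # Transposed convolutions upsample noise to a 64x64 image.
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0, bias=False),  # -> 4x4
            nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1, bias=False),  # -> 8x8
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1, bias=False),  # -> 16x16
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1, bias=False),      # -> 32x32
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1, bias=False),           # -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):  # z: (batch, z_dim, 1, 1) random noise
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # Strided convolutions downsample the image to a single real/fake score.
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 4, ch * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ch * 8), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 8, 1, 4, 1, 0, bias=False),
        )

    def forward(self, x):
        return self.net(x).view(-1)

fake = Generator()(torch.randn(2, 100, 1, 1))  # two generated 64x64 images
score = Discriminator()(fake)                  # discriminator's raw scores

Training then alternates between updating the discriminator on real and generated batches and updating the generator to raise the discriminator's score on its fakes.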
First of all, let’s talk about how modern processors work. According to Intel’s manual for software developers, modern processors have one interesting detail that directly affects the visibility of data written to memory:
So, stores are executed in two phases, and after the execution phase there is no guarantee that other CPU cores will see the updated data. The second phase happens later, and the order of load/store operations, as observed by other cores, may not be maintained.
But why were store buffers implemented in the first place? Well, modern CPUs have many cores; each core has its own (exclusive) L1 cache and access to multiple shared caches…
Note: I’m not involved in OpenJDK development in any way. All observations expressed below are my own.
In my first article here, I’d like to talk about some interesting solutions that have found their way into the most popular JVM implementation in the world: HotSpot, originally developed by Sun Microsystems and later refined by Oracle into its current state.
A typical JVM implementation consists of multiple subsystems: parsers, class loaders, interpreters, JIT compilers, and so on. But arguably the most complicated subsystems are those related to garbage collection.
First, the only purpose of the JVM runtime is to execute compatible…