This time, I’m going to talk about how to approach general NLP problems. But we’re not going to look at the standard tips which are tosed around on the internet, for example on platforms like kaggle.
Read more
In a recent article, we saw how to implement a basic validation pipeline for text data. Once a machine learning model has been deployed its behavior must be monitored. The predictive performance is expected to degrade over time as the environment changes.
Read more
An important part of machine learning applications, is making sure that there is no data degeneration while a model is in production. Sometimes downstream data processing changes and machine learning models are very prone to silent failure due to this.
Read more
In every machine learning project, the training data is the most valuable part of your system. In many real-world machine learning projects the largest gains in performance come from improving training data quality. Training data is often hard to aquire and since the data can be large, quality can be hard to check.
Read more
If you are dealing with a large collections of documents, you will often find yourself in the situation where you are looking for some structure and understanding what is contained in the documents. Here I’ll show you a convenient method for discovering and understanding clusters of text documents.
Read more
Since a lot of people recently asked me how neural networks learn the embeddings for categorical variables, for example words, I’m going to write about it today. You all might have heard about methods like word2vec for creating dense vector representation of words in an unsupervised way.
Read more
This time we have a look into the magnitude library, a feature-packed Python package and vector storage file format for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner developed by Plasticity.
Read more
Word vectors Today, I tell you what word vectors are, how you create them in python and finally how you can use them with neural networks in keras. For a long time, NLP methods use a vectorspace model to represent words.
Read more
In this post, I will show you how to use the isolation forest algorithm to detect attacks to computer networks in python.
The term isolation means separating an instance from the rest of the instances. Since anomalies are ‘few and different’ and therefore they are more susceptible to isolation.
Read more
This time we’re going to discuss a current machine learning completion on kaggle. In this competition, you’re challenged to build a multi-headed model that’s capable of detecting different types of of toxicity like threats, obscenity, insults, and identity-based hate.
Read more