My Sites


Sunday, June 11, 2017

Text Pre-processing with Python Natural Language Toolkit (NLTK)

Text Pre-processing Steps
  1. Tokenization
  2. Stemming and Lemmatization
  3. Stop Word Removal
  4. Part-of-speech (POS) tagging (https://nlp.stanford.edu/software/tagger.shtml)
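As a rough illustration of steps 1 and 2, here is a minimal sketch that tokenizes a sentence and stems the tokens. It assumes NLTK is installed; the function names `tokenize` and `stem_tokens` and the sample sentence are made up for this example. RegexpTokenizer and PorterStemmer are used because they need no downloaded data, whereas lemmatization, the stop word list and POS tagging each depend on a corpus or model fetched with nltk.download.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    # r"\w+" matches runs of letters, digits and underscores.
    return RegexpTokenizer(r"\w+").tokenize(text.lower())


def stem_tokens(tokens):
    """Reduce each token to its Porter stem."""
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


if __name__ == "__main__":
    sample = "Cats jumped. Dogs running!"  # hypothetical sample text
    print(stem_tokens(tokenize(sample)))
```

Stemming strips suffixes by rule, so the result is not always a dictionary word. A lemmatizer would return real words instead, but it needs the WordNet data to be downloaded first.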
Play Session
$ python
>>> import nltk
>>> nltk.download('all')  # fetches every NLTK corpus and model; a large download

Reference: http://www.nltk.org/

#!/usr/bin/python
# -*- coding: utf-8 -*-
import nltk
from nltk.corpus import stopwords  # stop word lists; needs the 'stopwords' corpus
from nltk.tokenize import RegexpTokenizer  # tokenizer driven by a regular expression
import json
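The script above stops after its imports, so what follows is only a guess at how they might fit together, not the original code: tokenize with RegexpTokenizer, drop stop words, and print the result as JSON. The function name `remove_stop_words`, its `stop_words` parameter and the sample sentence are hypothetical.

```python
import json

from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer


def remove_stop_words(text, stop_words=None):
    """Tokenize text and drop stop words.

    If stop_words is None, NLTK's English list is used, which requires the
    'stopwords' corpus to have been downloaded already.
    """
    if stop_words is None:
        stop_words = set(stopwords.words("english"))
    tokens = RegexpTokenizer(r"\w+").tokenize(text.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    # Hypothetical input; the explicit stop word set avoids the corpus download.
    kept = remove_stop_words("This is a simple example.", {"this", "is", "a"})
    print(json.dumps({"tokens": kept}))
```

Passing an explicit set makes the function easy to try without any downloaded data. Calling it with the text alone falls back to NLTK's English stop word list.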




