- using different categories,
- using more mixtures, and
- using higher-order n-grams, if I can.
I switched to testing out latent Dirichlet allocation (LDA) for now. To get that working, I need to
- normalize the text,
- convert the text vocabulary items to integers,
- encode the text in each document (here, utterances) as the counts of each vocabulary item (represented as integers), and
- figure out the LDA software.
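
The first three steps can be sketched in a few lines of Python. This is only a rough illustration, not the code I'm actually running: the function names (`normalize`, `build_vocab`, `encode_counts`), the regex-based tokenization, and the two toy utterances are all assumptions made for the example. The LDA software itself is left out.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and split it into tokens.

    Assumption: a token is a run of letters, digits, or apostrophes.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(tokenized_docs):
    """Give each distinct token an integer id, in order of first appearance."""
    vocab = {}
    for tokens in tokenized_docs:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode_counts(tokens, vocab):
    """Encode one document as {vocabulary id: count}."""
    return dict(Counter(vocab[tok] for tok in tokens))


# Toy utterances, made up for illustration.
utterances = ["The dog saw the cat.", "The cat ran!"]

tokenized = [normalize(u) for u in utterances]
vocab = build_vocab(tokenized)
encoded = [encode_counts(t, vocab) for t in tokenized]
```

Each utterance ends up as a sparse bag of counts keyed by integer id, which is roughly the kind of input most LDA packages expect, though the exact format will depend on the software.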
Now that I'm using my bike instead of my car, I'm going to have to adapt a little.