Posts

Showing posts with the label Sinhala

Combined letters in Unicode - How to use them with Python

I have to utilise the combined letters of Sinhala for my study. This is not a problem in English. However, in a language where vowel sounds are formed by attaching combining parts to a letter, the analysis of letters within a word ideally needs to be carried out with the combining parts kept together with their letter.

Output : ['ඵ', ';', '\u200c', ':', 'ෙ', 'ු', 'ඕ', 'c', 'ප', 'ථ', 'ඥ', 'x', '්', '2', '☔', 'උ', 'n', 'ම', '3', '}', '?', 'ඃ', 'ත', 'හ', '0', 'ො', 'z', 'ෲ', ',', 'ඔ', 'ඳ', 'i', 'ං', '▪', 'ි', '[', 'ඤ', 'ස', 'b', 'ූ', '_', '☁', 'ඟ', 'ඩ', 'ෑ', 'ේ', '1', 'අ', 'ආ', 'ට', 'ී', 'q', '•', 'd', 'ද', '–
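The list above shows the combining signs (such as 'ෙ', 'ු' and '්') appearing as separate items, detached from the letters they modify. One way to keep them together is sketched below. This is only a minimal illustration, not necessarily the approach used in the full post: the function name `split_clusters` is hypothetical, and it assumes that every code point whose Unicode general category starts with "M" (a combining mark) belongs to the preceding character. It does not handle conjuncts formed with the zero-width joiner, which would need extra rules.

```python
import unicodedata


def split_clusters(text):
    """Split text into base letters with their combining marks attached.

    Hypothetical helper: any code point in a Unicode "Mark" category
    (Mn, Mc, Me) is appended to the cluster that precedes it.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and is_mark:
            # Vowel signs, al-lakuna, anusvara etc. join the previous cluster.
            clusters[-1] += ch
        else:
            # A base letter, digit, punctuation mark or symbol starts a new one.
            clusters.append(ch)
    return clusters


# The word "Sinhala" written in Sinhala script, spelled out as code points:
# SA (U+0DC3), vowel sign I (U+0DD2), anusvara (U+0D82), HA (U+0DC4), LA (U+0DBD)
word = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"
print(list(word))            # five separate code points
print(split_clusters(word))  # three clusters; the first keeps its two marks
```

Counting or comparing the items returned by `split_clusters` then treats each letter together with its vowel sign as one unit, instead of counting the sign separately.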

http://www.panhidhalyrics.com/ Change Log

This article highlights the milestones in the approach taken to develop www.panhidhalyrics.com. Anyone researching in the same area can use the following as a guideline to improve their own approach.

2020 Dec: Deprecation of the basic probabilistic song-generation model developed using Django.

2021 Jan: Raw RNN-based implementation to predict Sinhala words using a larger sample of data (not pre-processed).
fushenkao - https://github.com/fushenkao/Sinhala-Lyrics-Gen
minimaxir - https://github.com/minimaxir/textgenrnn
Background: Mirror Wall, Sigiriya (Sigiri Graffiti) http://thenationaltrust.lk/wp-content/uploads/2018/06/nds-nt-sigiriya.pdf
When the dataset became bigger, training was conducted using Google Colab to generate the model (refer to the following notebook): https://colab.research.google.com/drive/1YlooyHnyhK8BcmrHF87resDT8O1FAppf?usp=sharing

2021 Feb: Approach adopting an LSTM, assuming a relationship between the stanzas in a sequence. The following notebook was adopted as the starting point.