http://www.panhidhalyrics.com/ Change Log
This article highlights the milestones in the approach to developing www.panhidhalyrics.com. Anyone researching the same area can use the following as a guideline to improve their own approach.
2020: Dec
Deprecated the basic probabilistic model of song generation developed using Django.
2021: Jan
Raw RNN-based implementation for predicting Sinhala words using a larger sample of data (not pre-processed)
- fushenkao - https://github.com/fushenkao/Sinhala-Lyrics-Gen
- minimaxir - https://github.com/minimaxir/textgenrnn
- Background: the Mirror Wall, Sigiriya (Sigiri Graffiti) http://thenationaltrust.lk/wp-content/uploads/2018/06/nds-nt-sigiriya.pdf
- When the dataset became larger, training was conducted on Google Colab to generate the model (refer to the following notebook):
- https://colab.research.google.com/drive/1YlooyHnyhK8BcmrHF87resDT8O1FAppf?usp=sharing
2021: Feb
Adopted an LSTM-based approach, assuming a sequential relationship between stanzas.
The following notebook was adopted as the starting point:
https://www.kaggle.com/annalee7/poetry-text-generation-lstm (https://colab.research.google.com/drive/107gSW7zSe35qRDiSK-xX16p3WoJUmr7U?usp=sharing)
The limitation in identifying combined characters in Sinhala demanded additional handling with a regex, described in the following article:
https://ksankalpa.blogspot.com/2021/03/combined-letters-in-unicode-how-to-used.html
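The article's exact regex is not reproduced here, but the idea can be sketched as follows: treat a base Sinhala letter plus any trailing dependent vowel signs (and hal kirima + ZWJ conjuncts) as one combined character. The pattern below is an illustrative assumption, not the article's code.

```python
import re

# Illustrative sketch only (not the article's regex): a "combined character"
# is a base Sinhala letter followed by dependent vowel signs, the hal kirima
# (U+0DCA), or a hal kirima + zero-width joiner conjunct with another base.
CLUSTER_RE = re.compile(
    r"[\u0D85-\u0DC6]"                      # base Sinhala letter
    r"(?:\u0DCA\u200D[\u0D85-\u0DC6]"       # conjunct: hal kirima + ZWJ + base
    r"|[\u0DCA-\u0DDF\u0DF2\u0DF3])*"       # or a dependent vowel sign
    r"|.",                                  # fallback: any single character
    re.DOTALL,
)

def split_combined_chars(text):
    """Split text into Sinhala combined-character units."""
    return CLUSTER_RE.findall(text)
```

For example, "කො" (base ක plus vowel sign ො) is returned as a single unit instead of two separate code points, which is what character-wise training needs.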
2021: March
The notebook on character-wise prediction for text files larger than 3 MB was tested on Kaggle, Google Colab, and Amazon SageMaker. The processing speed was not satisfactory.
As an alternative, the textgenrnn package was reverse-engineered to some extent, and a Google Colab notebook was developed that imports the textgenrnn package in Python.
https://colab.research.google.com/drive/1S_hUiuDmjKAFbdpMLxNsmezcRut0ARac#scrollTo=PldIuEiY1cPD
The effort was abandoned due to the complexity of the code and the time needed to understand it.
The TPU implementation of an LSTM in Google Colab was customised to the Sinhala language using the following notebook:
Predict Shakespeare with Cloud TPUs and Keras (https://colab.research.google.com/drive/1oY07-aI8ASC504gYEbK0pwE3qrcA_zCj#scrollTo=tU7M-EGGxR3E)
The original implementation used character-wise ASCII encoding, which limits training to characters whose code points fall within 0 to 255; Sinhala Unicode code points lie outside that range. A dictionary was therefore used to encode the characters in the corpus, and the LSTM was trained on the resulting indices. Training with the TPU was much faster as a result.
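A minimal sketch of the dictionary encoding described above (function and variable names here are assumptions, not the notebook's):

```python
def build_vocab(corpus):
    """Map every distinct character (including Sinhala code points,
    whose values lie well beyond 255) to a small dense index."""
    chars = sorted(set(corpus))
    char2idx = {c: i for i, c in enumerate(chars)}
    idx2char = {i: c for c, i in char2idx.items()}
    return char2idx, idx2char

def encode(text, char2idx):
    """Replace each character with its dictionary index for training."""
    return [char2idx[c] for c in text]

def decode(indices, idx2char):
    """Map predicted indices back to characters."""
    return "".join(idx2char[i] for i in indices)
```

Because the indices are dense (0 to vocabulary size minus one), the embedding layer stays small regardless of where the original code points sit in the Unicode space.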
2021: April
The existing Flask API server was integrated with the model built using the TPU, and a new API request was implemented. Customising the textgenrnn API request for a TPU-trained model was not successful:
textgenrnn was developed mainly for TensorFlow 1.9 or below, while TPU-based training requires TensorFlow versions above 2.0.
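The shape of the new API request might look like the sketch below. The route name, payload field, and the `generate_lyrics` helper are all hypothetical; the real handler would sample characters from the TPU-trained Keras model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_lyrics(seed, length=100):
    # Hypothetical placeholder: the real implementation would load the
    # TPU-trained Keras model and sample `length` characters after `seed`.
    return seed

@app.route("/generate", methods=["POST"])
def generate():
    payload = request.get_json(force=True)
    seed = payload.get("seed", "")
    return jsonify({"lyrics": generate_lyrics(seed)})
```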
2021: May
Deploying TensorFlow 2 to Heroku fails with an error about the deployment size limit: the TensorFlow 2 package alone exceeds 500 MB.
Hence, a Docker container was used as the deployment approach.
The Docker container for a Flask app requires special configuration (in particular, binding the server to 0.0.0.0 so it is reachable from outside the container): https://stackoverflow.com/questions/30323224/deploying-a-minimal-flask-app-in-docker-server-connection-issues
Heroku Docker deployment guide:
https://medium.com/analytics-vidhya/dockerize-your-python-flask-application-and-deploy-it-onto-heroku-650b7a605cc9
docker image build -t my-app .
docker run -p 5000:5000 -d my-app
The deployment guide for Flask is missing the container login step: https://stackoverflow.com/questions/61959667/deploy-docker-images-to-heroku
heroku container:login
heroku container:push web --app <name-for-your-app>
heroku container:release web --app <name-for-your-app>
May 20
The Docker container was deployed to Heroku and version 2 was released.
May 25
To improve the word order, steps were taken to understand the TPU prediction code. The TPU uses TensorFlow functions that split the dataset into several parts and train the model on them separately. The following gcptutorials article was used to understand them:
https://www.gcptutorials.com/article/how-to-use-batch-method-in-tensorflow
Understanding TensorFlow map function
https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map
The TensorFlow map function is used to combine consecutive samples, and the combined consecutive samples are then divided into batches by the batch size. Better results were obtained with a sample size of 50 and a batch size of 2048; the generation set used 25 and 5, respectively.
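The batch/map pipeline described above can be sketched roughly as follows, following the pattern of the Shakespeare TPU notebook (function and parameter names are assumptions):

```python
import tensorflow as tf

def make_dataset(encoded_ids, seq_len=50, batch_size=2048):
    ds = tf.data.Dataset.from_tensor_slices(encoded_ids)
    # batch(): group seq_len + 1 consecutive ids into one sample window.
    ds = ds.batch(seq_len + 1, drop_remainder=True)
    # map(): split each window into (input, target) pairs shifted by one step.
    ds = ds.map(lambda s: (s[:-1], s[1:]))
    # batch() again: divide the samples into training batches.
    return ds.batch(batch_size, drop_remainder=True)
```

The target sequence is the input sequence shifted by one character, so the model learns to predict the next character at every position.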
The resulting model was deployed to the system, and version 2 was updated.
The work was then temporarily postponed.
July 01
Searched for the possibility of delivering the TPU training as a service on a free stack, as all the ML-as-a-service offerings found were paid solutions. The opportunity exists within Google Colab itself by converting it to a service; however, invoking it requires human intervention.
August 01
Added Docker Compose to automate rebuilding on changes for the backend Python services.
https://dev.to/alissonzampietro/the-amazing-journey-of-docker-compose-17lj
https://stackoverflow.com/questions/44342741/auto-reloading-flask-server-on-docker
Moving the front end of Panhida to React and changing the UI. Facing issues with the CORS policy, and missing the references to the first model and the TPU deployment.