http://www.panhidhalyrics.com/ Change Log

This article highlights the milestones in the approach to developing www.panhidhalyrics.com. Anyone researching the same area can use the following as a guideline to improve their own approach.



2020: Dec

Deprecation of the basic probabilistic model for song generation developed using Django.

2021: Jan

RNN-based raw implementation for predicting Sinhala words using a larger sample of data (not pre-processed); see the sketch after this list.

  • fushenkao - https://github.com/fushenkao/Sinhala-Lyrics-Gen 
  • minimaxir - https://github.com/minimaxir/textgenrnn 
  • Background: Mirror Wall, Sigiriya (Sigiri Graffiti) http://thenationaltrust.lk/wp-content/uploads/2018/06/nds-nt-sigiriya.pdf
  • When the dataset became bigger, training was conducted using Google Colab to generate the model (refer to the following notebook):
  • https://colab.research.google.com/drive/1YlooyHnyhK8BcmrHF87resDT8O1FAppf?usp=sharing
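
For reference, a minimal sketch of how the textgenrnn package is typically used for this kind of raw RNN training; the file name and parameter values below are placeholders, not the exact ones used here.

from textgenrnn import textgenrnn

# train a character-level RNN on the raw (not pre-processed) lyrics file
textgen = textgenrnn()
textgen.train_from_file('sinhala_lyrics.txt', num_epochs=10)

# sample a few lines from the trained model
textgen.generate(5, temperature=0.5)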

2021: Feb

Approach to adopt an LSTM, assuming a relationship between the stanzas in a sequence.

The following notebook was adopted as the starting point.

https://www.kaggle.com/annalee7/poetry-text-generation-lstm (https://colab.research.google.com/drive/107gSW7zSe35qRDiSK-xX16p3WoJUmr7U?usp=sharing)
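
As a rough illustration, the kind of LSTM text-generation model that notebook builds looks like the sketch below; the vocabulary size and layer widths are assumptions, not the notebook's exact values.

import tensorflow as tf

VOCAB_SIZE = 5000   # assumed size of the tokenised vocabulary

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),                 # token embeddings
    tf.keras.layers.LSTM(128),                                 # sequence model
    tf.keras.layers.Dense(VOCAB_SIZE, activation='softmax'),   # next-token probabilities
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')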

The limitation in identifying the combined characters in Sinhala demanded additional coding with a regex, as described in the following article.

https://ksankalpa.blogspot.com/2021/03/combined-letters-in-unicode-how-to-used.html
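
A minimal sketch of the regex idea, assuming the goal is to keep a Sinhala base letter together with its dependent vowel signs and ZWJ conjuncts (yansaya, rakaransaya, repaya); the exact pattern and character ranges below are my assumptions, not the article's code.

import re

SINHALA_CLUSTER = re.compile(
    r'[\u0D85-\u0DC6]'                      # independent vowel or consonant
    r'(?:\u0DCA\u200D[\u0D9A-\u0DC6])*'     # optional hal kirima + ZWJ + consonant conjunct(s)
    r'[\u0DCA\u0DCF-\u0DDF\u0DF2\u0DF3]*'   # optional dependent vowel signs / hal kirima
    r'|.',                                  # anything else as a single unit
    re.DOTALL
)

def split_clusters(text):
    return SINHALA_CLUSTER.findall(text)

# ක + hal kirima + ZWJ + ර + ම + ය should stay as three clusters, not six characters
print(split_clusters('ක\u0DCA\u200Dරමය'))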

2021: March

The notebook on character-wise prediction for a text file larger than 3 MB was tested on Kaggle, Google Colab, and Amazon SageMaker. The processing speed was not satisfactory.

As an alternative, the textgenrnn package was reverse-engineered to some extent. A Google Colab notebook was developed that imports the textgenrnn package in Python.

https://colab.research.google.com/drive/1S_hUiuDmjKAFbdpMLxNsmezcRut0ARac#scrollTo=PldIuEiY1cPD

The effort was abandoned because of the complexity and the time needed to understand the code.

The TPU implementation of an LSTM in Google Colab was customised to the Sinhala language using the following notebook.

Predict Shakespeare with Cloud TPUs and Keras (https://colab.research.google.com/drive/1oY07-aI8ASC504gYEbK0pwE3qrcA_zCj#scrollTo=tU7M-EGGxR3E)
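
For orientation, the sketch below shows the standard TensorFlow 2.x pattern for attaching a Colab TPU and building the model inside its scope; it is not the notebook's exact code, and the layer sizes are assumptions.

import tensorflow as tf

VOCAB_SIZE = 120   # assumed size of the character dictionary

# connect to the Colab-provided TPU and create a distribution strategy
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# the model must be created and compiled inside the strategy scope
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 64),
        tf.keras.layers.LSTM(128, return_sequences=True),         # predict at every position
        tf.keras.layers.Dense(VOCAB_SIZE, activation='softmax'),
    ])
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')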

The original implementation used character-wise ASCII coding, which limits training to characters whose code points fall in the 0 to 255 range and therefore excludes Sinhala Unicode. A dictionary was used to encode the characters in the corpus, and the LSTM was trained on those indices (see the sketch below). The result was a much faster way of training using the TPU.
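
A minimal sketch of the dictionary-based character encoding described above; the file name is a placeholder.

# map every distinct character in the corpus, including Sinhala code points,
# to a small integer index instead of relying on ASCII ordinals
with open('corpus.txt', encoding='utf-8') as f:
    text = f.read()

vocab = sorted(set(text))
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = {i: ch for ch, i in char2idx.items()}

encoded = [char2idx[ch] for ch in text]           # integer sequence fed to the LSTM
decoded = ''.join(idx2char[i] for i in encoded)   # inverse mapping used during generation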

2021: April

The existing Flask API server was integrated with the model built using the TPU, and a new API request was implemented. Customising the textgenrnn API request for a model trained on the TPU was not successful.

textgenrnn was developed mainly for TensorFlow 1.9 or below, while the TPU-based setup supports TensorFlow versions above 2.

2021: May

Deploying TensorFlow 2 to Heroku gives an error about the deployment size limit: the TensorFlow 2 package exceeds 500 MB.

Hence, a Docker container was used as the deployment approach.

The Docker container for the Flask app requires special configuration: https://stackoverflow.com/questions/30323224/deploying-a-minimal-flask-app-in-docker-server-connection-issues
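
The key configuration discussed in that question is binding Flask to 0.0.0.0 instead of the default 127.0.0.1, so the published container port is reachable from the host; a minimal sketch:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'ok'

if __name__ == '__main__':
    # 0.0.0.0 makes the app reachable through the port mapped by docker run -p 5000:5000
    app.run(host='0.0.0.0', port=5000)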

Heroku Docker deployment guide

https://medium.com/analytics-vidhya/dockerize-your-python-flask-application-and-deploy-it-onto-heroku-650b7a605cc9

docker image build -t my-app .

docker run -p 5000:5000 -d my-app

The container login step is missing from the deployment guide for Flask: https://stackoverflow.com/questions/61959667/deploy-docker-images-to-heroku
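
The missing step is logging in to the Heroku container registry before pushing:

heroku container:login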

heroku container:push web --app <name-for-your-app>

heroku container:release web --app <name-for-your-app>

May 20

The Docker container was deployed to Heroku and version 2 was released.

May 25 

In order to improve the word order, steps were taken to understand the TPU prediction code. The TPU setup uses TensorFlow functions that split the dataset into several parts and train the model on them separately. The gcptutorials articles were used to understand them.

https://www.gcptutorials.com/article/how-to-use-batch-method-in-tensorflow

Understanding the TensorFlow map function

https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map

The TensorFlow map function is used to combine consecutive samples, and the combined consecutive samples are then divided by the batch size. Better results were obtained with a sample size of 50 and a batch size of 2048; the generation setup used 25 and 5 respectively (see the sketch below).
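
A minimal sketch of such a tf.data pipeline, under the assumption that 'encoded' is the dictionary-encoded corpus from above; the exact notebook code may differ.

import tensorflow as tf

SEQ_LEN = 50        # sample size used for training (25 for generation)
BATCH_SIZE = 2048   # batch size used for training (5 for generation)

encoded = list(range(200000))  # placeholder for the dictionary-encoded corpus

def to_input_target(chunk):
    # inputs are characters [0..n-1], targets are the same sequence shifted by one
    return chunk[:-1], chunk[1:]

dataset = (
    tf.data.Dataset.from_tensor_slices(encoded)
    .batch(SEQ_LEN + 1, drop_remainder=True)   # group consecutive characters into samples
    .map(to_input_target)                      # build (input, target) pairs
    .batch(BATCH_SIZE, drop_remainder=True)    # divide the samples by the batch size
)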

The resulting model was deployed to the system, and version 2 was updated.


June 20

Started investigating the possibility of adding the Google Input Tools autocomplete function so that users can improve the results themselves. The Google Transliterate API is now deprecated.
https://developers.google.com/transliterate/v1/getting_started

Postponed it temporarily.

July 01

Searched for a way to deliver the TPU training as a service on the free stack, as all the ML-as-a-service offerings I found are paid solutions. The opportunity exists within Google Colab itself by converting the notebook into a service; however, invoking it requires human intervention.

August 01

Added Docker Compose to automate the build changes for the backend Python services.

https://dev.to/alissonzampietro/the-amazing-journey-of-docker-compose-17lj

https://stackoverflow.com/questions/44342741/auto-reloading-flask-server-on-docker

Moving the front end of Panhida to React; changing the UI; facing issues with the CORS policy (see the sketch below); the references to the 1st model and the TPU deployment are missing.
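
One common way to handle the CORS issue on the Flask side is the flask-cors package; the sketch below is under that assumption and is not necessarily the fix used here.

from flask import Flask
from flask_cors import CORS

app = Flask(__name__)

# allow the React front end, served from a different origin, to call the API routes
CORS(app, resources={r"/api/*": {"origins": "*"}})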
