Nigerian Languages Python libraries

Hello Radar

I’m learning Natural Language Processing in Python with NLTK, and I’m trying to process text in Nigerian languages. For now, I’m using the Google Translate API.

Does anyone know of a module that includes a library of Nigerian languages, or an NLTK lexicon for them? I want to know whether one exists before attempting to compile one myself.

Cool to see someone showing interest in this. I really doubt there is anything reasonable out there for Nigerian languages. What exactly are you looking for? A dictionary? A tag library? Translations? I once read about research done by some folks at UI. They used Moses for translation, but the results didn’t look too good. The Ife people also did something around rule-based translation, but that doesn’t look too good either.

If this is something you’re very interested in, we can meet up to discuss ideas around this.

PS: My MSc degree was in NLP.

Wow, that’s good to know. I won’t mind a meetup.

I’m working with the NLTK module, and I’d like to know whether anyone has trained it on Nigerian languages. I’d also like to see other projects that have worked with Nigerian languages, whether speech-to-text or language processing.

magic happening :smile: @logbon72

Resources and projects on language processing for Nigerian languages exist.
I have been working on text-to-speech for Nigerian languages.
If you are interested, kindly let me know.

I’m definitely interested! How can we meet? Do you have any live project or a repo? Can you share links?

NLP is a really broad and growing topic, so it would be useful if you specified the particular area that interests you.

I doubt you’ll find any publicly available lexicons or models trained on any Nigerian language. Even Google Translate still struggles to get things right (e.g. it fails miserably with Hausa translations).

It’s a challenging path, but you can pick a Nigerian language and then get a couple of language experts to guide you on its syntactic and semantic features. In Hausa, for example, the phrases “bana so” and “ba na so” are generally accepted to mean the same thing (i.e. “I don’t want”). Knowing such peculiarities will greatly help whatever heuristics you put into building your language preprocessors and whatever comes afterward.
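To make the preprocessing idea concrete, here is a minimal sketch of a spelling normalizer in plain Python. It is only an illustration: the `VARIANTS` table and the `normalize` function are hypothetical names, and the table holds just the one pair mentioned above. A real table would have to come from language experts.

```python
import re

# Hypothetical variant table. It contains only the single pair
# discussed in this thread; real entries need expert input.
VARIANTS = {
    "bana": "ba na",
}

# Match whole word tokens so that longer words containing a variant
# as a substring are left untouched.
_TOKEN = re.compile(r"\w+")


def normalize(text, variants=VARIANTS):
    """Lowercase the text and rewrite known variant spellings
    to a single canonical form."""
    def replace(match):
        word = match.group(0)
        return variants.get(word, word)

    return _TOKEN.sub(replace, text.lower())


if __name__ == "__main__":
    print(normalize("Bana so"))
    print(normalize("ba na so"))
```

Both calls should produce the same string, so anything downstream (tokenizers, taggers, lexicon lookups) sees one form instead of two. Whether the joined or the split spelling should be the canonical one is a design choice best left to someone who knows the language.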

You’ll definitely find inspiration if you read some key academic papers that focus on NLP problems in other languages besides English.

I used NLTK to develop a grammar for the Yorùbá numeral system and it did a clean job. You can download the project from https://onka-yoruba.googlecode.com/files/OnkaYoruba.exe. An online version is also available: http://num2yor.appspot.com/.

Let me know if you need more information about this project.

Cheers.

Do you have a tag model for Nigerian languages? I’ve been looking for that for a while.

Let’s do an NLP meetup and discuss the challenges in solving Nigerian language issues. I’m really down for this.

Second on the meetup.

A meetup would be cool.

This might be a long shot, but you can talk to Kola Tubosun, who’s doing a lot of work on bringing the Yoruba language online. He might be able to point you in the right direction.

He was featured in this TechCabal post: This Linguist Is Creating An Online Dictionary Of Yoruba Names

I’ll do that, thanks!

I’ll talk to CCHUB about a meetup.