This repository was archived by the owner on Mar 29, 2021. It is now read-only.

Commit 9884a4e

Author: staticdev
Commit message: "commit inicial" (initial commit)
0 parents, commit 9884a4e

6 files changed: +214 −0 lines


Dockerfile

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
FROM python:2.7.14-slim

# set the working directory to /app
WORKDIR /app

# copy the current directory contents into the container at /app
COPY . /app

# install git flex libcpanplus-perl make
RUN apt-get update && \
    apt-get install -y git flex libcpanplus-perl make

# install perl libs
RUN export PERL_MM_USE_DEFAULT=1 && perl -MCPAN -e 'install List::MoreUtils; install Text::LevenshteinXS; install Parallel::Loops'

# install requirements
RUN pip install -r requirements.txt

# download UGCNormal
RUN git clone https://github.com/carolcoimbra/UGCNormal.git ugc_norm

# configure UGCNormal
RUN cd ugc_norm && sh configure.sh

EXPOSE 5000

# run app.py when the container launches
CMD ["gunicorn", "app:APP", "-b", ":5000"]

README.rst

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
ugcnormal-microservice
======================

REST microservice for pt_BR text normalization using `UGCNormal <https://github.com/avanco/UGCNormal>`_. Suited to applications that need online normalization, such as chatbots.

Web service based on `ugcnormal_interface <https://github.com/thiagootuler/ugcnormal_interface>`_.

Requirements
------------

* Install Docker-CE 17.12.0+
* 900 MB of disk space for the image

Running
-------

Run the commands:

.. code-block:: sh

   # build the image
   sudo docker build -t ugcnormal .
   # check that the image was built
   sudo docker images
   # start a container from the image
   sudo docker run --name ugcnormal -d -p 5000:5000 --env "UGCNORMAL=./ugc_norm/speller" ugcnormal
   # check that the process is running
   sudo docker ps -a

   # to stop the container, look up its name in docker ps -a and run
   sudo docker stop ugcnormal
   # to remove a container (it must be stopped first)
   sudo docker rm ugcnormal

Usage examples
--------------

Simply POST the message to be normalized to the /reply URL, passing the message in the "message" field and the method in the "method" field.

Available methods:

* token: tokenizer
* spell: speller
* acronym: acronym searcher
* textese: untextese
* proper_noun: proper noun normalizer

The normalized message is returned in the "reply" field. The request status is returned in the "status" field, whose value on success is "ok" (e.g. ``{"status": "ok", "reply": "<normalized text>"}``).

curl example:

.. code-block:: sh

   curl -X POST \
     http://localhost:5000/reply \
     -H 'content-type: application/json; charset=utf-8' \
     -d '{
       "message": "oi td bm?",
       "method": "spell"
     }'

Native Python 3 example (http.client):

.. code-block:: python

   import http.client

   conn = http.client.HTTPConnection("localhost:5000")

   payload = "{\"message\": \"oi td bm?\", \"method\": \"spell\"}"

   headers = {
       'content-type': "application/json; charset=utf-8"
   }

   conn.request("POST", "/reply", payload, headers)
   res = conn.getresponse()
   data = res.read()

   print(data.decode("utf-8"))
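
The "status" and "reply" fields described above can be read by parsing the decoded response body as JSON. A minimal continuation of the http.client example, assuming the body is the JSON object produced by app.py:

import json

result = json.loads(data.decode("utf-8"))
if result["status"] == "ok":
    print(result["reply"])
else:
    print(result["error"])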

app.py

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
#!/usr/bin/python
# -*- coding: utf-8 -*-
from flask import Flask, request, jsonify
from normalizer import Normalizer

APP = Flask(__name__)
APP.config['JSON_AS_ASCII'] = False  # retrieve UTF-8 messages

NORM = Normalizer()

@APP.route('/reply', methods=['POST'])
def reply():
    params = request.json
    if not params:
        return jsonify({
            "status": "error",
            "error": "Request must be of the application/json type!",
        })

    message = params.get("message")
    method = params.get("method")

    # Make sure the required params are present.
    if message is None or method is None:
        return jsonify({
            "status": "error",
            "error": "message and method are required keys"
        })

    # Map API method names to the corresponding Normalizer operations.
    methods = {'token': NORM.tokenizer,
               'spell': NORM.speller,
               'acronym': NORM.acronym_searcher,
               'textese': NORM.untextese,
               'proper_noun': NORM.proper_noun_normalizer
               }

    try:
        reply = methods[method](message)
    except KeyError:
        return jsonify({
            "status": "error",
            "error": "method not valid, try one of the following: token, spell, acronym, textese or proper_noun"
        })

    # Send the response.
    return jsonify({
        "status": "ok",
        "reply": reply
    })
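
The /reply endpoint above can also be exercised without a running container through Flask's test client. A minimal sketch, assuming the UGCNormal tools have already been set up under ./ugc_norm (as the Dockerfile does) so the Normalizer subprocess calls can succeed:

# -*- coding: utf-8 -*-
import json

from app import APP

client = APP.test_client()
response = client.post(
    '/reply',
    data=json.dumps({"message": "oi td bm?", "method": "token"}),
    content_type='application/json; charset=utf-8',
)
result = json.loads(response.data)
print(result["status"])  # "ok" when the tokenizer ran successfully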

normalizer.py

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
from io import open
from subprocess import PIPE, Popen

class Normalizer(object):
    def file_save(self, text):
        norm_file = open("./temp/file.txt", mode="w", encoding="utf-8")
        norm_file.write(text.decode('utf-8'))
        norm_file.close()

    def tokenizer(self, text):
        # Apply the tokenizer.
        echo = Popen(['echo', text], stdout=PIPE)
        process = Popen(['./ugc_norm/tokenizer/webtok'], stdin=echo.stdout, stdout=PIPE)
        output = process.communicate()[0]
        return output

    def speller(self, text):
        tokens = self.tokenizer(text)
        # Apply the spell checker.
        self.file_save(tokens)
        current_directory = Popen('pwd', shell=False, stdout=PIPE)
        previous_path = current_directory.communicate()[0]
        command = 'perl ./ugc_norm/speller/spell.pl -stat ./ugc_norm/speller/lexicos/regra+cb_freq.txt -f ' + previous_path[:-1] + '/temp/file.txt'
        process = Popen(command.split(), shell=False, stdout=PIPE)
        output = process.communicate()[0]
        return output

    def acronym_searcher(self, text):
        checked_text = self.speller(text)
        # Normalize acronyms.
        self.file_save(checked_text)
        process = Popen('perl ./ugc_norm/siglas_map.pl ./ugc_norm/resources/lexico_siglas.txt ./temp/file.txt'.split(), shell=False, stdout=PIPE)
        output = process.communicate()[0]
        return output

    def untextese(self, text):
        text_with_acronyms = self.acronym_searcher(text)
        # Normalize internet slang ("internetes").
        self.file_save(text_with_acronyms)
        process = Popen('perl ./ugc_norm/internetes_map.pl ./ugc_norm/resources/lexico_internetes.txt ./ugc_norm/resources/lexico_internetes_sigl_abrv.txt ./temp/file.txt'.split(), shell=False, stdout=PIPE)
        output = process.communicate()[0]
        return output

    def proper_noun_normalizer(self, text):
        without_textese = self.untextese(text)
        # Normalize proper nouns.
        self.file_save(without_textese)
        process = Popen('perl ./ugc_norm/np_map.pl ./ugc_norm/resources/lexico_nome_proprio.txt ./temp/file.txt'.split(), shell=False, stdout=PIPE)
        output = process.communicate()[0]
        return output
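
Normalizer chains the UGCNormal command-line tools through subprocess pipes, each method building on the previous one (speller tokenizes first, acronym_searcher runs the speller, and so on). It can also be used directly, outside the Flask app; a minimal sketch, assuming it is run from the repository root so the relative ./ugc_norm and ./temp paths resolve:

# -*- coding: utf-8 -*-
from normalizer import Normalizer

norm = Normalizer()
tokens = norm.tokenizer("oi td bm?")   # UGCNormal webtok tokenizer only
corrected = norm.speller("oi td bm?")  # tokenizer followed by the spell checker
print(corrected)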

requirements.txt

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
numpy
scipy
Flask==0.12.2
gunicorn==19.7.1
multiprocessing
nltk
sklearn

temp/file.txt

Whitespace-only changes.
