I have some connected components displayed. One of the components has five nodes, and its middle node is shared by the others. How can I obtain the index of each node in that component, so that I can merge the other nodes with a conjunction?

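import networkx as nx

# `graph` and `sentences` are defined earlier
for component in nx.connected_components(graph):
    num_nodes = len(component)
    print(num_nodes)

    # Edges whose endpoints both lie in this component
    g = [(u, v) for u, v in graph.edges() if u in component and v in component]

    if num_nodes == 5:
        # Flatten the edge tuples into one string (nodes repeat, word order is lost)
        pl = ''.join(item for tuple_ in g for item in tuple_)
        print('Merged nodes')
        print(pl)
        sentences.append(pl)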

Input sentence is: शर्मान एक विकेट घेतली. मयंकान तीन विकेट घेतली

Output is: घेतली तीन विकेट घेतली एक विकेट एक विकेट शर्मान तीन विकेट मयंकान

Expected output: शर्मान एक विकेट आनी मयंकान तीन विकेट घेतली

आनी has to be added as the conjunction that combines the nodes.

[Figure: output of connected components]

vurmux
Sheryl

1 Answer

You are using words as unique identifiers, so you have no index data: each node is identified only by its word. Moreover, you first construct a graph that you are not using properly (after all your questions I can say that you don't really need it), losing language information along the way, and then you try to re-create your data from what is left. In your current question you have already lost the word-position information, so you can't do what you want unless you index all your nodes, as I wrote in my answer to your previous question.
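
To illustrate what keeping position information buys you, here is a minimal sketch that uses no graph at all. It assumes words are separated by whitespace, sentences by `.`, and that the sentences to merge end in the same word; the function names `index_words` and `merge_on_shared_last_word` are hypothetical and only for illustration.

```python
CONJUNCTION = "आनी"  # the conjunction from the question


def index_words(text):
    """Return one list of (position, word) pairs per sentence."""
    sentences = [s.split() for s in text.split(".") if s.strip()]
    return [list(enumerate(words)) for words in sentences]


def merge_on_shared_last_word(text, conjunction=CONJUNCTION):
    """Merge sentences that end in the same word, joining the rest with a conjunction."""
    indexed = index_words(text)
    last_words = {sentence[-1][1] for sentence in indexed}
    if len(indexed) < 2 or len(last_words) != 1:
        return text  # nothing to merge
    # Sorting by position restores each sentence's original word order.
    parts = [
        " ".join(word for _, word in sorted(sentence)[:-1])
        for sentence in indexed
    ]
    return f" {conjunction} ".join(parts) + " " + last_words.pop()


text = "शर्मान एक विकेट घेतली. मयंकान तीन विकेट घेतली"
merged = merge_on_shared_last_word(text)
print(merged)
```

Because every word carries its position, the order inside each sentence survives, which is exactly what is lost when the bare word is the node identifier. If you still want a graph, the same `(sentence, position)` pair could serve as the node key.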

I recommend using NLTK with its Indian corpus (it is ALREADY prepared, filtered and tagged):

from nltk.corpus import indian

# May require a one-time nltk.download('indian')
indian.words('hindi.pos')

It covers Hindi, Marathi, Bangla and Telugu. You can train the Punkt tokenizer (its training is unsupervised) and have all this work done for you. Moreover, there are NLTK extensions for Indian languages and dedicated Hindi tokenizers. You don't need to do the whole job manually, and you don't need networkx: everything has already been written by other programmers.

vurmux