
I'm fairly new to Python and haven't found an answer to this particular problem. I'm writing a simple recommendation program and need a dictionary where each cuisine is a key and the value is a list of restaurant names. Some restaurants list several cuisines in one comma-separated string, so I have to split that string and make sure every restaurant with a given cuisine ends up under that same cuisine key. Here's part of the file:

Georgie Porgie
87%
$$$
Canadian, Pub Food

Queen St. Cafe
82%
$
Malaysian, Thai

Mexican Grill
85%
$$
Mexican

Deep Fried Everything
52%
$
Pub Food

In this excerpt only the first and the last restaurant share a cuisine (Pub Food), but there are more such cases later in the file. Here is my code:

def new(file):
    file = "/.../Restaurants.txt"
    d = {}
    key = []
    with open(file) as file:
        lines = file.readlines()

    for i in range(len(lines)):
        if i % 5 == 0:
            if "," not in lines[i + 3]:
                d[lines[i + 3].strip()] = [lines[i].strip()]
            else:
                key += (lines[i + 3].strip().split(', '))
                for j in key:
                    if j not in d:
                        d[j] = [lines[i].strip()]
                    else:
                        d[j].append(lines[i].strip())
    return d

It prints all the keys and values, but it doesn't put two restaurants under the same key where it should. Also, because of the last 'else' branch, the second restaurant is added as a second value under the wrong key, which should not happen. I would appreciate any comments or help.

VytPil

3 Answers


When a restaurant has only one cuisine, you don't check whether that key is already in the dictionary, so d[...] = [...] replaces the existing list instead of adding to it. Handle it the same way as the multi-cuisine case and it works fine.
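In the question's data, Georgie Porgie is put under "Pub Food" by the comma branch, and then the single-cuisine branch for Deep Fried Everything overwrites that list. A minimal illustration with just those two restaurants (the helper names assign_without_check and assign_with_check are hypothetical, made up for this demo):

```python
def assign_without_check(records):
    # Mirrors the single-cuisine branch in the question:
    # each assignment replaces whatever list was stored before.
    d = {}
    for name, cuisine in records:
        d[cuisine] = [name]
    return d


def assign_with_check(records):
    # Mirrors the multi-cuisine branch: create the list once,
    # then append to it on later matches.
    d = {}
    for name, cuisine in records:
        if cuisine not in d:
            d[cuisine] = [name]
        else:
            d[cuisine].append(name)
    return d


records = [("Georgie Porgie", "Pub Food"), ("Deep Fried Everything", "Pub Food")]
print(assign_without_check(records))  # {'Pub Food': ['Deep Fried Everything']}
print(assign_with_check(records))     # {'Pub Food': ['Georgie Porgie', 'Deep Fried Everything']}
```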

I'm also not sure why you take file as an argument when you then overwrite it inside the function.

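Additionally, you should create a new 'key' list for each record instead of using +=, which adds the new cuisines to the existing 'key' list, so cuisines from earlier restaurants are reused for later ones.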
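To see why += puts the second restaurant under the wrong cuisine, here is a small sketch using the first two records from the question. The build function and its accumulate flag are hypothetical; the flag just switches between the question's key += ... and a fresh key = ... per record.

```python
records = [
    ("Georgie Porgie", "Canadian, Pub Food"),
    ("Queen St. Cafe", "Malaysian, Thai"),
]


def build(records, accumulate):
    d = {}
    key = []
    for name, cuisines in records:
        if accumulate:
            # key += ... extends the SAME list, so cuisines from
            # earlier records are still in it
            key += cuisines.split(', ')
        else:
            # key = ... starts a fresh list for each record
            key = cuisines.split(', ')
        for j in key:
            if j not in d:
                d[j] = [name]
            else:
                d[j].append(name)
    return d


print(build(records, accumulate=True)["Canadian"])   # ['Georgie Porgie', 'Queen St. Cafe']
print(build(records, accumulate=False)["Canadian"])  # ['Georgie Porgie']
```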

When checking whether j is already in the dictionary, you can test it against the keys (j in d.keys()); the shorter j in d does the same thing.
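A quick check that both membership tests agree (in on a dict looks at its keys; in Python 2, d.keys() also builds a full list first, so j in d is cheaper there). The sketch also shows dict.setdefault, a standard-library alternative that is not part of this answer, which folds the "create or append" if/else into one line.

```python
d = {"Pub Food": ["Georgie Porgie"]}

# Membership tests on a dict look at its keys, so these agree.
print("Pub Food" in d, "Pub Food" in d.keys())  # True True
print("Thai" in d, "Thai" in d.keys())          # False False

# setdefault inserts [] only when the key is missing, then returns
# the stored list, so append works in both cases.
d.setdefault("Pub Food", []).append("Deep Fried Everything")
d.setdefault("Thai", []).append("Queen St. Cafe")
print(d)
# {'Pub Food': ['Georgie Porgie', 'Deep Fried Everything'], 'Thai': ['Queen St. Cafe']}
```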

def new(file="/.../Restaurants.txt"):
    d = {}
    with open(file) as f:
        lines = f.readlines()

    for i in range(len(lines)):
        if i % 5 == 0:
            name = lines[i].strip()
            # strip() before the lookup so the key doesn't carry the
            # trailing newline (otherwise "Pub Food\n" never matches "Pub Food")
            cuisine = lines[i + 3].strip()
            if "," not in cuisine:
                if cuisine not in d.keys():
                    d[cuisine] = [name]
                else:
                    d[cuisine].append(name)

            else:
                # a fresh list for this record, not key += ...
                key = cuisine.split(', ')
                for j in key:
                    if j not in d.keys():
                        d[j] = [name]
                    else:
                        d[j].append(name)
    return d
Martyna

In my experience, storing each restaurant as a dictionary with named fields (name, rating, pricing, and so on) makes the data easier to handle later.

In the example below, I return a series of dictionaries, one for each restaurant. I also wrap the value processing in a method called add_value() to keep the code readable.
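Each restaurant's dictionary is built from the shared base_item template with copy.deepcopy (in generator). A small sketch of why a deep copy matters for a nested template like this one; make_template is a hypothetical helper that returns a trimmed-down base_item.

```python
import copy


def make_template():
    # Same shape as the answer's base_item, trimmed to one field.
    return {"_type": "undefined", "_fields": {"name": "undefined"}}


# Shallow copies share the nested "_fields" dict with the template.
template = make_template()
shallow_a = copy.copy(template)
shallow_b = copy.copy(template)
shallow_a["_fields"]["name"] = "Mexican Grill"
print(shallow_b["_fields"]["name"])  # Mexican Grill (changed too)

# Deep copies get their own nested dict, so records stay independent.
template = make_template()
deep_a = copy.deepcopy(template)
deep_b = copy.deepcopy(template)
deep_a["_fields"]["name"] = "Mexican Grill"
print(deep_b["_fields"]["name"])     # undefined
print(template["_fields"]["name"])   # undefined
```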

In my example, I'm using codecs to encode each value. That isn't strictly necessary, but depending on the characters in your file it may be useful. I'm also using itertools.islice to read the file five lines at a time through an iterator instead of loading it all with readlines(). Again, that's not necessary here, but it can help with very large files.
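Each call to itertools.islice(file, 5) takes the next five lines from the file iterator, so only one record is held in memory at a time. A minimal Python 3 sketch of the same pattern; the chunks helper is hypothetical, and io.StringIO stands in for the real file.

```python
import io
import itertools


def chunks(lines, size):
    # iter() matters for lists: islice on a list would restart from the
    # beginning each time, while a file object is already its own iterator.
    it = iter(lines)
    while True:
        chunk = tuple(itertools.islice(it, size))
        if not chunk:
            break
        yield chunk  # the last chunk may be shorter than `size`


sample = io.StringIO(
    "Mexican Grill\n85%\n$$\nMexican\n\n"
    "Deep Fried Everything\n52%\n$\nPub Food\n"
)
for record in chunks(sample, 5):
    print([line.strip() for line in record])
# ['Mexican Grill', '85%', '$$', 'Mexican', '']
# ['Deep Fried Everything', '52%', '$', 'Pub Food']
```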

# Python 2 code (uses basestring and the print statement)
import copy, itertools, codecs

class RestaurantListParser(object):

    file_name = "restaurants.txt"

    base_item = {
        "_type": "undefined",
        "_fields": {
            "name": "undefined",
            "nationality": "undefined",
            "rating": "undefined",
            "pricing": "undefined",
        }
    }


    def add_value(self, formatted_item, field_name, field_value):

        if isinstance(field_value, basestring):
            # handle encoding, strip, process the values as you need.
            field_value = codecs.encode(field_value, 'utf-8').strip()
            formatted_item["_fields"][field_name] = field_value
        else:
            print 'Error parsing field "%s", with value: %s' % (field_name, field_value)


    def generator(self, file_name):

        with open(file_name) as file:

            while True:
                lines = tuple(itertools.islice(file, 5))
                if not lines: break


                # Initialize our dictionary for this item
                formatted_item = copy.deepcopy(self.base_item)

                if "," not in lines[3]:
                    formatted_item['_type'] = lines[3].strip()
                else:
                    formatted_item['_type'] = lines[3].split(',')[1].strip()
                    self.add_value(formatted_item, 'nationality', lines[3].split(',')[0])

                self.add_value(formatted_item, 'name', lines[0])
                self.add_value(formatted_item, 'rating', lines[1])
                self.add_value(formatted_item, 'pricing', lines[2])

                yield formatted_item

    def split_by_type(self):

        d = {}
        for restaurant in self.generator(self.file_name):
            if restaurant['_type'] not in d:
                d[restaurant['_type']] = [restaurant['_fields']]
            else:
                d[restaurant['_type']] += [restaurant['_fields']]

        return d

Then, if you run:

p = RestaurantListParser()
print p.split_by_type()

You should get something like this (pretty-printed here for readability; dictionary key order may differ):

{
    'Mexican': [{
        'name': 'Mexican Grill',
        'nationality': 'undefined',
        'pricing': '$$',
        'rating': '85%'
    }],
    'Pub Food': [{
        'name': 'Georgie Porgie',
        'nationality': 'Canadian',
        'pricing': '$$$',
        'rating': '87%'
    }, {
        'name': 'Deep Fried Everything',
        'nationality': 'undefined',
        'pricing': '$',
        'rating': '52%'
    }],
    'Thai': [{
        'name': 'Queen St. Cafe',
        'nationality': 'Malaysian',
        'pricing': '$',
        'rating': '82%'
    }]
}

Your solution is simple, and that's fine. I just wanted to mention a couple of ideas that come to mind when I approach this kind of problem.

Ivan Chaer

Here's another take, using defaultdict and split to simplify things.

from collections import defaultdict

record_keys = ['name', 'rating', 'price', 'cuisine']


def load(path):
    with open(path) as f:
        data = f.read()

    restaurants = []
    # chop up the input on each blank line (two newlines in a row);
    # strip() first so a trailing blank line doesn't produce an empty record
    for record in data.strip().split("\n\n"):
        fields = record.split("\n")

        # build a dictionary by zipping together the fixed set
        # of field names and the values from this particular record
        restaurant = dict(zip(record_keys, fields))

        # split the cuisine field on commas, then strip()
        # removes any leading/trailing whitespace from each cuisine
        restaurant['cuisine'] = [c.strip() for c in restaurant['cuisine'].split(",")]
        restaurants.append(restaurant)

    return restaurants


def build_index(database, key, value):
    index = defaultdict(set)
    for record in database:
        for v in record.get(key, []):
            # defaultdict creates an empty set the first time v is seen,
            # so add() works whether or not the key already exists
            index[v].add(record[value])

    return index


restaurant_db = load('/var/tmp/r')
print(restaurant_db)

by_type = build_index(restaurant_db, 'cuisine', 'name')
print(by_type)
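To try this approach without creating /var/tmp/r, here is a self-contained sketch of the same idea (split records on blank lines, split cuisines on commas, index with defaultdict(set)) run on the four sample restaurants from the question. SAMPLE, parse and index_by_cuisine are illustrative names, and the records are parsed from a string rather than a file.

```python
from collections import defaultdict

SAMPLE = """Georgie Porgie
87%
$$$
Canadian, Pub Food

Queen St. Cafe
82%
$
Malaysian, Thai

Mexican Grill
85%
$$
Mexican

Deep Fried Everything
52%
$
Pub Food
"""

RECORD_KEYS = ['name', 'rating', 'price', 'cuisine']


def parse(text):
    restaurants = []
    for record in text.strip().split("\n\n"):
        restaurant = dict(zip(RECORD_KEYS, record.split("\n")))
        restaurant['cuisine'] = [c.strip() for c in restaurant['cuisine'].split(",")]
        restaurants.append(restaurant)
    return restaurants


def index_by_cuisine(restaurants):
    # one set of restaurant names per cuisine
    index = defaultdict(set)
    for r in restaurants:
        for cuisine in r['cuisine']:
            index[cuisine].add(r['name'])
    return index


by_type = index_by_cuisine(parse(SAMPLE))
for cuisine in sorted(by_type):
    print(cuisine, sorted(by_type[cuisine]))
# Canadian ['Georgie Porgie']
# Malaysian ['Queen St. Cafe']
# Mexican ['Mexican Grill']
# Pub Food ['Deep Fried Everything', 'Georgie Porgie']
# Thai ['Queen St. Cafe']
```

Using a set means a restaurant listed twice under the same cuisine is stored only once; defaultdict(list) with append would keep file order instead.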
Trenton