0

I'm trying to design a database as a practice problem to learn ER models. To put it simply, I have three entities: User, Post and Comment. I have a one to many relation between User and Post (a user can publish posts), a one to many relation between User and Comment (a user can write many comments) and a one to many relation between Post and Comment (a post can contain many comments). Obviously, this forms a cycle, and I've been told to avoid cycles when designing. So is there any clever way to simplify these relations to eliminate cycles? Or maybe there's a universal algorithm, so to say, to eliminating cycles?

SalysBruoga
  • 875
  • 1
  • 9
  • 21

3 Answers3

5

A problematic cycle, something to be avoided, is present only when the same two entities are connected by more than one path. For instance, if an Employee works for a Department, a Department is located in a City, and you have also a relation that says in which City a certain Employee works. But if you have a relation saying in which City the Employee lives, that can be different from the City in which he works, then, even if it forms a cycle, you cannot eliminate it without losing information.

In a similar way, there nothing to eliminate in your model, unless you want to lose some essential information, since the relation between User and Comment expresses the fact that a certain user writes a certain Comment, that could be a Comment to a Post of another User.

Renzo
  • 26,848
  • 5
  • 49
  • 61
4

Obviously, this forms a cycle, and I've been told to avoid cycles when designing.

I don't know where the idea that ERD diagrams can't have "cycles" comes from. I spent years doing that kind of work -- we had an E-size plotter and kept a diagram taped to the door -- and never encountered the no-cycle rule until I heard about it on SO.

The real rule isn't about the picture on the page; it's about the rows in the database.

Imagine an org-chart table of employees. Most employees have a boss, and every boss is an employee. So there's a relationship from employee to employee: a loop. Except it's not a loop. It looks like one on the page because the employee table represents all employees. It doesn't look like one in the database, because each employee is distinct. If you mapped out all the boss-employee rows as index cards and yarn on the floor, you'd see a tree.

I guess the no-cycles rule arose to guard against redundant relationships, as mentioned in Why should I avoid loops when designing relationships for a database? But database design can't be reduced to simple rules like that. Normalization is the process of removing all redundancy, not just those with some graphical characteristic. If it could be checked with superficial rules, it could be done by a machine. All you'd do is pour column names in the top, and out would pop your database design.

So the actual rule is, don't draw derivable relationships. Draw the minimal set necessary to express the constraints, and no more. Like Michelangelo and the block of marble (supposedly): just chip away everything that's not David, and then you're done.

Community
  • 1
  • 1
James K. Lowden
  • 7,574
  • 1
  • 16
  • 31
  • There IS a "no cycles" rule of sorts in data modeling land : an FK from A to B plus another FK from B to A. In SQL, it requires deferred constraint checking for the db to be updatable at all. Not the OP's scenario, but indeed a "cycles" scenario that designers must be aware of. – Erwin Smout Nov 14 '16 at 15:05
1

Entity cycles are decoupled by associative entities, which should have semantic and domain-driven names.

The most basic example is a graph.

The "wrong" (ie loopy) way to model a graph (with an ERD) is with one entity, Nodes, and a loop to represent a relation from one Node to another. The problem here is that we are thinking at the level of records instead of the level of the domain we are trying to model.

Instead of a loop, add another entity, Edges, which has a relationship with exactly two nodes.

The ERD that represents this model of a graph has 2 vertices (Nodes and Edges), and 2 edges (directed from Edges to Nodes) representing the arity/multiplicity of the relationship.

Note that Edges depend on Nodes, not the other way around.