I have inherited a legacy application which makes use of a self-referencing table to facilitate a hierarchical structure. This results in recursive method calls which are creating a "bad smell".
The parent_id column references the primary key of the same table, and there are roughly 25 million records here:
+-------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+----------------+
| phase_id | int(10) | NO | PRI | NULL | auto_increment |
| plat_id | int(10) | YES | MUL | NULL | |
| name | text | YES | | NULL | |
| parent_id | int(10) | YES | MUL | NULL | |
| plan_id | int(10) | YES | MUL | NULL | |
+-------------+---------+------+-----+---------+----------------+
mysql> show table status like 'ref'\G
*************************** 1. row ***************************
Name: phase
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 25223658
Avg_row_length: 20
Data_length: 509450960
Max_data_length: 281474976710655
Index_length: 1026267136
Data_free: 0
Auto_increment: 25238013
I have a few questions about this kind of structure:
- Is it generally bad practice to implement a self-referencing table? The main negative I can think of is that it is difficult/impossible to get the maximum depth of the hierarchy in a single query as there may be an X number children.
- Is it worth redesigning this? Having so much data makes it more difficult to move it around.
- What are my options? I have heard a little bit about table partitioning, but don't know if it is suitable in my scenario.
Any pointers would be really appreciated ns