The-OpenROAD-Project/RePlAce

segmentation violation

speedbooster opened this issue · 24 comments

Hi, I have modified the gcd_nontd_test.tcl test script to point to the libraries I am using and to a DEF design output from QFlow. I get the following output at the rep init_replace line:

** WARNING: Your Detail Placement Step must be skipped.
              (i.e. this program will be executed as -onlyGP)
     If you want to have DP after GP, please specify -dpflag and -dploc.

[INFO] TargetDensity = 1.000000
[INFO] ExperimentIndex = 27
[INFO] DirectoryPath = /home/[user]/Desktop/usb/scratch/etc/live/experiment027
child killed: segmentation violation

Is it possible to use the flow at mature nodes such as 0.6u, 0.35u or 0.18u? I am attempting this with X-FAB.

mgwoo commented

To debug the error, could you share your input files and Tcl scripts with me?

Do you have access to X-FAB's kits on your end? They're confidential, and I am under an NDA.

I am using the XC06 digital cell library with the 3-metal option (not thick).

mgwoo commented

Sorry, I don't.
Could you send me the log you get when you run "$ valgrind ./replace < your_script.tcl"?

Yes, sure; I was looking for a way to log the script's output. Thank you. I'll post back, INSHAALLAH.

Do RePlAce and the other tools produce a log file while running? It would help...

The script:

# 
# Examples for Non Timing-driven RePlAce with TCL usage
#

set design live
set lib_dir /home/[user]/pdk
set design_dir /home/[user]/Desktop/usb

set lef_path ${design_dir}/scratch/pdk.lef
set def_path ${design_dir}/layout/${design}.def


replace_external rep

# Import LEF/DEF files
rep import_lef $lef_path
rep import_def $def_path
rep set_output $design_dir/scratch

puts $def_path

rep set_verbose_level 0

# Initialize RePlAce
rep init_replace

# place_cell with BiCGSTAB 
#rep place_cell_init_place


# print out instances' x/y coordinates
#rep print_instances

# place_cell with Nesterov method
#rep place_cell_nesterov_place

# print out instances' x/y coordinates
#rep print_instances

# Export DEF file
#rep export_def ${design_dir}/scratch/${design}_nontd.def
puts "Final HPWL: [rep get_hpwl]"
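A minimal pre-flight sketch for the script above, assuming you want it to stop with a readable message when lef_path or def_path points to a file that does not exist, instead of failing somewhere inside the parsers. The proc name check_inputs is hypothetical and not a RePlAce command; it uses only core Tcl (file readable, lappend).

```tcl
# Hypothetical pre-flight helper (not part of RePlAce): return the list of
# paths that cannot be read, so the caller can stop before "rep import_lef"
# and "rep import_def" are ever called.
proc check_inputs {args} {
    set missing {}
    foreach path $args {
        # "file readable" is false for nonexistent files and for files
        # without read permission.
        if {![file readable $path]} {
            lappend missing $path
        }
    }
    return $missing
}

# Usage inside the script above (lef_path and def_path as defined there):
# set bad [check_inputs $lef_path $def_path]
# if {[llength $bad] > 0} { error "Cannot read input file(s): $bad" }
```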

Previously I typed replace in the console, and then replace < [script_name].tcl in the RePlAce Tcl console, which gave me the earlier output.
This time I typed valgrind replace < [script_name].tcl and got:

==16894== Memcheck, a memory error detector
==16894== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16894== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==16894== Command: replace
==16894==
==16894== Invalid read of size 1
==16894== at 0x5D2B4A: GetTokenFromStack (lef_keywords.cpp:345)
==16894== by 0x5D2B4A: LefDefParser::GetToken(char**, int*) [clone .cold.152] (lef_keywords.cpp:405)
==16894== by 0x95F0A5: LefDefParser::lefyylex() (lef_keywords.cpp:657)
==16894== by 0x973D01: LefDefParser::lefyyparse() (lef.y:579)
==16894== by 0x6396E8: Replace::Circuit::ParseLef(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) (lefParser.cpp:3218)
==16894== by 0x62AA81: Init (lefdefIO.h:134)
==16894== by 0x62AA81: ParseLefDef() (lefdefIO.cpp:403)
==16894== by 0x62ACC6: ParseInput() (lefdefIO.cpp:241)
==16894== by 0x6C4604: replace_external::init_replace() (replace_external.cpp:325)
==16894== by 0x6CB065: _wrap_replace_external_init_replace (replace_wrap.cpp:3240)
==16894== by 0x6C8E1E: SWIG_Tcl_MethodCommand (replace_wrap.cpp:1329)
==16894== by 0x5C4AEB1: ??? (in /usr/lib64/libtcl8.5.so)
==16894== by 0x5C8F36B: ??? (in /usr/lib64/libtcl8.5.so)
==16894== by 0x5C97646: ??? (in /usr/lib64/libtcl8.5.so)
==16894== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==16894==
==16894==
==16894== Process terminating with default action of signal 11 (SIGSEGV)
==16894== Access not within mapped region at address 0x0
==16894== at 0x5D2B4A: GetTokenFromStack (lef_keywords.cpp:345)
==16894== by 0x5D2B4A: LefDefParser::GetToken(char**, int*) [clone .cold.152] (lef_keywords.cpp:405)
==16894== by 0x95F0A5: LefDefParser::lefyylex() (lef_keywords.cpp:657)
==16894== by 0x973D01: LefDefParser::lefyyparse() (lef.y:579)
==16894== by 0x6396E8: Replace::Circuit::ParseLef(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) (lefParser.cpp:3218)
==16894== by 0x62AA81: Init (lefdefIO.h:134)
==16894== by 0x62AA81: ParseLefDef() (lefdefIO.cpp:403)
==16894== by 0x62ACC6: ParseInput() (lefdefIO.cpp:241)
==16894== by 0x6C4604: replace_external::init_replace() (replace_external.cpp:325)
==16894== by 0x6CB065: _wrap_replace_external_init_replace (replace_wrap.cpp:3240)
==16894== by 0x6C8E1E: SWIG_Tcl_MethodCommand (replace_wrap.cpp:1329)
==16894== by 0x5C4AEB1: ??? (in /usr/lib64/libtcl8.5.so)
==16894== by 0x5C8F36B: ??? (in /usr/lib64/libtcl8.5.so)
==16894== by 0x5C97646: ??? (in /usr/lib64/libtcl8.5.so)
==16894== If you believe this happened as a result of a stack
==16894== overflow in your program's main thread (unlikely but
==16894== possible), you can try to increase the size of the
==16894== main thread stack using the --main-stacksize= flag.
==16894== The main thread stack size used in this run was 8388608.
==16894==
==16894== HEAP SUMMARY:
==16894== in use at exit: 999,096 bytes in 896 blocks
==16894== total heap usage: 1,084 allocs, 188 frees, 1,458,632 bytes allocated
==16894==
==16894== LEAK SUMMARY:
==16894== definitely lost: 0 bytes in 0 blocks
==16894== indirectly lost: 0 bytes in 0 blocks
==16894== possibly lost: 788,777 bytes in 45 blocks
==16894== still reachable: 210,319 bytes in 851 blocks
==16894== suppressed: 0 bytes in 0 blocks
==16894== Rerun with --leak-check=full to see details of leaked memory
==16894==
==16894== For counts of detected and suppressed errors, rerun with: -v
==16894== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

mgwoo commented

I got exactly the same error when using CentOS 8 / gcc 8.
Unfortunately, I still have no idea why this happens.

Could you use the Docker environment instead?

Yes, I have gcc 8.3.0; the machine runs CentOS 7.
What is your development platform? I am now compiling with gcc 4.8.5.

mgwoo commented

If you recompile with gcc 4.8.5 on CentOS 7, the new binary should not have this problem.
I think this issue is caused by differences in how the compilers handle memory in copy constructors/malloc...

We're going to integrate OpenDB with RePlAce, so this error should go away in the near future.
(My modified LEF/DEF parsers seem to have a compiler-dependent problem...)

Here is the output with a binary compiled using gcc 4.8.5:

==21032== Memcheck, a memory error detector
==21032== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==21032== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==21032== Command: replace
==21032== 
RePlAce Version: 1.0.0

** WARNING: Your Detail Placement Step must be skipped.
              (i.e. this program will be executed as -onlyGP)
     If you want to have DP after GP, please specify -dpflag and -dploc.

[INFO] TargetDensity = 1.000000
[INFO] ExperimentIndex = 34
[INFO] DirectoryPath = /home/[user]/Desktop/usb/scratch/etc/live/experiment034
[INFO] DefUnit = 100
[INFO] LefMetal1Name = METAL1
==21032== Invalid read of size 8
==21032==    at 0x8BF6E0: LefDefParser::defiRow::macro() const (defiRowTrack.cpp:364)
==21032==    by 0x5DABC1: SetParameter() (lefdefIO.cpp:352)
==21032==    by 0x5E036C: ParseLefDef() (lefdefIO.cpp:405)
==21032==    by 0x5E065D: ParseInput() (lefdefIO.cpp:241)
==21032==    by 0x673483: replace_external::init_replace() (replace_external.cpp:325)
==21032==    by 0x678B34: _wrap_replace_external_init_replace (replace_wrap.cpp:3240)
==21032==    by 0x67C8A4: SWIG_Tcl_MethodCommand (replace_wrap.cpp:1329)
==21032==    by 0x5C4AEB1: ??? (in /usr/lib64/libtcl8.5.so)
==21032==    by 0x5C8F36B: ??? (in /usr/lib64/libtcl8.5.so)
==21032==    by 0x5C97646: ??? (in /usr/lib64/libtcl8.5.so)
==21032==    by 0x5C4C6B6: TclEvalObjEx (in /usr/lib64/libtcl8.5.so)
==21032==    by 0x5C9D38B: Tcl_RecordAndEvalObj (in /usr/lib64/libtcl8.5.so)
==21032==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==21032== 
==21032== 
==21032== Process terminating with default action of signal 11 (SIGSEGV)
==21032==  Access not within mapped region at address 0x18
==21032==    at 0x8BF6E0: LefDefParser::defiRow::macro() const (defiRowTrack.cpp:364)
==21032==    by 0x5DABC1: SetParameter() (lefdefIO.cpp:352)
==21032==    by 0x5E036C: ParseLefDef() (lefdefIO.cpp:405)
==21032==    by 0x5E065D: ParseInput() (lefdefIO.cpp:241)
==21032==    by 0x673483: replace_external::init_replace() (replace_external.cpp:325)
==21032==    by 0x678B34: _wrap_replace_external_init_replace (replace_wrap.cpp:3240)
==21032==    by 0x67C8A4: SWIG_Tcl_MethodCommand (replace_wrap.cpp:1329)
==21032==    by 0x5C4AEB1: ??? (in /usr/lib64/libtcl8.5.so)
==21032==    by 0x5C8F36B: ??? (in /usr/lib64/libtcl8.5.so)
==21032==    by 0x5C97646: ??? (in /usr/lib64/libtcl8.5.so)
==21032==    by 0x5C4C6B6: TclEvalObjEx (in /usr/lib64/libtcl8.5.so)
==21032==    by 0x5C9D38B: Tcl_RecordAndEvalObj (in /usr/lib64/libtcl8.5.so)
==21032==  If you believe this happened as a result of a stack
==21032==  overflow in your program's main thread (unlikely but
==21032==  possible), you can try to increase the size of the
==21032==  main thread stack using the --main-stacksize= flag.
==21032==  The main thread stack size used in this run was 8388608.
==21032== 
==21032== HEAP SUMMARY:
==21032==     in use at exit: 36,814,447 bytes in 1,179,547 blocks
==21032==   total heap usage: 2,760,034 allocs, 1,580,487 frees, 4,678,754,460 bytes allocated
==21032== 
==21032== LEAK SUMMARY:
==21032==    definitely lost: 11,829,995 bytes in 569,819 blocks
==21032==    indirectly lost: 2,904 bytes in 352 blocks
==21032==      possibly lost: 756,049 bytes in 44 blocks
==21032==    still reachable: 24,225,499 bytes in 609,332 blocks
==21032==                       of which reachable via heuristic:
==21032==                         stdstring          : 1,366,181 bytes in 49,366 blocks
==21032==         suppressed: 0 bytes in 0 blocks
==21032== Rerun with --leak-check=full to see details of leaked memory
==21032== 
==21032== For counts of detected and suppressed errors, rerun with: -v
==21032== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)


mgwoo commented

This means there is a problem with your DEF...
Could you check whether your DEF contains ROW definitions?
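A quick way to check this without reading the whole file by hand: a sketch that counts ROW statements, assuming the DEF is plain text and each ROW statement starts on its own line (as DEF writers normally emit them). def_row_count is a hypothetical helper, not a RePlAce command.

```tcl
# Hypothetical helper (not part of RePlAce): count the ROW statements in a
# DEF file. A result of 0 means the DEF defines no placement rows.
proc def_row_count {def_path} {
    set fh [open $def_path r]
    set count 0
    while {[gets $fh line] >= 0} {
        # A ROW statement begins with the keyword ROW followed by whitespace;
        # lines such as "# ROW ..." (DEF comments) are not counted.
        if {[regexp {^\s*ROW\s} $line]} {
            incr count
        }
    }
    close $fh
    return $count
}

# Usage with the script's def_path:
# if {[def_row_count $def_path] == 0} { puts "WARNING: no ROW statements in $def_path" }
```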

No, it does not contain ROW definitions.

mgwoo commented

ROWs must be defined to run the global placer...
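If a DEF arrives without rows, one option is to generate them. Below is a sketch that emits DEF ROW statements in the form "ROW name site x y orient DO numX BY 1 STEP stepX 0 ;". It assumes the site name and size are taken from the LEF SITE statement (SIZE width BY height), that dbu_per_micron matches the DEF UNITS DISTANCE MICRONS value (the log above reports DefUnit = 100), and that rows alternate N/FS so neighbouring rows share power rails. Whether that alternation suits a given library is an assumption to verify; make_def_rows and the values in the usage note are hypothetical.

```tcl
# Hypothetical helper (not part of RePlAce): build DEF ROW statements for a
# rectangular core. Dimensions are in microns and are converted to DEF
# database units with dbu_per_micron.
proc make_def_rows {site_name x0 y0 site_w site_h num_rows sites_per_row dbu_per_micron} {
    set rows {}
    set x [expr {round($x0 * $dbu_per_micron)}]
    set step [expr {round($site_w * $dbu_per_micron)}]
    for {set i 0} {$i < $num_rows} {incr i} {
        # Alternate N / FS orientation row by row (assumed convention).
        set orient [expr {$i % 2 == 0 ? "N" : "FS"}]
        # Each row sits one site height above the previous one.
        set y [expr {round(($y0 + $i * $site_h) * $dbu_per_micron)}]
        lappend rows "ROW ROW_$i $site_name $x $y $orient DO $sites_per_row BY 1 STEP $step 0 ;"
    }
    return $rows
}
```

For example, make_def_rows core 0 0 2.0 10.0 2 50 100 returns two statements, the first being "ROW ROW_0 core 0 0 N DO 50 BY 1 STEP 200 0 ;". The rows still have to fit inside the DEF's DIEAREA.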

OK, maybe I should just use the OpenROAD flow as a complete flow instead of QFlow.

I ran make test on OpenROAD-Utilities/verilog-to-def and it segfaulted. I reran make using gcc 4.8.5 and it no longer segfaults.

Where can I ask questions about tool usage?
I see capacitance and resistance unit settings for RePlAce. What values should I set? Aren't resistance and capacitance per micron dependent on the metal layer, with a different value for each layer...

Secondly, what is the -dploc flag? Google returned a page explaining the flags, RePlAce/doc/BinaryArguments.md, but apparently that file is no longer available.

mgwoo commented

> I ran make test on OpenROAD-Utilities/verilog-to-def and it segfaulted. I reran make using gcc 4.8.5 and it no longer segfaults.

OpenROAD-Utilities/verilog-to-def is obsolete. Could you use https://github.com/The-OpenROAD-Project/Resizer instead?

Could you point me to a document that describes the flow you follow?
I am trying to follow the yosys demo script.

mgwoo commented

> Where can I ask questions about tool usage?
> I see capacitance and resistance unit settings for RePlAce. What values should I set? Aren't resistance and capacitance per micron dependent on the metal layer, with a different value for each layer...

This is a known limitation. For now, you may use the M1 values. In the future, we plan to use a machine-learning model or a global-router tree model.
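To see what the two per-micron numbers feed into, here is a sketch of the standard wire RC model, assuming one layer's (for example M1's) resistance and capacitance per micron are applied uniformly along a wire. This is a textbook estimate, not RePlAce's internal timing model, and wire_rc is a hypothetical helper; because both values scale linearly with length, picking a different layer's values scales every wire's R and C together.

```tcl
# Hypothetical helper (not part of RePlAce): total R and C of a wire of
# length_um microns using single-layer per-micron values, plus the Elmore
# delay of a uniform distributed RC line with no driver or load (R*C/2).
proc wire_rc {r_per_um c_per_um length_um} {
    set r [expr {$r_per_um * $length_um}]
    set c [expr {$c_per_um * $length_um}]
    set delay [expr {0.5 * $r * $c}]
    # Units follow the inputs, e.g. ohm/um and F/um give ohm, F and seconds.
    return [list $r $c $delay]
}
```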

> Secondly, what is the -dploc flag? Google returned a page explaining the flags, RePlAce/doc/BinaryArguments.md, but apparently that file is no longer available.

-dpflag is no longer supported. You can use OpenDP as the detailed placer in our flow:
https://github.com/The-OpenROAD-Project/OpenDP

> -dpflag is no longer supported.

I see references to this flag in https://github.com/The-OpenROAD-Project/yosys

mgwoo commented

> Could you point me to a document that describes the flow you follow?
> I am trying to follow the yosys demo script.

Could you check the alpha-release repository?

  1. Dockerfile for building all of the OpenROAD tools.
    https://github.com/The-OpenROAD-Project/alpha-release/tree/master/build/docker

  2. Makefile for running all of the OpenROAD tools.
    https://github.com/The-OpenROAD-Project/alpha-release/blob/master/flow/Makefile

> -dpflag is no longer supported.

> I see references to this flag in https://github.com/The-OpenROAD-Project/yosys

You can ignore that flag now; it is only used for physical synthesis.
Yosys should be updated to remove -dpflag.

ALHAMDOLILLAH, I managed to get the flow running. Instead of using Docker, I ran setup.sh from the released binaries and then ran the flow from the source I downloaded from GitHub (/alpha-release/flow).

I ran make in the flow directory; however, it would not proceed to the final (finish) stage. I changed the Makefile target all: route to all: finish (I wonder why it wasn't set that way already...).

The final GDS is attached. Please have a look. I don't remember altering any 'area' parameters in the configuration files for the gcd design, but the GDS looks odd: it gave me 8% utilization, the power buses seem to be missing, and the cells seem to overlap across two consecutive rows.

Am I doing something wrong...?

mgwoo commented

Sorry for my late reply.
You can change the DIEAREA in the following Makefile settings:

https://github.com/The-OpenROAD-Project/alpha-release/blob/7c0421f6a4cb7f9c5df2fee126b2d40ed72d3d95/flow/designs/gcd_nangate45.mk#L14-L17
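Utilization is total standard-cell area divided by core area, so a low figure like 8% means the die/core rectangle is large relative to the design. A sketch of the arithmetic for choosing a smaller area, assuming areas in square microns, lengths in microns and a square core; utilization and square_core_side are hypothetical helpers, and the resulting numbers would go into the Makefile settings linked above.

```tcl
# Hypothetical helpers (not part of the OpenROAD flow).
# Fraction of the core occupied by standard cells.
proc utilization {cell_area core_w core_h} {
    return [expr {double($cell_area) / ($core_w * $core_h)}]
}

# Side length of a square core that yields the target utilization
# (target_util as a fraction, e.g. 0.5 for 50%).
proc square_core_side {cell_area target_util} {
    return [expr {sqrt(double($cell_area) / $target_util)}]
}
```

With hypothetical numbers, 800 µm² of cells in a 100 µm × 100 µm core gives utilization 800 100 100 = 0.08 (8%), while square_core_side 800 0.5 gives a 40 µm core side for 50%. The die area must additionally leave room for any margin and power rings around the core.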

I will close this issue (your remaining questions are outside my scope).